From noreply@sourceforge.net Fri Nov 1 07:58:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 31 Oct 2002 23:58:03 -0800 Subject: [Patches] [ python-Patches-590377 ] db4 include not found Message-ID: Patches item #590377, was opened at 2002-08-02 21:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: Invalid Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: db4 include not found Initial Comment: setup.py looks for the db4 library in /usr/lib, but doesn't look for the header in /usr/include (as you find it on Debian unstable) ---------------------------------------------------------------------- >Comment By: Matthias Klose (doko) Date: 2002-11-01 07:58 Message: Logged In: YES user_id=60903 so why not use /etc/debian_version? the dpkg thing is useful to determine the correct db version, or you grep the include file for the version number. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-31 22:24 Message: Logged In: YES user_id=44345 I was thinking more along the lines of a file we could look for. For example, on my Mandrake systems I have /etc/mandrake-release. I suspect that pretty much rules out other Linux dialects (though what if someone converts to another version but doesn't format their disks?). Testing the presence of dpkg isn't sufficient either. Other systems can use it. In fact, I have dpkg on my MacOSX system as a side-effect of being a fink user (http://fink.sf.net/). ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-31 22:13 Message: Logged In: YES user_id=60903 two possibilities come to mind: dpkg -S /usr/include/db.h prints: libdb4.0-dev: /usr/include/db.h (or libdb3-dev, if the libdb3-dev package is installed). 
Note that the installation of a libdb-dev package is not required, so this command may return with an error. I'm pretty sure that dpkg only exists on Debian systems ;-) or else test for the existence of the file /etc/debian_version first. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 15:46 Message: Logged In: YES user_id=44345 It's clear we can't blindly add '/usr/include' to db_try_this['db4']['incdirs'] and 'db' to db_try_this['db4']['libs']. Is there some way to unambiguously detect from setup.py that Python is being built on a Debian system and that we are not dealing with an installation of db1 (which I still refuse to enable without manual intervention)? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-27 15:12 Message: Logged In: YES user_id=21627 /usr/include/db.h is not a symlink; it is the file itself. You cannot have multiple bsddb development packages (libdbX-dev) installed, because they conflict with each other. /usr/lib/libdb.so exists and is a symlink to the installed shared library. The file in question isn't actually db.h (for current bsddbmodule.c), but db_185.h, of course. As for base+patch: Sure, Debian already uses such a patch. Matthias is the Debian maintainer of Python, and asks us (as upstream) to include his patch. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 14:39 Message: Logged In: YES user_id=44345 Is /usr/include/db.h a symlink to some other file on Debian which is version-specific? If so, I'd be happy to add that directory to the list searched. How does Debian structure its directories to allow multiple versions of Berkeley DB to be installed simultaneously? If /usr/include/db.h is the location of the include file, is /usr/lib/libdb.{so,a} the location of the library? 
The original (db1) versions of Berkeley DB generally had db.h in /usr/include. This version is, unfortunately, both broken and still in use. Other vendors allow multiple versions of the library to be installed, and use a version-specific directory naming scheme to keep things sorted out. Debian could do the same. No matter how strongly you believe /usr/include should be searched, I'm not going to add it by default and risk the chance that other people's installs break (silently) as a result. Please read the comments related to db1 in setup.py. (Search for bsddb.) Final thought here... Doesn't Debian have a base+patch sort of system? To install on Debian, all that would need to be done is develop a Debian-specific patch which adds /usr/include to the incdirs key and something like 'db' to the libs key in the db4 section. Skip ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-21 22:11 Message: Logged In: YES user_id=21627 As a bug report, I would close this as "Won't fix", pointing you to the option of not using setup.py, but compiling the module through Modules/Setup. I'd prefer to get a patch, but that should be one that works on all systems. This one breaks on systems where /usr/include/db.h is not a bsddb 4 header file. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 21:54 Message: Logged In: YES user_id=60903 > - Doesn't it cause -I/usr/include be added to the > compilation statement? yes, that's not correct > - Why is that done for db4 and not db3? On Debian only one libdb-dev package can be installed at build time. I assure with the build dependencies that I get the correct one. > - How can we know what bsddb version /usr/include/db.h > belongs to? Guessing that it is db4 is error-prone, so it > might be, in fact, better not to find that file. maybe grep for DB_VERSION_MAJOR in the file? 
I admit, this issue should be a bug report, not a patch ... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-21 21:42 Message: Logged In: YES user_id=21627 Ok, I admit that the patch might be needed; I think it is quite incorrect, though: - Doesn't it cause -I/usr/include be added to the compilation statement? - Why is that done for db4 and not db3? - How can we know what bsddb version /usr/include/db.h belongs to? Guessing that it is db4 is error-prone, so it might be, in fact, better not to find that file. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 21:14 Message: Logged In: YES user_id=60903 Martin, you're closing reports very quickly ;-) What makes you think the problem is solved? The code doesn't search /usr/include for db.h. Updated patch included. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-13 13:06 Message: Logged In: YES user_id=21627 The patch is out-of-date, and appears to be unnecessary. I'm rejecting it - if you still think something needs to be done, please submit a new patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 From noreply@sourceforge.net Fri Nov 1 09:41:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 01:41:48 -0800 Subject: [Patches] [ python-Patches-590377 ] db4 include not found Message-ID: Patches item #590377, was opened at 2002-08-02 23:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: Invalid Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: db4 include not found Initial Comment: setup.py looks for the db4 library in /usr/lib, but doesn't look for the header in /usr/include (as you find it on Debian unstable) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-01 10:41 Message: Logged In: YES user_id=21627 Would you like to update your patch to implement that strategy? ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-11-01 08:58 Message: Logged In: YES user_id=60903 so why not use /etc/debian_version? the dpkg thing is useful to determine the correct db version, or you grep the include file for the version number. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-31 23:24 Message: Logged In: YES user_id=44345 I was thinking more along the lines of a file we could look for. For example, on my Mandrake systems I have /etc/mandrake-release. I suspect that pretty much rules out other Linux dialects (though what if someone converts to another version but doesn't format their disks?). Testing the presence of dpkg isn't sufficient either. Other systems can use it. 
In fact, I have dpkg on my MacOSX system as a side-effect of being a fink user (http://fink.sf.net/). ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-31 23:13 Message: Logged In: YES user_id=60903 two possibilities come to mind: dpkg -S /usr/include/db.h prints: libdb4.0-dev: /usr/include/db.h (or libdb3-dev, if the libdb3-dev package is installed). Note that the installation of a libdb-dev package is not required, so this command may return with an error. I'm pretty sure that dpkg only exists on Debian systems ;-) or else test for the existence of the file /etc/debian_version first. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 16:46 Message: Logged In: YES user_id=44345 It's clear we can't blindly add '/usr/include' to db_try_this['db4']['incdirs'] and 'db' to db_try_this['db4']['libs']. Is there some way to unambiguously detect from setup.py that Python is being built on a Debian system and that we are not dealing with an installation of db1 (which I still refuse to enable without manual intervention)? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-27 16:12 Message: Logged In: YES user_id=21627 /usr/include/db.h is not a symlink; it is the file itself. You cannot have multiple bsddb development packages (libdbX-dev) installed, because they conflict with each other. /usr/lib/libdb.so exists and is a symlink to the installed shared library. The file in question isn't actually db.h (for current bsddbmodule.c), but db_185.h, of course. As for base+patch: Sure, Debian already uses such a patch. Matthias is the Debian maintainer of Python, and asks us (as upstream) to include his patch. 
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 15:39 Message: Logged In: YES user_id=44345 Is /usr/include/db.h a symlink to some other file on Debian which is version-specific? If so, I'd be happy to add that directory to the list searched. How does Debian structure its directories to allow multiple versions of Berkeley DB to be installed simultaneously? If /usr/include/db.h is the location of the include file, is /usr/lib/libdb.{so,a} the location of the library? The original (db1) versions of Berkeley DB generally had db.h in /usr/include. This version is, unfortunately, both broken and still in use. Other vendors allow multiple versions of the library to be installed, and use a version-specific directory naming scheme to keep things sorted out. Debian could do the same. No matter how strongly you believe /usr/include should be searched, I'm not going to add it by default and risk the chance that other people's installs break (silently) as a result. Please read the comments related to db1 in setup.py. (Search for bsddb.) Final thought here... Doesn't Debian have a base+patch sort of system? To install on Debian, all that would need to be done is develop a Debian-specific patch which adds /usr/include to the incdirs key and something like 'db' to the libs key in the db4 section. Skip ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-22 00:11 Message: Logged In: YES user_id=21627 As a bug report, I would close this as "Won't fix", pointing you to the option of not using setup.py, but compiling the module through Modules/Setup. I'd prefer to get a patch, but that should be one that works on all systems. This one breaks on systems where /usr/include/db.h is not a bsddb 4 header file. 
---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 23:54 Message: Logged In: YES user_id=60903 > - Doesn't it cause -I/usr/include be added to the > compilation statement? yes, that's not correct > - Why is that done for db4 and not db3? On Debian only one libdb-dev package can be installed at build time. I assure with the build dependencies that I get the correct one. > - How can we know what bsddb version /usr/include/db.h > belongs to? Guessing that it is db4 is error-prone, so it > might be, in fact, better not to find that file. maybe grep for DB_VERSION_MAJOR in the file? I admit, this issue should be a bug report, not a patch ... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-21 23:42 Message: Logged In: YES user_id=21627 Ok, I admit that the patch might be needed; I think it is quite incorrect, though: - Doesn't it cause -I/usr/include be added to the compilation statement? - Why is that done for db4 and not db3? - How can we know what bsddb version /usr/include/db.h belongs to? Guessing that it is db4 is error-prone, so it might be, in fact, better not to find that file. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 23:14 Message: Logged In: YES user_id=60903 Martin, you're closing reports very quickly ;-) What makes you think the problem is solved? The code doesn't search /usr/include for db.h. Updated patch included. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-13 15:06 Message: Logged In: YES user_id=21627 The patch is out-of-date, and appears to be unnecessary. I'm rejecting it - if you still think something needs to be done, please submit a new patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 From noreply@sourceforge.net Fri Nov 1 09:55:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 01:55:32 -0800 Subject: [Patches] [ python-Patches-631972 ] Platform test for jython Message-ID: Patches item #631972, was opened at 2002-11-01 10:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 Category: Tests Group: None Status: Open Resolution: None Priority: 5 Submitted By: Finn Bock (bckfnn) Assigned to: Nobody/Anonymous (nobody) Summary: Platform test for jython Initial Comment: Some tests (and more to come) contain a test for the java platform. I suggest adding an 'is_jython' flag to test_support to make it easier to read tests with jython specific logic. Is this change possible for inclusion in a 2.2.x release? Since jython is running against a 2.2 base, backporting fixes to the test suite would be very convenient for us. I can commit it myself if it is accepted. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 From noreply@sourceforge.net Fri Nov 1 14:02:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 06:02:16 -0800 Subject: [Patches] [ python-Patches-590377 ] db4 include not found Message-ID: Patches item #590377, was opened at 2002-08-02 16:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: Invalid Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: db4 include not found Initial Comment: setup.py looks for the db4 library in /usr/lib, but doesn't look for the header in /usr/include (as you find it on Debian unstable) ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-11-01 08:02 Message: Logged In: YES user_id=44345 > so why not use /etc/debian_version? That's fine. I didn't know it existed, having had no previous experience with Debian. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-01 03:41 Message: Logged In: YES user_id=21627 Would you like to update your patch to implement that strategy? ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-11-01 01:58 Message: Logged In: YES user_id=60903 so why not use /etc/debian_version? the dpkg thing is useful to determine the correct db version, or you grep the include file for the version number. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-31 16:24 Message: Logged In: YES user_id=44345 I was thinking more along the lines of a file we could look for. 
For example, on my Mandrake systems I have /etc/mandrake-release. I suspect that pretty much rules out other Linux dialects (though what if someone converts to another version but doesn't format their disks?). Testing the presence of dpkg isn't sufficient either. Other systems can use it. In fact, I have dpkg on my MacOSX system as a side-effect of being a fink user (http://fink.sf.net/). ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-31 16:13 Message: Logged In: YES user_id=60903 two possibilities come to mind: dpkg -S /usr/include/db.h prints: libdb4.0-dev: /usr/include/db.h (or libdb3-dev, if the libdb3-dev package is installed). Note that the installation of a libdb-dev package is not required, so this command may return with an error. I'm pretty sure that dpkg only exists on Debian systems ;-) or else test for the existence of the file /etc/debian_version first. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 09:46 Message: Logged In: YES user_id=44345 It's clear we can't blindly add '/usr/include' to db_try_this['db4']['incdirs'] and 'db' to db_try_this['db4']['libs']. Is there some way to unambiguously detect from setup.py that Python is being built on a Debian system and that we are not dealing with an installation of db1 (which I still refuse to enable without manual intervention)? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-27 09:12 Message: Logged In: YES user_id=21627 /usr/include/db.h is not a symlink; it is the file itself. You cannot have multiple bsddb development packages (libdbX-dev) installed, because they conflict with each other. /usr/lib/libdb.so exists and is a symlink to the installed shared library. The file in question isn't actually db.h (for current bsddbmodule.c), but db_185.h, of course. 
As for base+patch: Sure, Debian already uses such a patch. Matthias is the Debian maintainer of Python, and asks us (as upstream) to include his patch. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-27 08:39 Message: Logged In: YES user_id=44345 Is /usr/include/db.h a symlink to some other file on Debian which is version-specific? If so, I'd be happy to add that directory to the list searched. How does Debian structure its directories to allow multiple versions of Berkeley DB to be installed simultaneously? If /usr/include/db.h is the location of the include file, is /usr/lib/libdb.{so,a} the location of the library? The original (db1) versions of Berkeley DB generally had db.h in /usr/include. This version is, unfortunately, both broken and still in use. Other vendors allow multiple versions of the library to be installed, and use a version-specific directory naming scheme to keep things sorted out. Debian could do the same. No matter how strongly you believe /usr/include should be searched, I'm not going to add it by default and risk the chance that other people's installs break (silently) as a result. Please read the comments related to db1 in setup.py. (Search for bsddb.) Final thought here... Doesn't Debian have a base+patch sort of system? To install on Debian, all that would need to be done is develop a Debian-specific patch which adds /usr/include to the incdirs key and something like 'db' to the libs key in the db4 section. Skip ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-21 17:11 Message: Logged In: YES user_id=21627 As a bug report, I would close this as "Won't fix", pointing you to the option of not using setup.py, but compiling the module through Modules/Setup. I'd prefer to get a patch, but that should be one that works on all systems. 
This one breaks on systems where /usr/include/db.h is not a bsddb 4 header file. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 16:54 Message: Logged In: YES user_id=60903 > - Doesn't it cause -I/usr/include be added to the > compilation statement? yes, that's not correct > - Why is that done for db4 and not db3? On Debian only one libdb-dev package can be installed at build time. I assure with the build dependencies that I get the correct one. > - How can we know what bsddb version /usr/include/db.h > belongs to? Guessing that it is db4 is error-prone, so it > might be, in fact, better not to find that file. maybe grep for DB_VERSION_MAJOR in the file? I admit, this issue should be a bug report, not a patch ... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-21 16:42 Message: Logged In: YES user_id=21627 Ok, I admit that the patch might be needed; I think it is quite incorrect, though: - Doesn't it cause -I/usr/include be added to the compilation statement? - Why is that done for db4 and not db3? - How can we know what bsddb version /usr/include/db.h belongs to? Guessing that it is db4 is error-prone, so it might be, in fact, better not to find that file. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-10-21 16:14 Message: Logged In: YES user_id=60903 Martin, you're closing reports very quickly ;-) What makes you think the problem is solved? The code doesn't search /usr/include for db.h. Updated patch included. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-13 08:06 Message: Logged In: YES user_id=21627 The patch is out-of-date, and appears to be unnecessary. I'm rejecting it - if you still think something needs to be done, please submit a new patch. 
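
The detection strategy discussed in this thread (look for /etc/debian_version, then read DB_VERSION_MAJOR out of the header rather than guessing) could be sketched roughly as below. This is not the patch attached to the tracker item. It is written in present-day Python, and the helper names looks_like_debian, db_major_version and read_db_major_version are hypothetical, introduced only for illustration.

```python
import os
import re

# Matches a line such as "#define DB_VERSION_MAJOR 4" in a Berkeley DB header.
_VERSION_RE = re.compile(
    r"^\s*#\s*define\s+DB_VERSION_MAJOR\s+(\d+)", re.MULTILINE
)


def looks_like_debian(marker="/etc/debian_version"):
    """Return True if the Debian marker file mentioned in the thread exists."""
    return os.path.exists(marker)


def db_major_version(header_text):
    """Return the DB_VERSION_MAJOR value found in header_text, or None.

    Headers that lack the macro yield None, so a caller can refuse to
    treat an unidentified header as db4.
    """
    match = _VERSION_RE.search(header_text)
    return int(match.group(1)) if match else None


def read_db_major_version(path="/usr/include/db.h"):
    """Read the header at path and return its major version, or None."""
    try:
        with open(path, errors="replace") as header:
            return db_major_version(header.read())
    except OSError:
        # A missing or unreadable header means "unknown", not an error.
        return None
```

A build script following this idea would add /usr/include to the db4 search path only when looks_like_debian() is true and read_db_major_version() returns 4, which leaves other platforms untouched.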
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590377&group_id=5470 From noreply@sourceforge.net Fri Nov 1 17:04:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 09:04:13 -0800 Subject: [Patches] [ python-Patches-631972 ] Platform test for jython Message-ID: Patches item #631972, was opened at 2002-11-01 04:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 Category: Tests Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Finn Bock (bckfnn) >Assigned to: Finn Bock (bckfnn) Summary: Platform test for jython Initial Comment: Some tests (and more to come) contain a test for the java platform. I suggest adding an 'is_jython' flag to test_support to make it easier to read tests with jython specific logic. Is this change possible for inclusion in a 2.2.x release? Since jython is running against a 2.2 base, backporting fixes to the test suite would be very convenient for us. I can commit it myself if it is accepted. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-01 12:04 Message: Logged In: YES user_id=31435 Accepted, and back to you. We don't backport "new features", but I've got no problem with backporting better tests -- to the contrary, I think it's a good idea too. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 From noreply@sourceforge.net Fri Nov 1 17:14:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 09:14:18 -0800 Subject: [Patches] [ python-Patches-624180 ] HTTP Auth support for xmlrpclib Message-ID: Patches item #624180, was opened at 2002-10-16 18:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=624180&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Fixed Priority: 5 Submitted By: Phillip J. Eby (pje) Assigned to: Fredrik Lundh (effbot) Summary: HTTP Auth support for xmlrpclib Initial Comment: This patch adds code and docs to support the use of HTTP/HTTPS "Basic Authorization" with xmlrpclib. It does this using the "http://user:pass@host:port/" syntax, based on existing code in urllib and xmlrpclib. I have tested the code change with and without basic auth on my own servers, but do not know how I could create a reasonable addition to the test suite, since a server connection would be required. My patch includes a patch to Doc/lib/libxmlrpclib.tex, but I'm not sure if I added the text in the best place. It may be that it should be a note in the 'seealso' section on that page instead, but I wasn't sure if that was correct style for the library documentation. For the code patch, I tried to match the style of the surrounding code as closely as possible, including the style used for code comments. The patch is against current (as of this moment) Python CVS. Please let me know if there's any other info you need or anything that I should change for this. Thanks. ---------------------------------------------------------------------- Comment By: Phillip J. Eby (pje) Date: 2002-10-23 19:23 Message: Logged In: YES user_id=56214 Oops... 
There's a (very small) bug in both my and your versions of the patch. I forgot to do 'urllib.unquote(auth)' before doing the base64 encode. This means that if 'user' is an e-mail address or contains a colon, or 'password' contains an '@' sign, there's no way to get them into the auth string. Adding 'auth = urllib.unquote(auth)' before the 'auth = base64.encodestring(auth)' statement does the trick. Thus, a 'user' of "pje@telecommunity.com" can be encoded in the URL as "pje%40telecommunity.com", which is awkward but usable. I discovered this when trying to port a script from Perl's RPC::XML, which supports this syntax, escapes included. Thanks! ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-10-20 14:54 Message: Logged In: YES user_id=38376 looks good to me. thanks /F ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=624180&group_id=5470 From noreply@sourceforge.net Fri Nov 1 18:04:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 10:04:34 -0800 Subject: [Patches] [ python-Patches-631972 ] Platform test for jython Message-ID: Patches item #631972, was opened at 2002-11-01 10:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 Category: Tests Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Finn Bock (bckfnn) Assigned to: Finn Bock (bckfnn) Summary: Platform test for jython Initial Comment: Some tests (and more to come) contain a test for the java platform. I suggest adding an 'is_jython' flag to test_support to make it easier to read tests with jython specific logic. Is this change possible for inclusion in a 2.2.x release? Since jython is running against a 2.2 base, backporting fixes to the test suite would be very convenient for us. 
I can commit it myself if it is accepted. ---------------------------------------------------------------------- >Comment By: Finn Bock (bckfnn) Date: 2002-11-01 19:04 Message: Logged In: YES user_id=4201 Applied to test_support.py: 1.43; ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-01 18:04 Message: Logged In: YES user_id=31435 Accepted, and back to you. We don't backport "new features", but I've got no problem with backporting better tests -- to the contrary, I think it's a good idea too. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 From noreply@sourceforge.net Fri Nov 1 19:14:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Nov 2002 11:14:20 -0800 Subject: [Patches] [ python-Patches-631972 ] Platform test for jython Message-ID: Patches item #631972, was opened at 2002-11-01 10:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 Category: Tests Group: None Status: Closed Resolution: Fixed Priority: 5 Submitted By: Finn Bock (bckfnn) Assigned to: Finn Bock (bckfnn) Summary: Platform test for jython Initial Comment: Some tests (and more to come) contain a test for the java platform. I suggest adding an 'is_jython' flag to test_support to make it easier to read tests with jython specific logic. Is this change possible for inclusion in a 2.2.x release? Since jython is running against a 2.2 base, backporting fixes to the test suite would be very convenient for us. I can commit it myself if it is accepted. ---------------------------------------------------------------------- >Comment By: Finn Bock (bckfnn) Date: 2002-11-01 20:14 Message: Logged In: YES user_id=4201 And applied to test_support.py: 1.40.8.1 in release22-maint. 
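
For readers unfamiliar with the idea, a platform flag of the kind proposed here can be as small as the sketch below. The name is_jython comes from the patch summary; deriving it from sys.platform is an assumption about how such a flag would be computed (Jython reports a platform string beginning with "java"), and platform_label is a hypothetical helper, not part of test_support.

```python
import sys

# Assumption: Jython's sys.platform starts with "java" (e.g. "java1.4.0").
is_jython = sys.platform.startswith("java")


def platform_label():
    """Hypothetical helper showing how a test might branch on the flag."""
    return "jython" if is_jython else "other"
```

A test can then write "if is_jython:" instead of repeating the platform string comparison in every module.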
---------------------------------------------------------------------- Comment By: Finn Bock (bckfnn) Date: 2002-11-01 19:04 Message: Logged In: YES user_id=4201 Applied to test_support.py: 1.43; ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-01 18:04 Message: Logged In: YES user_id=31435 Accepted, and back to you. We don't backport "new features", but I've got no problem with backporting better tests -- to the contrary, I think it's a good idea too. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631972&group_id=5470 From noreply@sourceforge.net Sat Nov 2 16:29:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 08:29:39 -0800 Subject: [Patches] [ python-Patches-632624 ] fix test_resource failure on alpha/64bit Message-ID: Patches item #632624, was opened at 2002-11-02 11:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: fix test_resource failure on alpha/64bit Initial Comment: resource_getrlimit() in Modules/resource.c returns a tuple of 2 integers. On the alpha (and probably any 64 bit machine), the value is converted from a long to a PyInt. The value changed from 9223372036854775807 to -1. The fix is very simple and thus shown inline (Modules/resource.c:135): - return Py_BuildValue("ii", (long) rl.rlim_cur, (long) rl.rlim_max); + return Py_BuildValue("ll", (long) rl.rlim_cur, (long) rl.rlim_max); I'd just check this in, but I wanted to make sure there wouldn't be any problems changing the return value from PyInts to PyLongs. This seems ok to me given long integer unification. 
The test runs on both Linux/alpha and Linux/x86. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 From noreply@sourceforge.net Sat Nov 2 17:28:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 09:28:53 -0800 Subject: [Patches] [ python-Patches-550765 ] SocketServer behavior Message-ID: Patches item #550765, was opened at 2002-04-30 22:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550765&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Gilfix (mgilfix) >Assigned to: Martin v. Löwis (loewis) Summary: SocketServer behavior Initial Comment: A bug, or lack of behavior, in SocketServer.py was exposed while creating a unit test for Python 2.3. As of 2.3, signals between threads propagate differently. When cancelling a server that implemented threading with a keyboard interrupt, the server would shut down but not terminate (waiting on client threads). The fix for this was to make the client threads daemon threads with the setDaemon call. Because this was non-apparent, this patch adds a member variable which acts as a hook and makes it clear that clients need to either set or unset the variable when deriving the class to control this behavior. setDaemon is off by default, as this is thought to be the consensus behavior. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-04 10:31 Message: Logged In: YES user_id=21627 I recommend approving this patch. ---------------------------------------------------------------------- Comment By: Michael Gilfix (mgilfix) Date: 2002-06-03 18:45 Message: Logged In: YES user_id=116038 All addressed in the new patch. See SocketServer.py.patch.2.
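The hook described in the initial comment of this item can be sketched as follows. This is an illustration written against the present-day Python 3 module name socketserver, where the class attribute is called daemon_threads; this entry does not name the variable the patch introduced, and the EchoHandler and DaemonThreadedServer classes are hypothetical.

```python
import socketserver


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echo one line back to the client."""

    def handle(self):
        self.wfile.write(self.rfile.readline())


class DaemonThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # Opt in explicitly: handler threads become daemon threads, so they
    # do not keep the process alive once the main thread has exited.
    daemon_threads = True


# The mix-in leaves the behaviour off unless a subclass sets it.
print(socketserver.ThreadingMixIn.daemon_threads)
print(DaemonThreadedServer.daemon_threads)
```

Keeping the switch as a class attribute means a derived server states its choice in one visible place rather than calling setDaemon on each thread.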
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-02 19:44 Message: Logged In: YES user_id=21627 Could you please provide changes to the documentation as well (i.e. Doc/lib/libsocksvr.tex, and Misc/NEWS)? There is a typo: 'chanegs' -> 'changes' There is no need to attach the modified source to the patch; the context diff is enough. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550765&group_id=5470 From noreply@sourceforge.net Sat Nov 2 17:29:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 09:29:25 -0800 Subject: [Patches] [ python-Patches-569328 ] names in types module Message-ID: Patches item #569328, was opened at 2002-06-15 11:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569328&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: names in types module Initial Comment: Adds names to types module so types are accessible as 'type.spam' in addition to the existing longer version 'types.SpamType'. The short names match the type's __name__ attribute. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-13 14:48 Message: Logged In: YES user_id=21627 Oren, I sympathise with Fredrik's view. Unless there is somebody speaking in favour of these changes, I'll reject this patch by November 1. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 20:15 Message: Logged In: YES user_id=38376 "from * import types" is a rather common pydiom, and I'm pretty sure most people using that expects to get a bunch of [A-Z]\w+Type names, and nothing else. 
-0 from me. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-06-18 16:40 Message: Logged In: YES user_id=562624 Updated patch. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-06-15 12:58 Message: Logged In: YES user_id=562624 http://mail.python.org/pipermail/python-dev/2002-June/025410.html ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-15 12:05 Message: Logged In: YES user_id=21627 What is the purpose of this change? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569328&group_id=5470 From noreply@sourceforge.net Sat Nov 2 17:31:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 09:31:07 -0800 Subject: [Patches] [ python-Patches-632643 ] Punycode encoding Message-ID: Patches item #632643, was opened at 2002-11-02 18:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632643&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Punycode encoding Initial Comment: This patch implements Punycode, http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt This will be used by the internationalized domain names. 
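As a small illustration of what the encoding does, the sketch below uses the punycode codec that ships with present-day Python. Whether that codec matches this patch line for line is an assumption; the sample label is chosen only for the example.

```python
# "buecher" written with u-umlaut, a common example label
label = "b\u00fccher"

# Punycode moves the non-ASCII code points into an ASCII suffix.
encoded = label.encode("punycode")
decoded = encoded.decode("punycode")

print(encoded)           # b'bcher-kva'
print(decoded == label)  # True
```

Internationalized domain names add an "xn--" prefix to such an encoded label.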
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632643&group_id=5470 From noreply@sourceforge.net Sat Nov 2 17:50:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 09:50:41 -0800 Subject: [Patches] [ python-Patches-632624 ] fix test_resource failure on alpha/64bit Message-ID: Patches item #632624, was opened at 2002-11-02 11:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 Category: Core (C code) >Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neal Norwitz (nnorwitz) Summary: fix test_resource failure on alpha/64bit Initial Comment: resource_getrlimit() in Modules/resource.c returns a tuple of 2 integers. On the alpha (and probably any 64 bit machine), the value is converted from a long to a PyInt. The value changed from 9223372036854775807 to -1. The fix is very simple and thus shown inline (Modules/resource.c:135): - return Py_BuildValue("ii", (long) rl.rlim_cur, (long) rl.rlim_max); + return Py_BuildValue("ll", (long) rl.rlim_cur, (long) rl.rlim_max); I'd just check this in, but I wanted to make sure there wouldn't be any problems changing the return value from PyInts to PyLongs. This seems ok to me given long integer unification. The test runs on both Linux/alpha and Linux/x86. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-02 12:50 Message: Logged In: YES user_id=33168 Checked in as: * Misc/NEWS; 1.505 * Modules/resource.c; 2.28 Not backporting since this is a slight change. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-02 12:34 Message: Logged In: YES user_id=21627 I think this is ok, given that resource isn't that much used in the first place. Please make sure to add a comment in NEWS. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 From noreply@sourceforge.net Sat Nov 2 17:34:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Nov 2002 09:34:47 -0800 Subject: [Patches] [ python-Patches-632624 ] fix test_resource failure on alpha/64bit Message-ID: Patches item #632624, was opened at 2002-11-02 17:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: fix test_resource failure on alpha/64bit Initial Comment: resource_getrlimit() in Modules/resource.c returns a tuple of 2 integers. On the alpha (and probably any 64 bit machine), the value is converted from a long to a PyInt. The value changed from 9223372036854775807 to -1. The fix is very simple and thus shown inline (Modules/resource.c:135): - return Py_BuildValue("ii", (long) rl.rlim_cur, (long) rl.rlim_max); + return Py_BuildValue("ll", (long) rl.rlim_cur, (long) rl.rlim_max); I'd just check this in, but I wanted to make sure there wouldn't be any problems changing the return value from PyInts to PyLongs. This seems ok to me given long integer unification. The test runs on both Linux/alpha and Linux/x86. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-02 18:34 Message: Logged In: YES user_id=21627 I think this is ok, given that resource isn't that much used in the first place. Please make sure to add a comment in NEWS. 
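The reported change from 9223372036854775807 to -1 is what one would expect if only the low 32 bits of a 64-bit long are read back as a C int. The sketch below reproduces that arithmetic with the struct module; it assumes a two's-complement, little-endian layout and does not call Py_BuildValue itself. The names REPORTED_LIMIT and low_32_bits_as_signed_int are hypothetical.

```python
import struct

REPORTED_LIMIT = 9223372036854775807  # value quoted in the report (2**63 - 1)


def low_32_bits_as_signed_int(value):
    """Reinterpret the low 32 bits of a 64-bit value as a signed 32-bit int."""
    packed = struct.pack("<q", value)          # 8 bytes, little-endian, signed 64-bit
    return struct.unpack("<i", packed[:4])[0]  # first 4 bytes as signed 32-bit


print(low_32_bits_as_signed_int(REPORTED_LIMIT))  # -1
print(low_32_bits_as_signed_int(5))               # 5
```

Small values survive the narrowing unchanged, which would explain why the problem only shows up for very large limits on 64-bit machines.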
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632624&group_id=5470 From noreply@sourceforge.net Sun Nov 3 16:50:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 08:50:48 -0800 Subject: [Patches] [ python-Patches-632934 ] Problem at the end of misformed mailbox Message-ID: Patches item #632934, was opened at 2002-11-03 17:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632934&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Rémi Peyronnet (rpeyron) Assigned to: Nobody/Anonymous (nobody) Summary: Problem at the end of misformed mailbox Initial Comment: I had a problem with a not well formed mailbox (maybe ambiguous carriage return chars, due to both use under windows and linux) : the function mailbox.readlines (lib/mailbox.py:66) entered in an indefinite cycle. I found that the self.stop value was too big for the file, and that the index of self.pos could not go that far. The function readlines will call ever and ever readline, which will return always the same 1-length string. I solved this by comparing the fp.pos before and after the read operation. If it the same, we re probably at the end of the file, or there is a problem, and we should go out. 
As I do not know much about the Python internals, the following patch may not be good :

C:\Python222\Lib>diff "Copie de mailbox.py" mailbox.py
63c63,68
< self.pos = self.fp.tell()
---
> data = self.fp.readline(length)
> self_fp_tell = self.fp.tell()
> if self.pos == self_fp_tell:
>     return ''
> else:
>     self.pos = self_fp_tell

Regards

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632934&group_id=5470

From noreply@sourceforge.net Sun Nov 3 19:59:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 03 Nov 2002 11:59:52 -0800
Subject: [Patches] [ python-Patches-632973 ] _getdefaultlocale for OS X
Message-ID:

Patches item #632973, was opened at 2002-11-03 20:59
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632973&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Jack Jansen (jackjansen)
Summary: _getdefaultlocale for OS X

Initial Comment:
This patch implements getdefaultlocale using CFStringGetSystemEncoding. It consists of three parts:

- check for __APPLE__, not macintosh, in _localemodule.c, to support Darwin. I assume macintosh is defined by the non-Darwin builds; Darwin currently does not implement getdefaultlocale. I also assume that __APPLE__ is defined on all Mac builds.
- return string literals for a few well-known encodings (in particular the ones for which we have codecs).
- return the IANA charset name for all others. This will usually be some X- string, since Apple hasn't registered any of the charsets. If extension modules support those encodings, they need to support those names in the lookup functions as well.

The names returned by CFStringGetNameOfEncoding are useless as they contain spaces, and other punctuation.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632973&group_id=5470 From noreply@sourceforge.net Sun Nov 3 20:39:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 12:39:21 -0800 Subject: [Patches] [ python-Patches-619475 ] C3 MRO algorithm implementation Message-ID: Patches item #619475, was opened at 2002-10-07 05:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=619475&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Samuele Pedroni (pedronis) Assigned to: Guido van Rossum (gvanrossum) Summary: C3 MRO algorithm implementation Initial Comment: At least it is a beginning. On Linux all tests and the modified test_descr.py pass. A few cases in test_descr.py are commented out; maybe they should be adjusted or reconstructed. For order disagreement situations, backup logic that picks an element from the first non-empty list, removes it from all lists where it appears, and issues a warning instead of raising an exception could be put where I set the exception now. In the long run, though, people should learn to use consistent hierarchies anyway. PS: I was wondering how to get/reuse lst.remove(o) functionality from C, apart from going through PyObject_CallMethod... ---------------------------------------------------------------------- >Comment By: Samuele Pedroni (pedronis) Date: 2002-11-03 21:39 Message: Logged In: YES user_id=61408 FYI, I will be off-line from 4 to 16 Nov.
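For readers unfamiliar with the algorithm, the merge step and the "order disagreement" case mentioned in this item can be sketched in Python as below. This is an illustration of C3 linearization in general, not the C code of the patch; the function names c3_merge and c3_linearize are hypothetical.

```python
def c3_merge(sequences):
    """Merge linearizations by the C3 rule; raise TypeError on order disagreement."""
    sequences = [list(seq) for seq in sequences if seq]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it is in no sequence's tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError("order disagreement: no acceptable head")
        result.append(head)
        # Drop the chosen head wherever it leads a sequence, then drop empties.
        sequences = [s[1:] if s[0] == head else s for s in sequences]
        sequences = [s for s in sequences if s]
    return result


def c3_linearize(cls):
    """Linearize a class from its bases, recursively."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])


class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([k.__name__ for k in c3_linearize(D)])
```

On a consistent hierarchy such as the diamond above, the result agrees with the interpreter's own D.__mro__; two lists that order the same pair of elements differently leave no acceptable head, which is the situation where the patch sets an exception.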
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=619475&group_id=5470 From noreply@sourceforge.net Sun Nov 3 21:44:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 13:44:41 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Any one else know if this is correct? I'll apply to prevent the core dump, unless someone complains. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Sun Nov 3 22:32:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 14:32:50 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 22:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Any one else know if this is correct? I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 23:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. 
yp_all is invoked with the GIL released, yet the callback function calls the interpreter API. This invites disaster if other threads simultaneously access the interpreter. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Sun Nov 3 23:11:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 15:11:41 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 22:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Does anyone else know if this is correct? I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 00:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15.
There is also a GvR comment who says it doesn't work for NIS+. Unless a better strategy shows up, I suggest to skip entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 23:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls interpreter API. This asks for a desaster if other threads simultanously access the interpreter. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 00:54:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 16:54:44 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. 
The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Any one else know if this is correct? I'll apply to prevent the core dump, unless someone complains.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-11-03 19:54

Message:
Logged In: YES
user_id=33168

How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines?

The values are -1 coming in from yp_all as seen from the stack trace:

#1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85
#2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95
#3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1
#4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1
#5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1
#6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168

I don't see a specific problem from the man page. Here are some relevant sections:

int yp_all(
    char *indomain,
    char *inmap,
    struct ypall_callback *incallback
);

struct ypall_callback *incallback {
    int (*foreach)();
    char *data;
};

The function foreach() is called as follows:

foreach(
    int instatus;
    char *inkey;
    int inkeylen;
    char *inval;
    int invallen;
    char *indata;
);

instatus
    Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ).

inkey, inval
    The key and value parameters are somewhat different than defined in the SYNOPSIS section above.
First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null- terminated in the map, they will not be terminated with newline or null characters here, either. indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 18:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment who says it doesn't work for NIS+. Unless a better strategy shows up, I suggest to skip entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-03 17:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls interpreter API. This asks for a desaster if other threads simultanously access the interpreter. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 07:15:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Nov 2002 23:15:14 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 22:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Any one else know if this is correct? 
I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 08:15 Message: Logged In: YES user_id=21627 I now have an account on the testdrive machine (as you just referred to them). The -1 is not coming from yp_all; foreach is creating it itself, since fix is true. There appears to be something strange with mail.aliases: On some systems, keys and values have a null byte, on others, they don't. In JNDI, Sun could not solve this in any other way but defining a property com.sun.jndi.nis.mailaliases which indicates whether a null byte should be assumed to be there, see http://www-iiuf.unifr.ch/iiufdev/doc/public/jndi/providers/jndi-nis.html I would then suggest a different strategy: If fix is set, and both the key and the value have a terminating null included in their length, ignore that. Otherwise, copy all bytes into key and value. I'm still uncertain what the '':'' pair is supposed to indicate, but it appears that mail.aliases deliberately includes "invalid" entries. For example, sendmail generates a "@":"@" entry into the aliases file; if this entry is absent, sendmail assumes that the file is truncated. Perhaps the convention on HP-UX is that the invalid entry consists of an empty string pair. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 01:54 Message: Logged In: YES user_id=33168 How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines? 
The values are -1 coming in from yp_all as seen from the stack trace:

#1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85
#2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95
#3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1
#4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1
#5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1
#6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168

I don't see a specific problem from the man page. Here are some relevant sections:

int yp_all(
    char *indomain,
    char *inmap,
    struct ypall_callback *incallback
);

struct ypall_callback *incallback {
    int (*foreach)();
    char *data;
};

The function foreach() is called as follows:

foreach(
    int instatus;
    char *inkey;
    int inkeylen;
    char *inval;
    int invallen;
    char *indata;
);

instatus
    Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ).

inkey, inval
    The key and value parameters are somewhat different than defined in the SYNOPSIS section above. First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null-terminated in the map, they will not be terminated with newline or null characters here, either.
indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 00:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment who says it doesn't work for NIS+. Unless a better strategy shows up, I suggest to skip entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 23:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? 
The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls the interpreter API. This asks for a disaster if other threads simultaneously access the interpreter. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 09:56:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 01:56:54 -0800 Subject: [Patches] [ python-Patches-630829 ] telnetlib.py: don't block on IAC and enhancement Message-ID: Patches item #630829, was opened at 2002-10-30 02:35 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=630829&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 7 Submitted By: Ha Shao (hashao) Assigned to: Nobody/Anonymous (nobody) Summary: telnetlib.py: don't block on IAC and enhancement Initial Comment: Use an IAC buffer to make IAC commands not block. Also call the callback on commands other than WILL/WONT/DO/DONT. People still want to handle other commands. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 10:56 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as libtelnetlib.tex 1.10; telnetlib.py 1.20; ACKS 1.214; NEWS 1.507; ---------------------------------------------------------------------- Comment By: Ha Shao (hashao) Date: 2002-10-31 05:17 Message: Logged In: YES user_id=8717 The new patch also handles fetching IAC SB ... IAC SE data, for full handling of the telnet protocol. "No option" is now represented by chr(0) instead of 256. The protocol handler should know whether a command has an option or not, anyway. This supersedes the last patch. Please commit.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=630829&group_id=5470 From noreply@sourceforge.net Mon Nov 4 13:10:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 05:10:34 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Any one else know if this is correct? I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 08:10 Message: Logged In: YES user_id=33168 One thing to note: I believe I originally found this problem on the Alphas (Tru64) even before the merger with HP. So this problem could deal more with the NIS configuration. I'm not sure I understand, do you want something like this: if (indata->fix) { if (data[datalen] != '\0') datalen--; } Where data/datalen would be done for both inkey and inval? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 02:15 Message: Logged In: YES user_id=21627 I now have an account on the testdrive machine (as you just referred to them). 
The -1 is not coming from yp_all; foreach is creating it itself, since fix is true. There appears to be something strange with mail.aliases: On some systems, keys and values have a null byte, on others, they don't. In JNDI, Sun could not solve this in any other way but defining a property com.sun.jndi.nis.mailaliases which indicates whether a null byte should be assumed to be there, see http://www-iiuf.unifr.ch/iiufdev/doc/public/jndi/providers/jndi-nis.html I would then suggest a different strategy: If fix is set, and both the key and the value have a terminating null included in their length, ignore that. Otherwise, copy all bytes into key and value. I'm still uncertain what the '':'' pair is supposed to indicate, but it appears that mail.aliases deliberately includes "invalid" entries. For example, sendmail generates a "@":"@" entry into the aliases file; if this entry is absent, sendmail assumes that the file is truncated. Perhaps the convention on HP-UX is that the invalid entry consists of an empty string pair. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-03 19:54 Message: Logged In: YES user_id=33168 How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines? The values are -1 coming in from yp_all as seen from the stack trace: #1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85 #2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95 #3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1 #4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1 #5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1 #6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168 I don't see a specific problem from the man page. 
Here are some relevant sections: int yp_all( char *indomain, char *inmap, struct ypall_callback *incallback ); struct ypall_callback *incallback { int (*foreach)(); char *data; }; The function foreach() is called as follows: foreach( int instatus; char *inkey; int inkeylen; char *inval; int invallen; char *indata; ); instatus Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ). inkey The key and value parameters are inval somewhat different than defined in the SYNOPSIS section above. First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null- terminated in the map, they will not be terminated with newline or null characters here, either. indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-03 18:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment that says it doesn't work for NIS+. Unless a better strategy shows up, I suggest skipping entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 17:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What are the values of inkeylen and invallen at the point of the crash? Might they be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls the interpreter API. This asks for a disaster if other threads simultaneously access the interpreter.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 13:19:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 05:19:11 -0800 Subject: [Patches] [ python-Patches-473586 ] SimpleXMLRPCServer - fixes and CGI Message-ID: Patches item #473586, was opened at 2001-10-22 07:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Fredrik Lundh (effbot) Summary: SimpleXMLRPCServer - fixes and CGI Initial Comment: Changes: o treats xmlrpclib.Fault's correctly (no longer absorbes them as generic exceptions) o changed failed marshal to generate a useful Fault instead of an internal server error o adds a new class to make writing XML-RPC functions embedded in other servers, using CGI, easier (tested with APACHE) o to support the above, added a new dispatch helper class SimpleXMLRPCDispatcher ---------------------------------------------------------------------- Comment By: Robin Becker (rgbecker) Date: 2002-11-04 13:19 Message: Logged In: YES user_id=6946 Thanks I have applied the v5 patch and it seems fine, I suppose it is probably better to use the patch rather than stick with Brian's old code as I guess it will gradually get more and more out of date. Perhaps all the old introspection stuff belongs in a cookbook entry? ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-10-31 23:46 Message: Logged In: YES user_id=108973 Martin, I don't have a lot of bandwidth right now but I'll try to do that soon. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-10-26 15:23 Message: Logged In: YES user_id=21627 Brian, the patch looks good to me. However, can you please also supply patches to Doc/lib/libsimplexmlrpc? ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-18 19:41 Message: Logged In: YES user_id=108973 OK, I fixed the backwards compatibility problem. Also added: o support for the XML-RPC introspection methods system.listMethods and system.methodHelp o support for the XML-RPC boxcaring method system.multicall ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-12-04 19:51 Message: Logged In: YES user_id=108973 Please do not accept this patch past 2.2 release; there are some non-backwards-compatible changes that need to be thought through. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-23 18:02 Message: Logged In: YES user_id=108973 - a few extra comments - moved an xmlrpclib.loads() inside an exception handler so an XML-RPC fault is generated for malformed requests ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 18:59 Message: Logged In: YES user_id=108973 The advantage of the entire patch being accepted before 2.2 is that there is an API change and, once 2.2 is released, we will probably have to make a bit of an attempt to maintain backwards compatibility. If this patch is too high-risk for 2.2 then I can certainly design a bug-fix patch for 2.2 and submit a new patch for 2.3 (that is API compatible with 2.2). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-22 18:43 Message: Logged In: YES user_id=21627 Brian, please note that Python 2.2b1 has been released, so no new features are acceptable until 2.2.
So unless Fredrik Lundh wants to accept your entire patch, I think it has little chance to get integrated for the next few months. If you want pieces of it accepted, I'd recommend to split it into bug fixes and new features; bug fixes are still acceptable. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 18:27 Message: Logged In: YES user_id=108973 I just can't stop mucking with it. This time there are only documentation changes. I should also have pointed out that this patch changes the mechanism for overriding the dispatch mechanism: you used to subclass the request handler, now you subclass the server. I believe that this change is correct because the server actually has the required state information to do the dispatching. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 07:35 Message: Logged In: YES user_id=108973 Changed a name to fit other naming conventions ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470 From noreply@sourceforge.net Mon Nov 4 13:41:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 05:41:27 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 22:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. 
The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Does anyone else know if this is correct? I'll apply it to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 14:41 Message: Logged In: YES user_id=21627 I was thinking of: if (indata->fix && inkeylen > 0 && invaluelen > 0 && data[inkeylen-1] == '\0' && value[invaluelen-1] == '\0') { inkeylen--; invaluelen--; } That there is a '':'' entry in mail.aliases might be a bug in the NIS configuration of testdrive. However, the problem with "sometimes the length includes the null, sometimes not" is probably independent from this specific installation. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 14:10 Message: Logged In: YES user_id=33168 One thing to note: I believe I originally found this problem on the Alphas (Tru64) even before the merger with HP. So this problem could have more to do with the NIS configuration. I'm not sure I understand. Do you want something like this: if (indata->fix) { if (data[datalen] != '\0') datalen--; } Where data/datalen would be done for both inkey and inval? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 08:15 Message: Logged In: YES user_id=21627 I now have an account on the testdrive machine (as you just referred to them). The -1 is not coming from yp_all; foreach is creating it itself, since fix is true. There appears to be something strange with mail.aliases: On some systems, keys and values have a null byte, on others, they don't.
In JNDI, Sun could not solve this in any other way but defining a property com.sun.jndi.nis.mailaliases which indicates whether a null byte should be assumed to be there, see http://www-iiuf.unifr.ch/iiufdev/doc/public/jndi/providers/jndi-nis.html I would then suggest a different strategy: If fix is set, and both the key and the value have a terminating null included in their length, ignore that. Otherwise, copy all bytes into key and value. I'm still uncertain what the '':'' pair is supposed to indicate, but it appears that mail.aliases deliberately includes "invalid" entries. For example, sendmail generates a "@":"@" entry into the aliases file; if this entry is absent, sendmail assumes that the file is truncated. Perhaps the convention on HP-UX is that the invalid entry consists of an empty string pair. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 01:54 Message: Logged In: YES user_id=33168 How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines? The values are -1 coming in from yp_all as seen from the stack trace: #1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85 #2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95 #3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1 #4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1 #5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1 #6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168 I don't see a specific problem from the man page. 
Here are some relevant sections: int yp_all( char *indomain, char *inmap, struct ypall_callback *incallback ); struct ypall_callback *incallback { int (*foreach)(); char *data; }; The function foreach() is called as follows: foreach( int instatus; char *inkey; int inkeylen; char *inval; int invallen; char *indata; ); instatus Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ). inkey The key and value parameters are inval somewhat different than defined in the SYNOPSIS section above. First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null- terminated in the map, they will not be terminated with newline or null characters here, either. indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-04 00:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment that says it doesn't work for NIS+. Unless a better strategy shows up, I suggest skipping entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 23:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What are the values of inkeylen and invallen at the point of the crash? Might they be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls the interpreter API. This asks for a disaster if other threads simultaneously access the interpreter.
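The strategy suggested in this thread (drop one trailing null only when fix is set and both the key and the value are non-empty and end in a null) can be modelled roughly as below. This is only a sketch in Python, not the C code of nismodule.c and not the committed fix; the function name strip_trailing_nul and its bytes-based interface are assumptions made for illustration.

```python
def strip_trailing_nul(key, value, fix):
    """Model of the length adjustment discussed for nis_foreach().

    key, value: bytes as handed to the callback (hypothetical interface).
    fix: True when the map may carry a terminating null in its lengths.
    """
    # Only trim when BOTH strings are non-empty and BOTH end in a null;
    # an empty '':'' pair is passed through, so no length ever goes negative.
    if (fix and len(key) > 0 and len(value) > 0
            and key[-1:] == b"\0" and value[-1:] == b"\0"):
        return key[:-1], value[:-1]
    return key, value


if __name__ == "__main__":
    print(strip_trailing_nul(b"root\0", b"admin\0", True))
    print(strip_trailing_nul(b"", b"", True))
```

The point of the combined condition is that the empty pair seen in mail.aliases is copied unchanged instead of producing a length of -1.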
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 14:14:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 06:14:14 -0800 Subject: [Patches] [ python-Patches-618791 ] [mingw patches] alloca and posixmodule Message-ID: Patches item #618791, was opened at 2002-10-05 01:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 2 Submitted By: Gerhard Häring (ghaering) Assigned to: Guido van Rossum (gvanrossum) Summary: [mingw patches] alloca and posixmodule Initial Comment: This is the first patch in a series of patches for porting Python to native win32, while still using the autoconf-based build process. The compiler used is mingw, the build environment used is msys, a stripped down Cygwin from the mingw project. This patch does several things: * Change _alloca to alloca for both mingw and Visual C++, to avoid unnecessary #ifdef-ing. * Change the makesetup shell script to work for win32, where for some weird reason we have a module 'nt' built from a posixmodule.c file. * Change one occurrence of #ifdef MS_WINDOWS in posixmodule.c where it should really have been #ifdef Py_WIN_WIDE_FILENAMES * Change the #ifdefs in posixmodule.c so that it can be built with both MSVC and mingw The result of this patch is that we can build a statically built python.exe with a simple "./configure" followed by "make" under mingw/msys. There is, however, still additional work to do until we can build a native win32 Python with the autoconf-based build process. Please apply this ASAP, as I want to avoid having a diverging Python tree on my harddisk (this makes patch creation a lot more difficult).
---------------------------------------------------------------------- >Comment By: Gerhard Häring (ghaering) Date: 2002-11-04 15:14 Message: Logged In: YES user_id=163326 Guido, if you think that there should either be one big patch that enables Python to be built with mingw or nothing at all, then please close this as 'rejected' or whatever. There are good reasons for doing so, just as there are arguments for incremental patches, like I described above. I won't feel offended, especially as I know how annoying it is for myself to have a SF entry page full of this kind of patches/bugs :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 From noreply@sourceforge.net Mon Nov 4 15:16:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 07:16:14 -0800 Subject: [Patches] [ python-Patches-618135 ] gzip.py and files > 2G Message-ID: Patches item #618135, was opened at 2002-10-03 12:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: A.M. Kuchling (akuchling) Summary: gzip.py and files > 2G Initial Comment: Problem: Currently, the gzip module is not able to work with files > 2G uncompressed. The source of the problem is that at the end of a .gz file, there is a trailer containing a 32 bit length field. This field is of course unable to represent a file length > 4G. Because of mixed type arithmetic in gzip.py, this limit is lowered to 2G. Testcase: python gzip.py # must be > 2G python gzip.py -d # error Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses. 
Two other remarks: I don't understand lines 22-23 of gzip.py: why is the test: "if value < 0" necessary when writing an unsigned int? The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc is from zlib.crc which always returns a regular int. Regards, Geert Jansen ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-10-04 03:36 Message: Logged In: YES user_id=537938 Sorry -- it seems the file upload went wrong! Second try. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 From noreply@sourceforge.net Mon Nov 4 15:18:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 07:18:19 -0800 Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py Message-ID: Patches item #588809, was opened at 2002-07-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Robert Weber (chipsforbrains) >Assigned to: A.M. Kuchling (akuchling) Summary: LDFLAGS support for build_ext.py Initial Comment: a hack at best ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-07 04:27 Message: Logged In: YES user_id=21627 The patch looks fine to me, but I'd like to hear the opinion of a distutils guru. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-08-06 15:35 Message: Logged In: YES user_id=245624 > As a hack, I think it is unacceptable for Python. > >I'd encourage you to integrate this (and CFLAGS) into >sysconfig.customize_compiler. 
> >It would be ok if only the Unix compiler honors those >settings for now. > Martin v. Löwis (loewis) I have written a better patch to sysconfig.py that does CFLAGS and all others so that everything works like autoconf. I will post the patch in a sec. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 05:05 Message: Logged In: YES user_id=21627 As a hack, I think it is unacceptable for Python. I'd encourage you to integrate this (and CFLAGS) into sysconfig.customize_compiler. It would be ok if only the Unix compiler honors those settings for now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 From noreply@sourceforge.net Mon Nov 4 16:48:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 08:48:10 -0800 Subject: [Patches] [ python-Patches-633359 ] Patch for sre bug 610299 Message-ID: Patches item #633359, was opened at 2002-11-04 07:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Patch for sre bug 610299 Initial Comment: Bug report 610299 points out this discrepancy: >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXXX' >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXX\xe9' The problem is in sre_compile.py: the call to _compile_charset near the end of _compile_info forgets to pass in the flags, so that the info charset is not compiled with re.U. (The info charset is used when searching to find the first character at which a match could start; it is not generated for patterns beginning with a repeat like '\w{1}'.)
The attached patch changes this call to pass in the flags; it is against the 2.2.2 version of sre_compile.py. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 From noreply@sourceforge.net Mon Nov 4 17:15:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 09:15:48 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 11:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Williams (johnw42) Assigned to: Nobody/Anonymous (nobody) Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the built-in dict class and a "pick" method to the BaseSet class. These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation.
This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Mon Nov 4 17:08:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 09:08:36 -0800 Subject: [Patches] [ python-Patches-618135 ] gzip.py and files > 2G Message-ID: Patches item #618135, was opened at 2002-10-03 12:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: Tim Peters (tim_one) Summary: gzip.py and files > 2G Initial Comment: Problem: Currently, the gzip module is not able to work with files > 2G uncompressed. The source of the problem is that at the end of a .gz file, there is a trailer containing a 32 bit length field. This field is of course unable to represent a file length > 4G. Because of mixed type arithmetic in gzip.py, this limit is lowered to 2G. Testcase: python gzip.py # must be > 2G python gzip.py -d # error Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses. Two other remarks: I don't understand lines 22-23 of gzip.py: why is the test: "if value < 0" necessary when writing an unsigned int? The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc is from zlib.crc which always returns a regular int. Regards, Geert Jansen ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-04 12:08 Message: Logged In: YES user_id=31435 Assigned to me. I think your suggested fix makes good sense. 
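The proposed fix for item #618135, comparing the uncompressed size modulo 4G against the 32-bit length field in the trailer, might look roughly like this. It is a sketch and not the attached patch; the helper name isize_matches is hypothetical, and it assumes only the trailer layout of the gzip format (a 32-bit CRC followed by a 32-bit length, both little-endian).

```python
import struct


def isize_matches(trailer, uncompressed_size):
    """Check the length field of a gzip trailer against the real size.

    trailer: the last 8 bytes of a .gz member (CRC32, then length).
    uncompressed_size: number of bytes actually produced by decompression.
    """
    crc32, isize = struct.unpack("<II", trailer)
    # The field stores only the low 32 bits, so compare modulo 2**32.
    return isize == (uncompressed_size & 0xFFFFFFFF)


if __name__ == "__main__":
    size = 5 * 2**30  # 5G, larger than the field can represent directly
    trailer = struct.pack("<II", 0, size % 2**32)
    print(isize_matches(trailer, size))
```

Unpacking both fields as unsigned values keeps the comparison free of the signed/unsigned mixing that lowered the limit to 2G.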
---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-10-04 03:36 Message: Logged In: YES user_id=537938 Sorry -- it seems the file upload went wrong! Second try. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 From noreply@sourceforge.net Mon Nov 4 17:43:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 09:43:10 -0800 Subject: [Patches] [ python-Patches-633359 ] Patch for sre bug 610299 Message-ID: Patches item #633359, was opened at 2002-11-04 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Patch for sre bug 610299 Initial Comment: Bug report 610299 points out this discrepancy: >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXXX' >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXX\xe9' The problem is in sre_compile.py: the call to _compile_charset near the end of _compile_info forgets to pass in the flags, so that the info charset is not compiled with re.U. (The info charset is used when searching to find the first character at which a match could start; it is not generated for patterns beginning with a repeat like '\w{1}'.) The attached patch changes this call to pass in the flags; it is against the 2.2.2 version of sre_compile.py. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 18:43 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. 
(This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 From noreply@sourceforge.net Mon Nov 4 18:15:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 10:15:19 -0800 Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py Message-ID: Patches item #588809, was opened at 2002-07-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Robert Weber (chipsforbrains) Assigned to: A.M. Kuchling (akuchling) Summary: LDFLAGS support for build_ext.py Initial Comment: a hack at best ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-04 13:15 Message: Logged In: YES user_id=11375 It mostly looks fine to me, too. One question: the branch for CFLAGS adds the value of CFLAGS to the shared linker invocation, which seems incorrect. Why? (And does autoconf also do this? If autoconf does this, it's probably for some reason and we should therefore also do it.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-07 04:27 Message: Logged In: YES user_id=21627 The patch looks fine to me, but I'd like to hear the opinion of a distutils guru. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-08-06 15:35 Message: Logged In: YES user_id=245624 > As a hack, I think it is unacceptable for Python. > >I'd encourage you to integrate this (and CFLAGS) into >sysconfig.customize_compiler. > >It would be ok if only the Unix compiler honors those >settings for now. 
> Martin v. Löwis (loewis) I have written a better patch to sysconfig.py that doe all others so that everything works like autoconf. I will post the patch in a sec.s CFLAGS and ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 05:05 Message: Logged In: YES user_id=21627 As a hack, I think it is unacceptable for Python. I'd encourage you to integrate this (and CFLAGS) into sysconfig.customize_compiler. It would be ok if only the Unix compiler honors those settings for now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 From noreply@sourceforge.net Mon Nov 4 18:28:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 10:28:24 -0800 Subject: [Patches] [ python-Patches-633359 ] Patch for sre bug 610299 Message-ID: Patches item #633359, was opened at 2002-11-04 07:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Patch for sre bug 610299 Initial Comment: Bug report 610299 points out this discrepancy: >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXXX' >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXX\xe9' The problem is in sre_compile.py: the call to _compile_charset near the end of _compile_info forgets to pass in the flags, so that the info charset is not compiled with re.U. (The info charset is used when searching to find the first character at which a match could start; it is not generated for patterns beginning with a repeat like '\w{1}'.) The attached patch changes this call to pass in the flags; it is against the 2.2.2 version of sre_compile.py. 
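The discrepancy reported above lends itself to a small regression check. The sketch below is written for Python 3, where str is always Unicode, so it only shows the behaviour expected once the flags are passed through; it does not reproduce the 2.2.2 bug. sub_word_chars is a hypothetical helper name.

```python
import re


def sub_word_chars(pattern, text):
    """Replace every match of pattern with 'X', compiling with re.UNICODE."""
    return re.compile(pattern, re.UNICODE).sub("X", text)


text = "hello caf\xe9"

# Both spellings of "one word character" should treat \xe9 as a word character.
with_repeat = sub_word_chars(r"\w{1}", text)
without_repeat = sub_word_chars(r"\w", text)
```

Comparing the two results against each other, rather than only against a fixed string, targets the specific symptom in the report: the same pattern giving different answers depending on whether it starts with a repeat.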
---------------------------------------------------------------------- >Comment By: Greg Chapman (glchapman) Date: 2002-11-04 09:28 Message: Logged In: YES user_id=86307 Sorry, I thought I marked the checkbox (I know I went through the browse button to find the file). Anyway, here's the file. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 08:43 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 From noreply@sourceforge.net Mon Nov 4 18:33:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 10:33:50 -0800 Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py Message-ID: Patches item #588809, was opened at 2002-07-30 21:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Robert Weber (chipsforbrains) Assigned to: A.M. Kuchling (akuchling) Summary: LDFLAGS support for build_ext.py Initial Comment: a hack at best ---------------------------------------------------------------------- >Comment By: Robert Weber (chipsforbrains) Date: 2002-11-04 18:33 Message: Logged In: YES user_id=245624 I followed autoconf, where the linker includes CFLAGS, CPPFLAGS, and LDFLAGS. I assume they had a good reason to do this. ---------------------------------------------------------------------- Comment By: A.M.
Kuchling (akuchling) Date: 2002-11-04 18:15 Message: Logged In: YES user_id=11375 It mostly looks fine to me, too. One question: the branch for CFLAGS adds the value of CFLAGS to the shared linker invocation, which seems incorrect. Why? (And does autoconf also do this? If autoconf does this, it's probably for some reason and we should therefore also do it.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-07 08:27 Message: Logged In: YES user_id=21627 The patch looks fine to me, but I'd like to hear the opinion of a distutils guru. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-08-06 19:35 Message: Logged In: YES user_id=245624 > As a hack, I think it is unacceptable for Python. > >I'd encourage you to integrate this (and CFLAGS) into >sysconfig.customize_compiler. > >It would be ok if only the Unix compiler honors those >settings for now. > Martin v. Löwis (loewis) I have written a better patch to sysconfig.py that doe all others so that everything works like autoconf. I will post the patch in a sec.s CFLAGS and ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 09:05 Message: Logged In: YES user_id=21627 As a hack, I think it is unacceptable for Python. I'd encourage you to integrate this (and CFLAGS) into sysconfig.customize_compiler. It would be ok if only the Unix compiler honors those settings for now. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 From noreply@sourceforge.net Mon Nov 4 19:07:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 11:07:26 -0800 Subject: [Patches] [ python-Patches-633425 ] bz2 compression module Message-ID: Patches item #633425, was opened at 2002-11-04 19:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 3 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: bz2 compression module Initial Comment: As discussed in python-dev, here is the patch implementing the bz2 module, including comprehensive documentation and tests. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 From noreply@sourceforge.net Mon Nov 4 19:43:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 11:43:03 -0800 Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py Message-ID: Patches item #588809, was opened at 2002-07-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open >Resolution: Accepted Priority: 5 Submitted By: Robert Weber (chipsforbrains) Assigned to: A.M. Kuchling (akuchling) Summary: LDFLAGS support for build_ext.py Initial Comment: a hack at best ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-04 14:43 Message: Logged In: YES user_id=11375 Really?
I suppose there might be platforms where this matters, like SGI with its -n32/-o32 switches for different binary formats. So, I have no objections to the patch; I'll check it in. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-11-04 13:33 Message: Logged In: YES user_id=245624 I followed autoconf, where the linker includes CFLAGS, CPPFLAGS, and LDFLAGS. I assume they had a good reason to do this. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-11-04 13:15 Message: Logged In: YES user_id=11375 It mostly looks fine to me, too. One question: the branch for CFLAGS adds the value of CFLAGS to the shared linker invocation, which seems incorrect. Why? (And does autoconf also do this? If autoconf does this, it's probably for some reason and we should therefore also do it.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-07 04:27 Message: Logged In: YES user_id=21627 The patch looks fine to me, but I'd like to hear the opinion of a distutils guru. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-08-06 15:35 Message: Logged In: YES user_id=245624 > As a hack, I think it is unacceptable for Python. > >I'd encourage you to integrate this (and CFLAGS) into >sysconfig.customize_compiler. > >It would be ok if only the Unix compiler honors those >settings for now. > Martin v. Löwis (loewis) I have written a better patch to sysconfig.py that doe all others so that everything works like autoconf. I will post the patch in a sec.s CFLAGS and ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 05:05 Message: Logged In: YES user_id=21627 As a hack, I think it is unacceptable for Python. 
I'd encourage you to integrate this (and CFLAGS) into sysconfig.customize_compiler. It would be ok if only the Unix compiler honors those settings for now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 From noreply@sourceforge.net Mon Nov 4 19:51:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 11:51:11 -0800 Subject: [Patches] [ python-Patches-618135 ] gzip.py and files > 2G Message-ID: Patches item #618135, was opened at 2002-10-03 12:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Tim Peters (tim_one) Summary: gzip.py and files > 2G Initial Comment: Problem: Currently, the gzip module is not able to work with files > 2G uncompressed. The source of the problem is that at the end of a .gz file, there is a trailer containing a 32 bit length field. This field is of course unable to represent a file length > 4G. Because of mixed type arithmetic in gzip.py, this limit is lowered to 2G. Testcase: python gzip.py # must be > 2G python gzip.py -d # error Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses. Two other remarks: I don't understand lines 22-23 of gzip.py: why is the test: "if value < 0" necessary when writing an unsigned int? The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc is from zlib.crc which always returns a regular int. 
Regards, Geert Jansen ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-04 14:51 Message: Logged In: YES user_id=31435 Fixed, by related changes in Lib/gzip.py; new revision: 1.36 Misc/NEWS; new revision: 1.508 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-04 12:08 Message: Logged In: YES user_id=31435 Assigned to me. I think your suggested fix makes good sense. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-10-04 03:36 Message: Logged In: YES user_id=537938 Sorry -- it seems the file upload went wrong! Second try. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 From noreply@sourceforge.net Mon Nov 4 19:54:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 11:54:05 -0800 Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py Message-ID: Patches item #588809, was opened at 2002-07-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Robert Weber (chipsforbrains) Assigned to: A.M. Kuchling (akuchling) Summary: LDFLAGS support for build_ext.py Initial Comment: a hack at best ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-04 14:53 Message: Logged In: YES user_id=11375 Checked in as revision 1.87 of build_ext.py and revision 1.51 of sysconfig.py. Thanks! ---------------------------------------------------------------------- Comment By: A.M. 
Kuchling (akuchling) Date: 2002-11-04 14:43 Message: Logged In: YES user_id=11375 Really? I suppose there might be platforms where this matters, like SGI with its -n32/-o32 switches for different binary formats. So, I have no objections to the patch; I'll check it in. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-11-04 13:33 Message: Logged In: YES user_id=245624 I followed autoconf, where the linker includes CFLAGS, CPPFLAGS, and LDFLAGS. I assume they had a good reason to do this. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-11-04 13:15 Message: Logged In: YES user_id=11375 It mostly looks fine to me, too. One question: the branch for CFLAGS adds the value of CFLAGS to the shared linker invocation, which seems incorrect. Why? (And does autoconf also do this? If autoconf does this, it's probably for some reason and we should therefore also do it.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-07 04:27 Message: Logged In: YES user_id=21627 The patch looks fine to me, but I'd like to hear the opinion of a distutils guru. ---------------------------------------------------------------------- Comment By: Robert Weber (chipsforbrains) Date: 2002-08-06 15:35 Message: Logged In: YES user_id=245624 > As a hack, I think it is unacceptable for Python. > >I'd encourage you to integrate this (and CFLAGS) into >sysconfig.customize_compiler. > >It would be ok if only the Unix compiler honors those >settings for now. > Martin v. Löwis (loewis) I have written a better patch to sysconfig.py that doe all others so that everything works like autoconf. I will post the patch in a sec.s CFLAGS and ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-08-04 05:05 Message: Logged In: YES user_id=21627 As a hack, I think it is unacceptable for Python. I'd encourage you to integrate this (and CFLAGS) into sysconfig.customize_compiler. It would be ok if only the Unix compiler honors those settings for now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470 From noreply@sourceforge.net Mon Nov 4 20:04:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 12:04:01 -0800 Subject: [Patches] [ python-Patches-618791 ] [mingw patches] alloca and posixmodule Message-ID: Patches item #618791, was opened at 2002-10-04 19:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 2 Submitted By: Gerhard Häring (ghaering) Assigned to: Guido van Rossum (gvanrossum) Summary: [mingw patches] alloca and posixmodule Initial Comment: This is the first patch in a series of patching of porting Python to native win32, while still using the autoconf-based build process. The compiler used is mingw, the build environment used is msys, a stripped down Cygwin from the mingw project. This patch does several things: * change _alloca to alloca for both mingw and Visual C++, to avoid unnecessary #ifdef-ing. * Change the makesetup shell script to work for win32, where for some weird reason we have a module 'nt' built from a posixmodule.c file. * Change on occurence of #ifdef MS_WINDOWS in posixmodule.c where it should really have been #ifdef Py_WIN_WIDE_FILENAMES * Change the #ifdefs in posixmodule.c so that it can be built with both MSVC and mingw The result of this patch is that we can build a statically built python.exe with a simple ./configure make under mingw/msys. 
There is, however, still additional work to do until we can build a native win32 Python with the autoconf-based build process. Please apply this ASAP, as I want to avoid having a diverging Python tree on my hard disk (this makes patch creation a lot more difficult). ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-04 15:04 Message: Logged In: YES user_id=6380 What I'd like to see most is for somebody with CVS commit permission for Python *and* an understanding of mingw to start making the changes in Python's CVS. I'd be willing to give you CVS permission for this, if you're willing to work with python-dev regarding the acceptability of the various changes you're proposing. I presume you'll quickly get a sense for what kind of changes are non-controversial and can be checked in without asking. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-11-04 09:14 Message: Logged In: YES user_id=163326 Guido, if you think that there should either be one big patch that enables Python to be built with mingw or nothing at all, then please close this as 'rejected' or whatever. There are good reasons for doing so, just as there are arguments for incremental patches, as I described above.
I won't feel offended, especially as I know how annoying it is for myself to have a SF entry page full of this kind of patches/bugs :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 From noreply@sourceforge.net Mon Nov 4 23:02:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:02:01 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Anyone else know if this is correct? I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:02 Message: Logged In: YES user_id=33168 Are key & value independent? This code works for me: if (indata->fix) { if (inkeylen > 0 && inkey[inkeylen-1] == '\0') inkeylen--; if (invallen > 0 && inval[invallen-1] == '\0') invallen--; } Does this work for you or would you rather see the checks together? The reason why I did it this way was in case key or value was a problem, but not both. I don't know if this is a valid concern or not. Let me know how you want me to proceed.
If you want to take this over, that's fine too, since the only NIS machine I can test on is the test drive machines. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 08:41 Message: Logged In: YES user_id=21627 I was thinking of if (indata->fix && inkeylen >0 && invaluelen>0 && data[inkeylen-1] == '\0' && value[invaluelen-1] == '\0){ inkeylen--; invaluelen--; } That there is a '':'' entry in mail.aliases might be a bug in the NIS configuration of testdrive. However, the problem with "sometimes it the length includes the null, sometimes not" is probably independent from this specific installation. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 08:10 Message: Logged In: YES user_id=33168 One thing to note: I believe I originally found this problem on the Alphas (Tru64) even before the merger with HP. So this problem could deal more with the NIS configuration. I'm not sure I understand, do you want something like this: if (indata->fix) { if (data[datalen] != '\0') datalen--; } Where data/datalen would be done for both inkey and inval? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 02:15 Message: Logged In: YES user_id=21627 I now have an account on the testdrive machine (as you just referred to them). The -1 is not coming from yp_all; foreach is creating it itself, since fix is true. There appears to be something strange with mail.aliases: On some systems, keys and values have a null byte, on others, they don't. 
In JNDI, Sun could not solve this in any other way but defining a property com.sun.jndi.nis.mailaliases which indicates whether a null byte should be assumed to be there, see http://www-iiuf.unifr.ch/iiufdev/doc/public/jndi/providers/jndi-nis.html I would then suggest a different strategy: If fix is set, and both the key and the value have a terminating null included in their length, ignore that. Otherwise, copy all bytes into key and value. I'm still uncertain what the '':'' pair is supposed to indicate, but it appears that mail.aliases deliberately includes "invalid" entries. For example, sendmail generates a "@":"@" entry into the aliases file; if this entry is absent, sendmail assumes that the file is truncated. Perhaps the convention on HP-UX is that the invalid entry consists of an empty string pair. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-03 19:54 Message: Logged In: YES user_id=33168 How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines? The values are -1 coming in from yp_all as seen from the stack trace: #1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85 #2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95 #3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1 #4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1 #5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1 #6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168 I don't see a specific problem from the man page. 
Here are some relevant sections: int yp_all( char *indomain, char *inmap, struct ypall_callback *incallback ); struct ypall_callback *incallback { int (*foreach)(); char *data; }; The function foreach() is called as follows: foreach( int instatus; char *inkey; int inkeylen; char *inval; int invallen; char *indata; ); instatus Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ). inkey The key and value parameters are inval somewhat different than defined in the SYNOPSIS section above. First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null- terminated in the map, they will not be terminated with newline or null characters here, either. indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-03 18:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment who says it doesn't work for NIS+. Unless a better strategy shows up, I suggest to skip entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 17:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls interpreter API. This asks for a desaster if other threads simultanously access the interpreter. 
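The length handling debated in this thread can be modelled outside of C. The sketch below is Python, not the nismodule.c patch: trim_trailing_nul and fix_pair are hypothetical names, and the logic simply mirrors the per-field check "length > 0 and last byte is NUL", so an empty key or value is never shortened to a length of -1.

```python
def trim_trailing_nul(data, fix):
    """Drop one trailing NUL byte from data when fix is set and one is present.

    Hypothetical helper; models the guarded decrement of inkeylen/invallen.
    """
    if fix and len(data) > 0 and data[-1:] == b"\0":
        return data[:-1]
    return data


def fix_pair(key, value, fix=True):
    """Apply the trim to key and value independently of each other."""
    return trim_trailing_nul(key, fix), trim_trailing_nul(value, fix)
```

Treating the two fields independently follows the variant that checks key and value separately; the alternative raised in the discussion, trimming only when both end in a NUL, would differ only for mixed pairs such as (b"root\0", b"admin").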
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 From noreply@sourceforge.net Mon Nov 4 23:06:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:06:55 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 22:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Fix NIS causing interpreter core dump Initial Comment: When running on the Compaq test drive machines, test_nis will cause the interpreter to core dump. The attached patch prevents the core dump which is caused by passing a negative value to PyString_FromStringAndSize(). I'm not sure if it's 100% correct, but the test passes and the interpreter doesn't core dump. Anyone else know if this is correct? I'll apply to prevent the core dump, unless someone complains. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 00:06 Message: Logged In: YES user_id=21627 I think this patch is fine. My understanding is that key and value have either both or neither a terminating null, but treating them separately should work just as well. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-05 00:02 Message: Logged In: YES user_id=33168 Are key & value independent? This code works for me: if (indata->fix) { if (inkeylen > 0 && inkey[inkeylen-1] == '\0') inkeylen--; if (invallen > 0 && inval[invallen-1] == '\0') invallen--; } Does this work for you or would you rather see the checks together?
The reason I did it this way was in case key or value was a problem, but not both. I don't know if this is a valid concern or not. Let me know how you want me to proceed. If you want to take this over, that's fine too, since the only NIS machines I can test on are the test drive machines. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 14:41 Message: Logged In: YES user_id=21627 I was thinking of if (indata->fix && inkeylen >0 && invaluelen>0 && data[inkeylen-1] == '\0' && value[invaluelen-1] == '\0'){ inkeylen--; invaluelen--; } That there is a '':'' entry in mail.aliases might be a bug in the NIS configuration of testdrive. However, the problem with "sometimes the length includes the null, sometimes not" is probably independent from this specific installation. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 14:10 Message: Logged In: YES user_id=33168 One thing to note: I believe I originally found this problem on the Alphas (Tru64) even before the merger with HP. So this problem could have more to do with the NIS configuration. I'm not sure I understand, do you want something like this: if (indata->fix) { if (data[datalen] != '\0') datalen--; } Where data/datalen would be done for both inkey and inval? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 08:15 Message: Logged In: YES user_id=21627 I now have an account on the testdrive machine (as you just referred to them). The -1 is not coming from yp_all; foreach is creating it itself, since fix is true. There appears to be something strange with mail.aliases: On some systems, keys and values have a null byte, on others, they don't.
In JNDI, Sun could not solve this in any other way but defining a property com.sun.jndi.nis.mailaliases which indicates whether a null byte should be assumed to be there, see http://www-iiuf.unifr.ch/iiufdev/doc/public/jndi/providers/jndi-nis.html I would then suggest a different strategy: If fix is set, and both the key and the value have a terminating null included in their length, ignore that. Otherwise, copy all bytes into key and value. I'm still uncertain what the '':'' pair is supposed to indicate, but it appears that mail.aliases deliberately includes "invalid" entries. For example, sendmail generates a "@":"@" entry into the aliases file; if this entry is absent, sendmail assumes that the file is truncated. Perhaps the convention on HP-UX is that the invalid entry consists of an empty string pair. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 01:54 Message: Logged In: YES user_id=33168 How can I tell if NIS+ is being used? Martin do you have an account on the Compaq testdrive machines? The values are -1 coming in from yp_all as seen from the stack trace: #1 0x8f44c in PyString_FromStringAndSize (str=0x7f7f2e18 "\377\377\377\377", size=-1) at Objects/stringobject.c:85 #2 0xc11ca5bc in nis_foreach (instatus=1, inkey=0x7f7f2e18 "\377\377\377\377", inkeylen=-1, inval=0x7f7f3220 "{\004\006P", invallen=-1, indata=0x7f7f2698) at /tmp/python/Modules/nismodule.c:95 #3 0xc02ff02c in xdr_ypall () from /usr/lib/libnsl.1 #4 0xc02daab4 in xdrrec_skiprecord () from /usr/lib/libnsl.1 #5 0xc02f88c8 in yp_all () from /usr/lib/libnsl.1 #6 0xc11cad68 in nis_cat (self=0x0, args=0x40c80a48) at /tmp/python/Modules/nismodule.c:168 I don't see a specific problem from the man page. 
Here are some relevant sections: int yp_all( char *indomain, char *inmap, struct ypall_callback *incallback ); struct ypall_callback *incallback { int (*foreach)(); char *data; }; The function foreach() is called as follows: foreach( int instatus; char *inkey; int inkeylen; char *inval; int invallen; char *indata; ); instatus Holds one of the return status values defined in : either YP_TRUE or an error code (see ypprot_err() below, for a function that converts a NIS protocol error code to a ypclnt layer error code, as defined in ). inkey The key and value parameters are inval somewhat different than defined in the SYNOPSIS section above. First, the memory pointed to by inkey and inval is private to yp_all(), and is overwritten with the arrival of each new key-value pair. Therefore, foreach() should do something useful with the contents of that memory, but it does not own the memory. Key and value objects presented to the foreach() look exactly as they do in the server's map. Therefore, if they were not newline-terminated or null- terminated in the map, they will not be terminated with newline or null characters here, either. indata Is the contents of the incallback->data element passed to yp_all() The data element of the callback structure can share state information between foreach() and the mainline code. Its use is optional, and no part of the NIS client package inspects its contents. Cast it to something useful or ignore it as appropriate. The foreach() function is Boolean. It should return zero to indicate it needs to be called again for further received key-value pairs, or non-zero to stop the flow of key-value pairs. If foreach() returns a non-zero value, it is not called again and the functional value of yp_all() is then 0. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-04 00:11 Message: Logged In: YES user_id=21627 A quick test shows that indeed the if(fix) block causes the trouble; it crashes with mail.aliases, because both strings are empty. I'm not entirely sure what the fix mechanism is supposed to achieve; it does appear that it indeed avoids copying an extra null byte on Solaris. The comment about "makedbm -a" sounds mystical: makedbm has no documented -a option. We should probably ask Fred Gansevles, who added this in 2.15. There is also a GvR comment that says it doesn't work for NIS+. Unless a better strategy shows up, I suggest skipping entries which have both empty keys and values. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-03 23:32 Message: Logged In: YES user_id=21627 The patch looks wrong. What is the value of inkeylen and invallen at the point of the crash? Might it be -1, due to the prior decrement? Was that for a 32-bit or a 64-bit binary? Could it be that Python is using an incorrect signature of the foreach function (despite the man page saying that this is the correct signature)? Could it be that the data are really large unsigned numbers? If so, what are the corresponding data? The foreach function is supposedly called once per record, so both sizes ought to be small. I am concerned about thread-safety of this entire module, though. yp_all is invoked with the GIL released, yet the callback function calls interpreter API. This invites disaster if other threads simultaneously access the interpreter.
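The length adjustment this thread converges on can be modelled in a few lines of Python. This is only a sketch of the logic, not the C code in Modules/nismodule.c: the function names are hypothetical, and `fix` stands in for the `indata->fix` flag. The point is that a length is shortened only when it is positive and the last byte is a NUL, so an empty key or value can never end up with length -1.

```python
def strip_trailing_nul(data, fix):
    """Drop one trailing NUL byte, but only if `fix` is set and one is present."""
    # The length check comes first, so empty input is returned unchanged
    # instead of being "shortened" to a negative length.
    if fix and len(data) > 0 and data[-1] == 0:
        return data[:-1]
    return data


def adjust_pair(key, value, fix):
    """Treat key and value independently, as in the variant that was accepted."""
    return strip_trailing_nul(key, fix), strip_trailing_nul(value, fix)
```

Checking key and value separately also covers the case where only one of them carries the terminating NUL, which the combined check proposed earlier in the thread would leave untouched.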
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470
From noreply@sourceforge.net Mon Nov 4 23:27:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:27:22 -0800 Subject: [Patches] [ python-Patches-633013 ] Fix NIS causing interpreter core dump Message-ID: Patches item #633013, was opened at 2002-11-03 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: Fix NIS causing interpreter core dump ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:27 Message: Logged In: YES user_id=33168 Ok, checked in as Modules/nismodule.c 2.24. Will backport when I can get to SF CVS again. :-( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633013&group_id=5470
From noreply@sourceforge.net Mon Nov 4 23:32:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:32:27 -0800 Subject: [Patches] [ python-Patches-633425 ] bz2 compression module Message-ID: Patches item #633425, was opened at 2002-11-04 20:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 Category: Modules Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 3 Submitted By: Gustavo Niemeyer (niemeyer) >Assigned to: Gustavo Niemeyer (niemeyer) Summary: bz2 compression module Initial Comment: As discussed in python-dev, here is the patch implementing the bz2 module, including comprehensive documentation and tests. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 00:32 Message: Logged In: YES user_id=21627 The patch looks fine overall, so please apply it. Don't forget to add an entry to Misc/NEWS. Providing an appropriate entry in Modules/Setup.dist might also be a good idea. A couple of spelling corrections: - Use imperative for doc strings, not indicative. I.e. write "Return the next line", not "Return next line". - Review the text for missing articles. In general, an article is needed before every noun in English. A native speaker should review this after checkin - I don't think we need to find all grammar errors on SF.
- "one shot decompression" -> "one-shot decompression" ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 From noreply@sourceforge.net Mon Nov 4 23:43:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:43:05 -0800 Subject: [Patches] [ python-Patches-633547 ] Plural forms support for gettext Message-ID: Patches item #633547, was opened at 2002-11-05 00:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: Plural forms support for gettext Initial Comment: Adds support for plural forms to the gettext module. The test script has been rewritten to use unittest. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 From noreply@sourceforge.net Mon Nov 4 23:54:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 15:54:25 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 12:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Williams (johnw42) Assigned to: Nobody/Anonymous (nobody) >Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the builtin dict class and a "pick" method to the BaseSet class. 
These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation. This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:54 Message: Logged In: YES user_id=33168 Fixed the Summary, lest someone think you were making a personal problem public. :-) Seriously though...I haven't looked at the patch, but could you explain the rationale/benefit? Is this likely to be useful to many people or is it fairly limited? Couldn't you do dict.items()[0] if you wanted a random value? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Tue Nov 5 00:57:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 16:57:15 -0800 Subject: [Patches] [ python-Patches-618791 ] [mingw patches] alloca and posixmodule Message-ID: Patches item #618791, was opened at 2002-10-05 01:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 2 Submitted By: Gerhard Häring (ghaering) Assigned to: Guido van Rossum (gvanrossum) Summary: [mingw patches] alloca and posixmodule Initial Comment: This is the first patch in a series of patching of porting Python to native win32, while still using the autoconf-based build process. The compiler used is mingw, the build environment used is msys, a stripped down Cygwin from the mingw project. This patch does several things: * change _alloca to alloca for both mingw and Visual C++, to avoid unnecessary #ifdef-ing. 
* Change the makesetup shell script to work for win32, where for some weird reason we have a module 'nt' built from a posixmodule.c file. * Change one occurrence of #ifdef MS_WINDOWS in posixmodule.c where it should really have been #ifdef Py_WIN_WIDE_FILENAMES * Change the #ifdefs in posixmodule.c so that it can be built with both MSVC and mingw The result of this patch is that we can build a statically built python.exe with a simple ./configure make under mingw/msys. There is, however, still additional work to do until we can build a native win32 Python with the autoconf-based build process. Please apply this ASAP, as I want to avoid having a diverging Python tree on my hard disk (this makes patch creation a lot more difficult). ---------------------------------------------------------------------- >Comment By: Gerhard Häring (ghaering) Date: 2002-11-05 01:57 Message: Logged In: YES user_id=163326 Sounds great. I'd have needed advice from python-dev anyway, especially on the parts where autoconf is concerned (I'm relatively green there). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-04 21:04 Message: Logged In: YES user_id=6380 What I'd like to see most is for somebody with CVS commit permission for Python *and* an understanding of mingw to start making the changes in Python's CVS. I'd be willing to give you CVS permission for this, if you're willing to work with python-dev regarding the acceptability of the various changes you're proposing. I presume you'll quickly get a sense for what kind of changes are non-controversial and can be checked in without asking.
---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-11-04 15:14 Message: Logged In: YES user_id=163326 Guido, if you think that there should either be one big patch that enables Python to be built with mingw or nothing at all, then please close this as 'rejected' or whatever. There are good reasons for doing so, just as there are arguments for incremental patches, like I described above. I won't feel offended, especially as I know how annoying it is for myself to have a SF entry page full of this kind of patches/bugs :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470
From noreply@sourceforge.net Tue Nov 5 03:39:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 19:39:03 -0800 Subject: [Patches] [ python-Patches-618791 ] [mingw patches] alloca and posixmodule Message-ID: Patches item #618791, was opened at 2002-10-04 19:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 2 Submitted By: Gerhard Häring (ghaering) Assigned to: Guido van Rossum (gvanrossum) Summary: [mingw patches] alloca and posixmodule ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-04 22:39 Message: Logged In: YES user_id=31435 Welcome, Gerhard! You have commit privileges now. If you need any help with SourceForge mechanics, ask on Python-Dev and you'll get more advice than you can stand. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470
From noreply@sourceforge.net Tue Nov 5 04:59:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 20:59:16 -0800 Subject: [Patches] [ python-Patches-633633 ] Cleanup of test_strptime.py Message-ID: Patches item #633633, was opened at 2002-11-04 20:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Cleanup of test_strptime.py Initial Comment: I finally got around to cleaning up test_strptime.py. Basically all I did was break all the lines that went over 80 characters (although there are a few that go over by a char or two). I also removed the __version__ variable. Whoever applies this patch can, if they wish, also remove the __version__ variable from _strptime.py; it's a relic and not needed, let alone updated, since I never remember to. And yes, the testing suite still runs and passes all the tests.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 From noreply@sourceforge.net Tue Nov 5 05:07:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 21:07:14 -0800 Subject: [Patches] [ python-Patches-633635 ] Too much chtype in _cursesmodule.c Message-ID: Patches item #633635, was opened at 2002-11-05 00:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: David M. Cooke (dmcooke) Assigned to: Nobody/Anonymous (nobody) Summary: Too much chtype in _cursesmodule.c Initial Comment: The C prototype for getch is 'int getch(void)', not 'chtype getch(void)', as assumed by _cursesmodule.c (the same for ungetch and keyname). [I've checked this under Linux, SunOS, and Tru64] keyname() seems to segfault if passed -1, so I've tested for that. In addition, according to the docs, the .getch() and .getkey() methods of a window object should throw an exception when there isn't any input in nodelay mode (the C functions return ERR (== -1) in this case). I've fixed getkey, but not getch, in this patch: since getch returns an int anyways, it seems better to return -1 on no input. 
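From the Python side, the behaviour described above (getch() returning -1 rather than raising when no input is pending in no-delay mode) could be handled by a small wrapper like the one below. This is a sketch under assumptions: `poll_key` and `NO_INPUT` are hypothetical names, and `win` is any object with a `getch()` method, such as a curses window after `nodelay(True)`.

```python
NO_INPUT = -1  # same value as curses.ERR; kept local so no terminal is needed


def poll_key(win):
    """Return the next key code from `win`, or None when no input is pending."""
    code = win.getch()
    # In no-delay mode a return value of -1 means "nothing to read",
    # so it is mapped to None rather than treated as a key.
    return None if code == NO_INPUT else code
```

Because getch() already returns an int, testing for -1 keeps a polling loop free of try/except, which is the trade-off the submitter argues for.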
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 From noreply@sourceforge.net Tue Nov 5 06:31:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 22:31:33 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 12:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Williams (johnw42) Assigned to: Nobody/Anonymous (nobody) Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the builtin dict class and a "pick" method to the BaseSet class. These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation. This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:31 Message: Logged In: YES user_id=80475 dict.popitem() was added because the it could retrieve and delete a key/value pair without hashing -- there were no existing methods which could achieve the same result. In contract, the dict.pickitem() patch doesn't appear to offer a differential advantage over dict.iteritems().next() for retrieving an arbitrary (hash order) key/value pair. 
Also, since successive calls to pickitem() retrieve the same pair, it doesn't appear to be useful in a loop or warrant a C speed optimization ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:54 Message: Logged In: YES user_id=33168 Fixed the Summary, lest someone think you were making a personal problem public. :-) Seriously though...I haven't looked at the patch, but could you explain the rationale/benefit? Is this likely to be useful to many people or is it fairly limited? Couldn't you do dict.items()[0] if you wanted a random value? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Tue Nov 5 06:33:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Nov 2002 22:33:10 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Martin v. Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. 
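A rough illustration of the two-algorithm idea described in the initial comment above, not the code from the patch itself. The function name sample_indices and the 4*k >= n switch-over threshold are assumptions made for this sketch, and a set stands in for the dictionary mentioned in the comment.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k unique integers drawn from range(n), in selection order.

    Hypothetical sketch: the threshold below is an arbitrary choice,
    not the one used by the patch.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if 4 * k >= n:
        # Dense case: partial shuffle over an explicit pool of size n.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = rng.randrange(n - i)
            result.append(pool[j])
            # Move the last unused element into the vacated slot.
            pool[j] = pool[n - i - 1]
        return result
    # Sparse case: remember what was already chosen, using O(k) space.
    seen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in seen:
            seen.add(j)
            result.append(j)
    return result
```

The dense branch costs O(n) memory but never retries; the sparse branch avoids building a pool of size n when only a few values are wanted, at the price of occasional retries.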
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 08:27:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 00:27:58 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-28 03:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Martin v. Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 09:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 07:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 08:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-29 05:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-29 05:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 13:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 09:03:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 01:03:06 -0800 Subject: [Patches] [ python-Patches-632643 ] Punycode encoding Message-ID: Patches item #632643, was opened at 2002-11-02 18:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632643&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) >Assigned to: Martin v. Löwis (loewis) Summary: Punycode encoding Initial Comment: This patch implements Punycode, http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt This will be used by the internationalized domain names. 
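For readers unfamiliar with Punycode, a small round-trip example follows. It assumes a modern Python 3 interpreter in which a codec named "punycode" is available in the standard library; it is not the code from this patch, and the sample label is chosen only for illustration.

```python
# Round-trip one non-ASCII label through the "punycode" codec.
label = "b\u00fccher"                  # "bücher"
encoded = label.encode("punycode")     # ASCII-only bytes
decoded = encoded.decode("punycode")   # back to the original text

print(encoded)
print(decoded == label)
```

Internationalized domain names add an "xn--" prefix on top of this encoding; that layer is outside the scope of this sketch.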
---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-11-05 10:03 Message: Logged In: YES user_id=38388 I don't have time to review this. If punycode will actually become a standard I'm all for adding support to Python for this. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=632643&group_id=5470 From noreply@sourceforge.net Tue Nov 5 09:36:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 01:36:32 -0800 Subject: [Patches] [ python-Patches-633547 ] Plural forms support for gettext Message-ID: Patches item #633547, was opened at 2002-11-05 00:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: Plural forms support for gettext Initial Comment: Adds support for plural forms to the gettext module. The test script has been rewritten to use unittest. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=21627 The patch looks quite good, overall. However, I don't like the use of eval to generate the plural form function: it is, in general, a security issue to evaluate a string that you read from some file. I would prefer if it parses the string, or uses other mechanisms to establish "safety": for example, if the only identifier occurring in the string is 'n', then this would be a good test. You might want to use tokenize.generate_tokens for that.
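The identifier check suggested in the comment above might look roughly like this. It is a minimal sketch written for Python 3, where tokenize.generate_tokens takes a readline callable returning str; the helper name only_uses_n is hypothetical and this is not the code that ended up in gettext.

```python
import io
import tokenize


def only_uses_n(expr):
    """Return True if the only identifier occurring in expr is 'n'.

    Hypothetical helper: a pre-filter to run before handing a
    plural-form expression to eval, not a full safety proof.
    """
    readline = io.StringIO(expr).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            if tok.type == tokenize.NAME and tok.string != "n":
                return False
    except (tokenize.TokenError, SyntaxError):
        # Text that cannot be tokenized is rejected outright.
        return False
    return True


print(only_uses_n("n != 1"))
print(only_uses_n("__import__('os').system('true')"))
```

Note that Python keywords such as "and" or "if" are also NAME tokens, so this sketch rejects them too; a real implementation would need an explicit allow-list, and would still have to guard against expensive expressions built from numbers and operators alone.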
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 From noreply@sourceforge.net Tue Nov 5 10:35:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 02:35:56 -0800 Subject: [Patches] [ python-Patches-618791 ] [mingw patches] alloca and posixmodule Message-ID: Patches item #618791, was opened at 2002-10-05 01:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 2 Submitted By: Gerhard Häring (ghaering) >Assigned to: Gerhard Häring (ghaering) Summary: [mingw patches] alloca and posixmodule Initial Comment: This is the first patch in a series of patches for porting Python to native win32, while still using the autoconf-based build process. The compiler used is mingw, the build environment used is msys, a stripped down Cygwin from the mingw project. This patch does several things: * change _alloca to alloca for both mingw and Visual C++, to avoid unnecessary #ifdef-ing. * Change the makesetup shell script to work for win32, where for some weird reason we have a module 'nt' built from a posixmodule.c file. * Change one occurrence of #ifdef MS_WINDOWS in posixmodule.c where it should really have been #ifdef Py_WIN_WIDE_FILENAMES * Change the #ifdefs in posixmodule.c so that it can be built with both MSVC and mingw The result of this patch is that we can build a statically built python.exe with a simple ./configure make under mingw/msys. There is, however, still additional work to do until we can build a native win32 Python with the autoconf-based build process. Please apply this ASAP, as I want to avoid having a diverging Python tree on my hard disk (this makes patch creation a lot more difficult).
---------------------------------------------------------------------- >Comment By: Gerhard Häring (ghaering) Date: 2002-11-05 11:35 Message: Logged In: YES user_id=163326 Cool :-) I'm assigning this patch to myself now. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 04:39 Message: Logged In: YES user_id=31435 Welcome, Gerhard! You have commit privileges now. If you need any help with SourceForge mechanics, ask on Python-Dev and you'll get more advice than you can stand. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-11-05 01:57 Message: Logged In: YES user_id=163326 Sounds great. I'd have needed advice from python-dev anyway, especially on the parts where autoconf is concerned (I'm relatively green there). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-04 21:04 Message: Logged In: YES user_id=6380 What I'd like to see most is for somebody with CVS commit permission for Python *and* an understanding of mingw to start making the changes in Python's CVS. I'd be willing to give you CVS permission for this, if you're willing to work with python-dev regarding the acceptability of the various changes you're proposing. I presume you'll quickly get a sense for what kind of changes are non-controversial and can be checked in without asking. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-11-04 15:14 Message: Logged In: YES user_id=163326 Guido, if you think that there should either be one big patch that enables Python to be built with mingw or nothing at all, then please close this as 'rejected' or whatever. There are good reasons for doing so, just as there are arguments for incremental patches, like I described above.
I won't feel offended, especially as I know how annoying it is for myself to have a SF entry page full of this kind of patches/bugs :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618791&group_id=5470 From noreply@sourceforge.net Tue Nov 5 10:36:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 02:36:13 -0800 Subject: [Patches] [ python-Patches-618135 ] gzip.py and files > 2G Message-ID: Patches item #618135, was opened at 2002-10-03 18:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Fixed Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Tim Peters (tim_one) Summary: gzip.py and files > 2G Initial Comment: Problem: Currently, the gzip module is not able to work with files > 2G uncompressed. The source of the problem is that at the end of a .gz file, there is a trailer containing a 32 bit length field. This field is of course unable to represent a file length > 4G. Because of mixed type arithmetic in gzip.py, this limit is lowered to 2G. Testcase: python gzip.py # must be > 2G python gzip.py -d # error Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses. Two other remarks: I don't understand lines 22-23 of gzip.py: why is the test: "if value < 0" necessary when writing an unsigned int? The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc is from zlib.crc which always returns a regular int. 
Regards, Geert Jansen ---------------------------------------------------------------------- >Comment By: Geert Jansen (geertj) Date: 2002-11-05 11:36 Message: Logged In: YES user_id=537938 I'm afraid this doesn't fix the whole problem. You fixed the problem for file sizes in the range 2G-4G, but (if I read your patch correctly), files >4G still don't work. On Linux it is very easy to create files > 4G and Python supports this, so it would be nice to have. A better fix IMHO would be to test the file size modulo 4G. The probability that an invalid gzip file becomes valid by this less accurate test is astronomically small (there is also a CRC). In fact, this is also the fix that the "official" gzip program uses. I can give you a test account on my Linux machine if you want to test a patch and don't have a machine with large file support nearby. Or I can test a patch for you. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-04 20:51 Message: Logged In: YES user_id=31435 Fixed, by related changes in Lib/gzip.py; new revision: 1.36 Misc/NEWS; new revision: 1.508 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-04 18:08 Message: Logged In: YES user_id=31435 Assigned to me. I think your suggested fix makes good sense. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-10-04 09:36 Message: Logged In: YES user_id=537938 Sorry -- it seems the file upload went wrong! Second try.
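The "test the size modulo 4G" proposal from this thread can be sketched as follows. This is a minimal illustration, assuming the standard gzip member trailer of a little-endian CRC32 followed by a little-endian 32-bit size field; the helper name isize_matches is hypothetical and this is not the code checked in to Lib/gzip.py.

```python
import struct

MASK32 = 0xFFFFFFFF  # 2**32 - 1, i.e. arithmetic modulo 4G


def isize_matches(trailer, actual_size):
    """Compare the trailer's 32-bit size field with actual_size mod 2**32.

    trailer is assumed to be the final 8 bytes of a gzip member.
    """
    _crc32, isize = struct.unpack("<II", trailer)
    return isize == (actual_size & MASK32)


# A 5 GiB stream stores only the low 32 bits of its length.
five_gib = 5 * 2**30
trailer = struct.pack("<II", 0, five_gib & MASK32)
print(isize_matches(trailer, five_gib))
```

Masking both sides keeps the comparison in unsigned 32-bit territory, which sidesteps the signed/unsigned mix-up that lowered the limit to 2G in the first place.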
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 From noreply@sourceforge.net Tue Nov 5 13:16:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 05:16:26 -0800 Subject: [Patches] [ python-Patches-633635 ] Too much chtype in _cursesmodule.c Message-ID: Patches item #633635, was opened at 2002-11-05 00:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: David M. Cooke (dmcooke) >Assigned to: A.M. Kuchling (akuchling) Summary: Too much chtype in _cursesmodule.c Initial Comment: The C prototype for getch is 'int getch(void)', not 'chtype getch(void)', as assumed by _cursesmodule.c (the same for ungetch and keyname). [I've checked this under Linux, SunOS, and Tru64] keyname() seems to segfault if passed -1, so I've tested for that. In addition, according to the docs, the .getch() and .getkey() methods of a window object should throw an exception when there isn't any input in nodelay mode (the C functions return ERR (== -1) in this case). I've fixed getkey, but not getch, in this patch: since getch returns an int anyways, it seems better to return -1 on no input. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 From noreply@sourceforge.net Tue Nov 5 13:25:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 05:25:40 -0800 Subject: [Patches] [ python-Patches-633635 ] Too much chtype in _cursesmodule.c Message-ID: Patches item #633635, was opened at 2002-11-05 00:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 Category: Modules Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: David M. Cooke (dmcooke) Assigned to: A.M. Kuchling (akuchling) Summary: Too much chtype in _cursesmodule.c Initial Comment: The C prototype for getch is 'int getch(void)', not 'chtype getch(void)', as assumed by _cursesmodule.c (the same for ungetch and keyname). [I've checked this under Linux, SunOS, and Tru64] keyname() seems to segfault if passed -1, so I've tested for that. In addition, according to the docs, the .getch() and .getkey() methods of a window object should throw an exception when there isn't any input in nodelay mode (the C functions return ERR (== -1) in this case). I've fixed getkey, but not getch, in this patch: since getch returns an int anyways, it seems better to return -1 on no input. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-05 08:25 Message: Logged In: YES user_id=11375 The patch looks OK. One thing: the patch makes keyname() return an empty string if the character is -1. My gut feeling is that an exception is better, because -1 really isn't a legal key code. What do you think? Once we've resolved that issue, I'll check in the patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 From noreply@sourceforge.net Tue Nov 5 15:20:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 07:20:14 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 11:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Williams (johnw42) Assigned to: Nobody/Anonymous (nobody) Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the builtin dict class and a "pick" method to the BaseSet class. These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation. This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- >Comment By: John Williams (johnw42) Date: 2002-11-05 09:20 Message: Logged In: YES user_id=44174 There's no technical reason why this patch is necessary, but I think having it would make it easier to write clean and readable code. It seems very unintuitive to me that the nondestructive analog of "popitem()" would be "iteritems().next()". Even though I'm pretty familiar with iterators, this solution did not occur to me immediately, and I suspect a large portion of Python's users don't even know about using iterators this way. The "items()[0]" solution is even worse, IMHO, since it involves generating a whole list just to get a single item.
I was also trying to preserve the similarity between dicts and sets, and both the list solution and the iterator solution look pretty different when used on sets. Also, using an iterator fails with a StopIteration exception when the dict/set is empty, but the methods in the patch raise KeyError with a helpful error string explaining the problem, just like pop and popitem. I wouldn't venture to guess how often others would use these methods; I just know I would have found them helpful recently. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 00:31 Message: Logged In: YES user_id=80475 dict.popitem() was added because it could retrieve and delete a key/value pair without hashing -- there were no existing methods which could achieve the same result. In contrast, the dict.pickitem() patch doesn't appear to offer a differential advantage over dict.iteritems().next() for retrieving an arbitrary (hash order) key/value pair. Also, since successive calls to pickitem() retrieve the same pair, it doesn't appear to be useful in a loop or warrant a C speed optimization ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 17:54 Message: Logged In: YES user_id=33168 Fixed the Summary, lest someone think you were making a personal problem public. :-) Seriously though...I haven't looked at the patch, but could you explain the rationale/benefit? Is this likely to be useful to many people or is it fairly limited? Couldn't you do dict.items()[0] if you wanted a random value?
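To make the behaviour under discussion concrete, here is a pure-Python approximation of the two proposed methods, written as free functions in modern Python 3 syntax rather than the Python 2.3 C patch. The names pickitem and pick come from the patch description; the bodies and error messages are assumptions for this sketch.

```python
def pickitem(d):
    """Return an arbitrary (key, value) pair from d without removing it."""
    for item in d.items():
        return item
    # An empty dict raises KeyError, mirroring dict.popitem().
    raise KeyError("pickitem(): dictionary is empty")


def pick(s):
    """Return an arbitrary element of the set s without removing it."""
    for elem in s:
        return elem
    raise KeyError("pick(): set is empty")


d = {"a": 1}
print(pickitem(d), d)
```

Both helpers rely on iteration, so they avoid building a full list the way items()[0] does in Python 2, and they turn the empty case into a KeyError instead of a StopIteration.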
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Tue Nov 5 15:36:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 07:36:12 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Martin v. Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the core use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57.
An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. 
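The acceptance-testing figure quoted in this thread (60 clean samples giving roughly 95% confidence of an error rate below 5%) can be checked with a short calculation. This sketch treats the draws as independent, which is an approximation; drawing without replacement from a finite population of 1000 would make a clean sample slightly less likely still. The function name is hypothetical.

```python
def prob_zero_errors(error_rate, n_samples):
    """Chance that n_samples independent draws all miss the bad items."""
    return (1.0 - error_rate) ** n_samples


# If 5% of the population were bad, how often would 60 draws find nothing?
p = prob_zero_errors(0.05, 60)
confidence = 1.0 - p
print(round(p, 4), round(confidence, 4))
```

Seeing no errors in 60 draws would happen less than 5% of the time if the true error rate were 5% or worse, which is where the stated confidence level comes from.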
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 15:40:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 07:40:54 -0800 Subject: [Patches] [ python-Patches-633635 ] Too much chtype in _cursesmodule.c Message-ID: Patches item #633635, was opened at 2002-11-05 00:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 Category: Modules Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: David M. Cooke (dmcooke) Assigned to: A.M. Kuchling (akuchling) Summary: Too much chtype in _cursesmodule.c Initial Comment: The C prototype for getch is 'int getch(void)', not 'chtype getch(void)', as assumed by _cursesmodule.c (the same for ungetch and keyname). [I've checked this under Linux, SunOS, and Tru64] keyname() seems to segfault if passed -1, so I've tested for that. In addition, according to the docs, the .getch() and .getkey() methods of a window object should throw an exception when there isn't any input in nodelay mode (the C functions return ERR (== -1) in this case). I've fixed getkey, but not getch, in this patch: since getch returns an int anyways, it seems better to return -1 on no input. ---------------------------------------------------------------------- >Comment By: David M. Cooke (dmcooke) Date: 2002-11-05 10:40 Message: Logged In: YES user_id=65069 Sounds good. I've updated the patch. keyname(c) throws a ValueError on c < 0, and I've fixed the documentation to mention getch() returns -1 in nodelay mode when there is no input. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-11-05 08:25 Message: Logged In: YES user_id=11375 The patch looks OK. 
One thing: the patch makes keyname() return an empty string if the character is -1. My gut feeling is that an exception is better, because -1 really isn't a legal key code. What do you think? Once we've resolved that issue, I'll check in the patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 From noreply@sourceforge.net Tue Nov 5 15:59:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 07:59:46 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 16:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. 
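The behaviour requested in the initial comment above, assigning an arbitrary sequence to a list slice, can be demonstrated in a current Python interpreter. The Countdown class is a made-up custom sequence used only for this example; the snippet shows the resulting behaviour, not the C patch to Objects/listobject.c.

```python
class Countdown:
    """Minimal custom sequence yielding start, start-1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


items = [0, 1, 2, 3, 4]
items[1:4] = Countdown(3)        # right-hand side is not a list
print(items)

letters = [1, 2, 3]
letters[0:1] = ("a", "b")        # a tuple works the same way
print(letters)
```

The same objects are accepted by list.extend(), which is the parallel the initial comment draws.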
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 16:51:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 08:51:46 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 15:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) >Assigned to: Guido van Rossum (gvanrossum) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-05 16:51 Message: Logged In: YES user_id=6656 I wonder if you'd get much performance hit from just using PySequence_FAST the whole time? All that does for a list after all is Py_INCREF it (I hope you still have whatever test harness you used to make your claims in the description). 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:09:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:09:47 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 16:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Guido van Rossum (gvanrossum) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. ---------------------------------------------------------------------- >Comment By: Alex Martelli (aleax) Date: 2002-11-05 18:09 Message: Logged In: YES user_id=60314 I cannot reliably measure any performance difference between using PySequence_Fast unconditionally, and specialcasing a list RHS. I've attached the patch for the unconditional version, which is five lines less than the earlier specialcased version. 
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-05 17:51 Message: Logged In: YES user_id=6656 I wonder if you'd get much performance hit from just using PySequence_FAST the whole time? All that does for a list after all is Py_INCREF it (I hope you still have whatever test harness you used to make your claims in the description). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:09:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:09:56 -0800 Subject: [Patches] [ python-Patches-633425 ] bz2 compression module Message-ID: Patches item #633425, was opened at 2002-11-04 19:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 3 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Gustavo Niemeyer (niemeyer) Summary: bz2 compression module Initial Comment: As discussed in python-dev, here is the patch implementing the bz2 module, including comprehensive documentation and tests. ---------------------------------------------------------------------- >Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-05 17:09 Message: Logged In: YES user_id=7887 Applied as: setup.py:1.112->1.113 Doc/Makefile.deps:1.89->1.90 Doc/lib/lib.tex:1.204->1.205 Doc/lib/libbz2.tex:INITIAL->1.1 Lib/test/test_bz2.py:INITIAL->1.1 Misc/NEWS:1.508->1.509 Modules/bz2module.c:INITIAL->1.1 Thank you! ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 23:32 Message: Logged In: YES user_id=21627 The patch looks fine overall, so please apply it. Don't forget to add an entry to Misc/NEWS.
Providing an appropriate entry in Modules/Setup.dist might also be a good idea. A couple of spelling corrections: - Use imperative for doc strings, not indicative. I.e. write "Return the next line", not "Return next line". - Review the text for missing articles. In general, an article is needed before every noun in English. A native speaker should review this after checkin - I don't think we need to find all grammar errors on SF. - "one shot decompression" -> "one-shot decompression" ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633425&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:12:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:12:23 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 10:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) >Assigned to: Michael Hudson (mwh) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. 
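The behaviour described in the initial comment can be illustrated from the Python side. This is only a sketch of the visible effect, not the C change to Objects/listobject.c: Countdown is a hypothetical custom sequence invented for the example, and assign_slice is a helper name assumed here for illustration.

```python
class Countdown:
    """Hypothetical custom sequence: supports only len() and indexing."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        # Raising IndexError past the end is what lets the interpreter
        # walk this object as a sequence.
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


def assign_slice(target, start, stop, source):
    """Assign any sequence (not just a list) to target[start:stop]."""
    target[start:stop] = source
    return target


from_custom = assign_slice([0, 1, 2, 3, 4], 1, 3, Countdown(3))
from_tuple = assign_slice([0, 1, 2, 3, 4], 1, 3, ("a", "b"))
print(from_custom)
print(from_tuple)
```

The right-hand side is materialised once and then spliced in, which mirrors what extend() already did for general sequences at the time of the patch.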
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-05 12:12 Message: Logged In: YES user_id=6380 The idea sounds like a fine one to me. (I think that's a reversal of opinion. So be it. :-) I don't want to be responsible for reviewing the code however. Assigned to MWH (since there's no way to simply unassign). ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2002-11-05 12:09 Message: Logged In: YES user_id=60314 I cannot reliably measure any performance difference between using PySequence_Fast unconditionally, and specialcasing a list RHS. I've attached the patch for the unconditional version, which is five lines less than the earlier specialcased version. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-05 11:51 Message: Logged In: YES user_id=6656 I wonder if you'd get much performance hit from just using PySequence_FAST the whole time? All that does for a list after all is Py_INCREF it (I hope you still have whatever test harness you used to make your claims in the description). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:34:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:34:35 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-28 03:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Martin v. 
Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 18:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-11-05 09:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 07:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 08:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-29 05:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 13:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
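The acceptance-testing figure in the 2002-11-05 comment above (60 transactions out of 1000, zero errors, 95% confidence of an error rate under 5%) can be checked with a short calculation. The sketch below assumes the worst case of exactly 5% bad transactions (50 of 1000) and asks how likely it is that a sample of 60 contains none of them; the function names are made up for the illustration. With replacement the answer is 0.95**60, roughly 0.046, and without replacement it is slightly smaller, so either way the chance of seeing zero errors is below 5%.

```python
from math import comb


def prob_zero_errors_with_replacement(error_rate, sample_size):
    """Chance that every one of sample_size independent draws is clean."""
    return (1.0 - error_rate) ** sample_size


def prob_zero_errors_without_replacement(population, bad, sample_size):
    """Hypergeometric chance that a sample of distinct items has no bad one."""
    return comb(population - bad, sample_size) / comb(population, sample_size)


p_with = prob_zero_errors_with_replacement(0.05, 60)
p_without = prob_zero_errors_without_replacement(1000, 50, 60)
print(p_with < 0.05, p_without < 0.05)
```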
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:41:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:41:25 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 15:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Michael Hudson (mwh) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-05 17:41 Message: Logged In: YES user_id=6656 Looks OK to me. Will check in when make test finishes. ... It's in, as Lib/test/test_types.py revision 1.38 Objects/listobject.c revision 2.138 feel free to write a better test case! I'm a little concerned about docs, but can't find anything that clearly defines the old behaviour (there's a little bit in the lang ref, but that's talking about slice assignments in general, not to lists). 
PS: Alex, when submitting follow-up patches, please make it obvious in the comment or the file name which is newest. I've suffered sf enough to remember that the top one is the newest, but it's not that clear. Yes, this is sf's fault, but... ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-05 17:12 Message: Logged In: YES user_id=6380 The idea sounds like a fine one to me. (I think that's a reversal of opinion. So be it. :-) I don't want to be responsible for reviewing the code however. Assigned to MWH (since there's no way to simply unassign). ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2002-11-05 17:09 Message: Logged In: YES user_id=60314 I cannot reliably measure any performance difference between using PySequence_Fast unconditionally, and specialcasing a list RHS. I've attached the patch for the unconditional version, which is five lines less than the earlier specialcased version. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-05 16:51 Message: Logged In: YES user_id=6656 I wonder if you'd get much performance hit from just using PySequence_FAST the whole time? All that does for a list after all is Py_INCREF it (I hope you still have whatever test harness you used to make your claims in the description).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 17:41:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 09:41:53 -0800 Subject: [Patches] [ python-Patches-633870 ] allow any seq assignment to a list slice Message-ID: Patches item #633870, was opened at 2002-11-05 15:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Michael Hudson (mwh) Summary: allow any seq assignment to a list slice Initial Comment: as suggested by Michael Hudson in his comp.lang.python post of Tue, 5 Nov 2002 14:03:46 GMT, Subject "Re: List slice assignment and custom sequences", message id . The patch affects Objects/listobject.c: with no performance impact when the RHS of an assignment to a list slice is a list, the patch also allows the RHS to be any other sequence object acceptable to PySequence_Fast -- just like such general sequences are acceptable today e.g. as arguments to the extend method, so the patch makes them acceptable as RHS in assignment to list slices. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-05 17:41 Message: Logged In: YES user_id=6656 Looks OK to me. Will check in when make test finishes. ... It's in, as Lib/test/test_types.py revision 1.38 Objects/listobject.c revision 2.138 feel free to write a better test case! I'm a little concerned about docs, but can't find anything that clearly defines the old behaviour (there's a little bit in the lang ref, but that's talking about slice assignments in general, not to lists). 
PS: Alex, when submitting follow-up patches, please make it obvious in the comment or the file name which is newest. I've suffered sf enough to remember that the top one is the newest, but it's not that clear. Yes, this is sf's fault, but... ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-05 17:12 Message: Logged In: YES user_id=6380 The idea sounds like a fine one to me. (I think that's a reversal of opinion. So be it. :-) I don't want to be responsible for reviewing the code however. Assigned to MWH (since there's no way to simply unassign). ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2002-11-05 17:09 Message: Logged In: YES user_id=60314 I cannot reliably measure any performance difference between using PySequence_Fast unconditionally, and specialcasing a list RHS. I've attached the patch for the unconditional version, which is five lines less than the earlier specialcased version. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-05 16:51 Message: Logged In: YES user_id=6656 I wonder if you'd get much performance hit from just using PySequence_FAST the whole time? All that does for a list after all is Py_INCREF it (I hope you still have whatever test harness you used to make your claims in the description).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633870&group_id=5470 From noreply@sourceforge.net Tue Nov 5 18:31:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 10:31:44 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Martin v. Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever).
Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
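Tim Peters' two suggestions in the 13:31 comment above (complement selection when k > n/2, and choosing directly from an arbitrary collection) might look roughly like the following. This is not the CombGen.py code, which is not reproduced in this thread; the names sample_indices and sample_from are assumptions made for the illustration, and rng defaults to the random module only for convenience.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k distinct indices from range(n), in random order."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if k > n // 2:
        # Complement selection: choose the n - k indices to leave out,
        # which is the smaller job when k is more than half of n.
        excluded = set()
        while len(excluded) < n - k:
            excluded.add(rng.randrange(n))
        chosen = [i for i in range(n) if i not in excluded]
        rng.shuffle(chosen)  # the survivors come out sorted, so reshuffle
        return chosen
    chosen = []
    seen = set()
    while len(chosen) < k:
        i = rng.randrange(n)
        if i not in seen:  # redraw on a repeat
            seen.add(i)
            chosen.append(i)
    return chosen


def sample_from(collection, k, rng=random):
    """Choose k distinct elements from any finite iterable."""
    items = list(collection)
    return [items[i] for i in sample_indices(len(items), k, rng)]


almost_all = sample_indices(10000, 9999)
hand = sample_from({"AS", "KH", "QD", "JC", "10S", "9H"}, 2)
print(len(almost_all), len(hand))
```

For 9999 out of 10000 the complement branch needs a single random draw to find the excluded element, at the cost of one pass over range(n) and a shuffle to restore random order.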
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 19:54:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 11:54:44 -0800 Subject: [Patches] [ python-Patches-613434 ] rm email package dependency on rfc822.py Message-ID: Patches item #613434, was opened at 2002-09-23 17:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=613434&group_id=5470 Category: Library (Lib) >Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jason R. Mastaler (jasonrm) Assigned to: Barry A. Warsaw (bwarsaw) Summary: rm email package dependency on rfc822.py Initial Comment: This allows the latest email package (2.3.1) to also work on Python 2.1 and 2.1.1 instead of only 2.1.2 and beyond. For the details behind this patch, see http://article.gmane.org/gmane.comp.python.mime.devel/102 and related followups. ---------------------------------------------------------------------- Comment By: Jason R. Mastaler (jasonrm) Date: 2002-09-23 17:27 Message: Logged In: YES user_id=85984 Also attached the necessary package-private module _parseaddr.py. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=613434&group_id=5470 From noreply@sourceforge.net Tue Nov 5 20:40:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 12:40:41 -0800 Subject: [Patches] [ python-Patches-618135 ] gzip.py and files > 2G Message-ID: Patches item #618135, was opened at 2002-10-03 12:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Fixed Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Tim Peters (tim_one) Summary: gzip.py and files > 2G Initial Comment: Problem: Currently, the gzip module is not able to work with files > 2G uncompressed. The source of the problem is that at the end of a .gz file, there is a trailer containing a 32 bit length field. This field is of course unable to represent a file length > 4G. Because of mixed type arithmetic in gzip.py, this limit is lowered to 2G. Testcase: python gzip.py # must be > 2G python gzip.py -d # error Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses. Two other remarks: I don't understand lines 22-23 of gzip.py: why is the test: "if value < 0" necessary when writing an unsigned int? The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc is from zlib.crc which always returns a regular int. Regards, Geert Jansen ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-05 15:40 Message: Logged In: YES user_id=31435 Got it. It's distasteful but pragmatic . 
Fixed again, in Lib/gzip.py; new revision: 1.37 Misc/NEWS; new revision: 1.510 It was tested "by hand" on Win2K (on a 6+GB file). ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-11-05 05:36 Message: Logged In: YES user_id=537938 I'm afraid this doesn't fix the whole problem. You fixed the problem for file sizes in the range 2G-4G, but (if I read your patch correctly), files >4G still don't work. On Linux it is very easy to create files > 4G and Python supports this, so it would be nice to have. A better fix IMHO would be to test the file size modulo 4G. The probability that an invalid gzip file becomes valid by this less accurate test is astronomically small (there is also a CRC). In fact, this is also the fix that the "official" gzip program uses. I can give you a test account on my Linux machine if you want to test a patch and don't have a machine with large file support nearby. Or I can test a patch for you. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-04 14:51 Message: Logged In: YES user_id=31435 Fixed, by related changes in Lib/gzip.py; new revision: 1.36 Misc/NEWS; new revision: 1.508 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-04 12:08 Message: Logged In: YES user_id=31435 Assigned to me. I think your suggested fix makes good sense. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2002-10-04 03:36 Message: Logged In: YES user_id=537938 Sorry -- it seems the file upload went wrong! Second try.
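The "test the size modulo 4G" fix discussed in this thread can be sketched as follows. It relies on the gzip trailer layout (a 32-bit CRC followed by a 32-bit length field, both little-endian); trailer_matches is a hypothetical helper written for this illustration and is not the code that was checked in as Lib/gzip.py 1.37.

```python
import gzip
import struct
import zlib

MASK32 = 0xFFFFFFFF  # the trailer fields are 32 bits wide


def trailer_matches(trailer, crc, size):
    """Compare an 8-byte gzip trailer against a running CRC and byte count.

    Both values are reduced modulo 2**32 before comparing, so an
    uncompressed size above 4G still matches its truncated length field.
    """
    stored_crc, stored_size = struct.unpack("<II", trailer)
    return stored_crc == (crc & MASK32) and stored_size == (size & MASK32)


payload = b"hello"
real_trailer = gzip.compress(payload)[-8:]
print(trailer_matches(real_trailer, zlib.crc32(payload), len(payload)))

# A pretend 20G+7 byte stream: only the low 32 bits of the size are stored.
huge_size = 5 * 2**32 + 7
fake_trailer = struct.pack("<II", 0, huge_size & MASK32)
print(trailer_matches(fake_trailer, 0, huge_size))
```

As Geert Jansen notes above, the CRC comparison is what keeps the weaker size check from accepting corrupt files in practice.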
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470 From noreply@sourceforge.net Tue Nov 5 20:56:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 12:56:07 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None >Priority: 3 Submitted By: Mihai Ibanescu (misa) Assigned to: Jeremy Hylton (jhylton) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this. It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2002-11-05 15:56 Message: Logged In: YES user_id=205865 I am having problems with proxying and keepalive connections. Setting to a lower priority until I figure out the documentation. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-10-07 17:20 Message: Logged In: YES user_id=205865 Boy, two months. Yes, I'll go back to working on the patch. Sorry for the delay. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-07 17:17 Message: Logged In: YES user_id=21627 misa, is a patch forthcoming? ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-07-15 17:37 Message: Logged In: YES user_id=205865 - I agree about the comments. 
I'll make them reasonable. - one underscore is fine - I intended to have a patch that works with python 1.5, but then again the module itself doesn't run with 1.5 anyway, so good point. - When you make a connection to a server through a proxy, you have to connect to the proxy, but everything else should be the same, i.e. the Host: field has to refer to the server and so on. I wanted to reuse the code from _set_hostport, which saves the host and port in self.host, self.port. Had to do it twice, once for the proxy hostname, once for the server's. _set_hostport takes care of the default port and of the "hostname:port" syntax, which is convenient. I'll put together a patched patch and upload it. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-07-15 17:21 Message: Logged In: YES user_id=31392 The proposed classes seem useful enough, but I would like to make several suggestions for the implementation. - There are too many comments. Comments should only be added when the intent of the code needs to be explained. We definitely don't need one comment for each line of code. The comment in the HTTPS proxy putrequest() is an example of a helpful comment. - Just use a single underscore for private variables. - Please use string methods instead of the string module. - I don't understand the logic of switching the host/port back and forth. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-07-15 16:52 Message: Logged In: YES user_id=31392 I'll take a look. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 11:46 Message: Logged In: YES user_id=6380 Assigning to Jeremy in the hope that he can provide a review. 
---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-06-23 23:03 Message: Logged In: YES user_id=205865 The newer patch is generated against the latest CVS tree, and it provides additional documentation. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-06-11 14:47 Message: Logged In: YES user_id=205865 Sorry, been caught with a zillion of other things to do. I'll try to reorganize it somehow and ask for opinions. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-06-11 14:42 Message: Logged In: YES user_id=31392 misa-- any progress on this patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:12 Message: Logged In: YES user_id=6380 OK, thanks; I'll wait! ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? 
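Two points from this thread lend themselves to a small illustration: the "hostname:port" handling with a default port that misa mentions when describing _set_hostport, and the use of CONNECT (rather than GET/POST) to tunnel HTTPS through a proxy. The sketch below is not the submitted patch, which is not shown here; split_hostport and build_connect_request are hypothetical helpers, the Basic scheme is assumed for proxy authentication, and IPv6 literals are deliberately not handled.

```python
import base64


def split_hostport(hostport, default_port):
    """Split 'host' or 'host:port' into (host, port), applying a default."""
    host, sep, port = hostport.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return hostport, default_port


def build_connect_request(server, proxy_user=None, proxy_password=None):
    """Build the bytes a client sends to a proxy to open a tunnel.

    The request names the target server, not the proxy; the socket itself
    would be connected to the proxy's own host and port.
    """
    host, port = split_hostport(server, 443)
    lines = [
        "CONNECT %s:%d HTTP/1.1" % (host, port),
        "Host: %s:%d" % (host, port),
    ]
    if proxy_user is not None:
        credentials = "%s:%s" % (proxy_user, proxy_password)
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        lines.append("Proxy-Authorization: Basic " + token)
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


plain = build_connect_request("example.org")
with_auth = build_connect_request("example.org:8443", "user", "pw")
print(plain.decode("ascii").splitlines()[0])
```

Once the proxy answers the CONNECT with a success status, the client starts the SSL handshake over the same socket and then issues ordinary requests whose Host: header refers to the server.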
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Tue Nov 5 21:28:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 13:28:36 -0800 Subject: [Patches] [ python-Patches-631678 ] New pdb command "pp" Message-ID: Patches item #631678, was opened at 2002-10-31 12:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631678&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Barry A. Warsaw (bwarsaw) >Assigned to: Guido van Rossum (gvanrossum) >Summary: New pdb command "pp" Initial Comment: I often find that I want to pretty print values in pdb. This patch adds a "pp" command, which is much like "p" except it pretty prints the value. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-05 16:28 Message: Logged In: YES user_id=6380 Cool. Can you please add documentation? "pp" currently shows up in pdb's "help" output as "undocumented". There's also the LaTeX docs for pdb. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631678&group_id=5470 From noreply@sourceforge.net Tue Nov 5 21:38:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 13:38:14 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Martin v. 
Löwis (loewis) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). 
Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. 
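For illustration only, the two-strategy idea discussed in this thread can be sketched as below. This is not the submitted patch: the function name sample_indices, the rng parameter, and the exact form of the switchover test (n <= 6 * k) are assumptions made for the sketch, and it is written for current Python 3 rather than the 2.3 tree. Small populations relative to k use a partial shuffle of an n-element list; sparse selections track the chosen values in a set and resample on collision.

```python
import random

def sample_indices(n, k, rng=random.random):
    """Return k unique integers from range(n), in selection order.

    Hypothetical sketch of the shuffle-vs-dictionary trade-off; the
    ratio of six is taken from the thread, the rest is assumed.
    """
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    if n <= 6 * k:
        # Dense case: partial shuffle over an n-element pool.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = int(rng() * (n - i))      # index into the live part of the pool
            result.append(pool[j])
            pool[j] = pool[n - i - 1]     # move an unused tail element into the hole
        return result
    # Sparse case: remember selections and retry on a repeat.
    chosen = set()
    result = []
    while len(result) < k:
        j = int(rng() * n)
        if j not in chosen:
            chosen.add(j)
            result.append(j)
    return result
```

With this shape, something like sample_indices(1000000, 60) never builds the million-element list, and converting indices to elements stays a one-line list comprehension as described above.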
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 5 22:19:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Nov 2002 14:19:29 -0800 Subject: [Patches] [ python-Patches-631678 ] New pdb command "pp" Message-ID: Patches item #631678, was opened at 2002-10-31 12:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631678&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Barry A. Warsaw (bwarsaw) Assigned to: Guido van Rossum (gvanrossum) >Summary: New pdb command "pp" Initial Comment: I often find that I want to pretty print values in pdb. This patch adds a "pp" command, which is much like "p" except it pretty prints the value. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-11-05 17:19 Message: Logged In: YES user_id=12800 Will do. I'm just going to check this stuff in.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-05 16:28 Message: Logged In: YES user_id=6380 Cool. Can you please add documentation? "pp" currently shows up in pdb's "help" output as "undocumented". There's also the LaTeX docs for pdb. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631678&group_id=5470 From noreply@sourceforge.net Wed Nov 6 14:17:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 06:17:43 -0800 Subject: [Patches] [ python-Patches-633635 ] Too much chtype in _cursesmodule.c Message-ID: Patches item #633635, was opened at 2002-11-05 00:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 Category: Modules Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: David M. Cooke (dmcooke) Assigned to: A.M. Kuchling (akuchling) Summary: Too much chtype in _cursesmodule.c Initial Comment: The C prototype for getch is 'int getch(void)', not 'chtype getch(void)', as assumed by _cursesmodule.c (the same for ungetch and keyname). [I've checked this under Linux, SunOS, and Tru64] keyname() seems to segfault if passed -1, so I've tested for that. In addition, according to the docs, the .getch() and .getkey() methods of a window object should throw an exception when there isn't any input in nodelay mode (the C functions return ERR (== -1) in this case). I've fixed getkey, but not getch, in this patch: since getch returns an int anyways, it seems better to return -1 on no input. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-06 09:17 Message: Logged In: YES user_id=11375 Checked in to CVS; thanks! 
---------------------------------------------------------------------- Comment By: David M. Cooke (dmcooke) Date: 2002-11-05 10:40 Message: Logged In: YES user_id=65069 Sounds good. I've updated the patch. keyname(c) throws a ValueError on c < 0, and I've fixed the documentation to mention getch() returns -1 in nodelay mode when there is no input. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-11-05 08:25 Message: Logged In: YES user_id=11375 The patch looks OK. One thing: the patch makes keyname() return an empty string if the character is -1. My gut feeling is that an exception is better, because -1 really isn't a legal key code. What do you think? Once we've resolved that issue, I'll check in the patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633635&group_id=5470 From noreply@sourceforge.net Wed Nov 6 14:24:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 06:24:20 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 13:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: Modules Group: None >Status: Closed Resolution: Accepted Priority: 8 Submitted By: Greg Chapman (glchapman) Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produces matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being 
reduced (after "((\d)\:)" fails) without NULLing out the now- invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-06 14:24 Message: Logged In: YES user_id=7887 Applied as: Lib/test/re_tests.py:1.30->1.31 Lib/test/test_sre.py:1.37->1.38 Misc/NEWS:1.511->1.512 Modules/_sre.c:2.83->2.84 Thank you very much! ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-10-04 16:51 Message: Logged In: YES user_id=86307 Assuming this patch is acceptable (I see it has not yet been applied to _sre.c), I wonder if it would be a good candidate for a backport to 2.2.2? (Though it still lacks a fix for the lastindex problem.) ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-08-12 21:17 Message: Logged In: YES user_id=86307 I noticed recently that the lastindex attribute of match objects is now documented, so I believe that the lastindex problem I described in my March 8 posting needs to be fixed. 
Simply, lastindex may claim that a group matched when in fact it didn't (because lastindex does not get updated when lastmark is reset to a lower value): >>> m = sre.match('(\d)?\d\d', '12') >>> m.groups() (None,) >>> m.lastindex 1 ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-12 11:11 Message: Logged In: YES user_id=38376 (bumped priority as a reminder to self) /F ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-08 18:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 15:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here: >>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123') >>> m.groups() (None, None) >>> m.lastindex 2 In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value: void lastmark_restore(SRE_STATE *state, int lastmark) { assert(lastmark >= 0); if (state->lastmark > lastmark) { int lastvalidindex = (lastmark == 0) ? 
-1 : (lastmark-1)/2+1; if (state->lastindex > lastvalidindex) state->lastindex = lastvalidindex; memset( state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*) ); } state->lastmark = lastmark; } ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 13:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 13:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Wed Nov 6 15:28:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 07:28:16 -0800 Subject: [Patches] [ python-Patches-462754 ] no '_d' ending for mingw32 Message-ID: Patches item #462754, was opened at 2001-09-18 23:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 Category: Distutils and setup.py Group: None Status: Closed Resolution: Rejected Priority: 5 Submitted By: Gerhard Häring (ghaering) Assigned to: Nobody/Anonymous (nobody) Summary: no '_d' ending for mingw32 Initial Comment: This patch prevents distutils from naming the extension modules _d.pyd when compiled with mingw32 on Windows in debug mode. Instead, the extension modules will get the normal name .pyd. 
Technically, the patch doesn't prevent the behaviour for mingw32, but only adds the _d for MS Visual C++ and Borland compilers (though I don't know about the Borland case). The reason for this? Adding "_d" doesn't make any sense for GNU compilers. I think it's just a MS Visual C++ madness. If you want to debug an extension module that was compiled with gcc, you have to use gdb anyway, because the debugging symbols of MSVC++ and gcc are incompatible. So you normally use a release Python version (from the python.org binary download) and compile your extensions with mingw32. To put it briefly: The current state is that you do a "setup.py build --compiler=mingw32 --debug" and then rename the extension modules, removing the _d. Then fire up gdb to debug your module. With this patch, the renaming isn't necessary anymore. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-06 10:28 Message: Logged In: YES user_id=11375 Is there any way to get the setting of the Py_DEBUG flag from Python code? I can't see any; if there were a way to detect this setting, couldn't the patch be trivially modified to be correct? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 07:20 Message: Logged In: YES user_id=21627 This patch is wrong: Whether or not _d should be added to the module name depends on whether or not Py_DEBUG is defined; this is independent of whether --debug was given, at least for Cygwin (for MSVC, --debug will define _DEBUG which will define Py_DEBUG). So the current distutils is wrong (since it always adds _d), but the patch doesn't make it better (since it never adds _d). Rejecting the patch. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-04-17 01:07 Message: Logged In: YES user_id=163326 If python.exe is compiled --with-pydebug, then this is true.
But the point is that I want to compile debug versions of my extension modules and use them with the standard python.exe (*not* python_d.exe). So yes, the patch does work, at least it did when I submitted it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 06:44 Message: Logged In: YES user_id=21627 Does the patch actually work? It seems to me that, if compiled --with-pydebug, import will automatically search for the _d version, and complain if it is not found. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-04 06:52 Message: Logged In: YES user_id=21627 The rationale for using the debugging version of MSVCRT is not the debugging information alone, but also the additional functionalities, like heap consistency checks and other assertions. So it is not obvious that you do not want to use the debugging version of this library in a debug build. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-01-03 21:50 Message: Logged In: YES user_id=163326 mingw links with msvcrt.dll. I have plans to add mingw32 support to the autoconf build process (hopefully soon enough for 2.3). The GNU and MS debugger symbols are incompatible, though, so I think that mingw32 shouldn't link to the debug version of msvcrt (gdb doesn't understand the Microsoft debugger symbols; and the Visual Studio debugger has no idea what the debugging symbols of gcc are all about; isn't cross-platform and cross-compiler programming fun?). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-30 08:13 Message: Logged In: YES user_id=21627 How does the mingw port interact with the debugging libraries? With MSVC, the debug build will link to the debug versions of the CRT.
What C library will mingw link with (I hope it won't use crtdll.dll)? ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 17:28 Message: Logged In: YES user_id=163326 Yes. But mingw32 isn't emulating Unix under Windows (that would be Cygwin). It's just a version of gcc and friends that targets native win32. It links against msvcrt (not a Posix emulation library like Cygwin does). This is a bit hypothetical because I haven't yet hacked the autoconf build process for native win32 with mingw32. Currently, you cannot build a complete Python with mingw32, but you *can* build extension modules against an existing Python (compiled with M$ VC++). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-28 16:43 Message: Logged In: YES user_id=31435 All else being equal, a system emulating Unix under Windows should strive to make life comfortable for Unix folks. The question is thus whether all else is in fact equal. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 14:37 Message: Logged In: YES user_id=163326 Hmm. I don't like the _d endings at all. But if the policy on win32 is that debug executables and libraries get a "_d" ending, then I'm unsure whether this patch should be applied. I have plans to hack the autoconf madness to build a native win32 Python with mingw32. But that won't be ready by tomorrow. And I don't think that I'll add "_d" endings there for debugging, because that would be inconsistent with the normal autoconf builds on Unix. I'm glad that *I* don't have to decide whether this patch is a Good Thing. Being consistent with Python win32 build or with GNU (gcc/autoconf).
Take your pick :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-18 23:46 Message: Logged In: YES user_id=31435 FYI, MSVC never adds _d on its own -- Mark Hammond and/or Guido forced it to do that. I don't remember why, but one of them explained it to me long ago and it made good sense at the time. MSVC normally compiles debug and release builds into distinct subdirectories, and uses the same names in both. But *our* MSVC setup forces it to compile both flavors of build directly into the PCbuild directory, so it has to give the resulting DLLs and executables different names (else the second build would overwrite the results of the first build). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 From noreply@sourceforge.net Wed Nov 6 18:04:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 10:04:06 -0800 Subject: [Patches] [ python-Patches-634557 ] Better inspect.BlockFinder fixes bug Message-ID: Patches item #634557, was opened at 2002-11-06 12:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Patrick K. O'Brien (pobrien) Assigned to: Nobody/Anonymous (nobody) Summary: Better inspect.BlockFinder fixes bug Initial Comment: inspect.BlockFinder didn't do a good enough job finding the end of code blocks.
This can be observed by running: >>> import inspect >>> import tokenize >>> print inspect.getsource(tokenize.TokenError) class TokenError(Exception): pass class StopTokenizing(Exception): pass def printtoken(type, token, (srow, scol), (erow, ecol), line): # for testing print "%d,%d-%d,%d:\t%s\t%s" % \ (srow, scol, erow, ecol, tok_name[type], repr(token)) >>> Notice how it picks up extra source code lines. The attached patch fixes this problem. There should probably be some additional unit tests for this, but I ran out of time and energy. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 From noreply@sourceforge.net Wed Nov 6 19:04:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 11:04:51 -0800 Subject: [Patches] [ python-Patches-633547 ] Plural forms support for gettext Message-ID: Patches item #633547, was opened at 2002-11-05 00:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: Plural forms support for gettext Initial Comment: Adds support for plural forms to the gettext module. The test script has been rewritten to use unittest. ---------------------------------------------------------------------- >Comment By: Juan David Ibáñez Palomar (jdavid) Date: 2002-11-06 20:04 Message: Logged In: YES user_id=17532 I wasn't aware of the security implications; there will be a new version of the patch sometime between the 18th and the 30th of this month. I used eval for simplicity and performance reasons: the lookup in the catalog must be as fast as possible, so the parsing must happen when the MO file is loaded.
I will keep the use of eval, but it will check that 'n' is the only identifier used and, by the way, I will clean this part of the patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=21627 The patch looks quite good, overall. However, I don't like the use of eval to generate the plural form function: it is, in general, a security issue to evaluate a string that you read from some file. I would prefer if it parses the string, or uses other mechanisms to establish "safety": for example, if the only identifier occurring in the string is 'n', then this would be a good test. You might want to use tokenize.generate_tokens for that. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 From noreply@sourceforge.net Wed Nov 6 22:16:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 14:16:37 -0800 Subject: [Patches] [ python-Patches-633547 ] Plural forms support for gettext Message-ID: Patches item #633547, was opened at 2002-11-05 00:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: Plural forms support for gettext Initial Comment: Adds support for plural forms to the gettext module. The test script has been rewritten to use unittest. ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-06 23:16 Message: Logged In: YES user_id=21627 Just in case the security implications are not clear: Somebody might put os.chmod('/etc/passwd',0777) into a message catalog, and the superuser might run that script. ---------------------------------------------------------------------- Comment By: Juan David Ibáñez Palomar (jdavid) Date: 2002-11-06 20:04 Message: Logged In: YES user_id=17532 I wasn't aware of the security implications; there will be a new version of the patch sometime between the 18th and the 30th of this month. I used eval for simplicity and performance reasons: the lookup in the catalog must be as fast as possible, so the parsing must happen when the MO file is loaded. I will keep the use of eval, but it will check that 'n' is the only identifier used and, by the way, I will clean this part of the patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=21627 The patch looks quite good, overall. However, I don't like the use of eval to generate the plural form function: it is, in general, a security issue to evaluate a string that you read from some file. I would prefer if it parses the string, or uses other mechanisms to establish "safety": for example, if the only identifier occurring in the string is 'n', then this would be a good test. You might want to use tokenize.generate_tokens for that.
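As a rough sketch of the check suggested above, the following uses tokenize.generate_tokens to reject any expression that mentions a name other than 'n'. The helper name only_uses_n and the allowed parameter are hypothetical, not taken from the patch, and the code targets current Python 3. Note that the tokenizer reports keywords such as "and" or "not" as names too, so an expression already translated to Python syntax would need them added to the allowed set; that choice is an assumption left to the caller here.

```python
import io
import tokenize

def only_uses_n(expr, allowed=("n",)):
    """Return True if every NAME token in expr is in `allowed`.

    Hypothetical helper: a pre-check to run before handing a
    plural-form expression to eval().
    """
    try:
        readline = io.StringIO(expr).readline
        for tok in tokenize.generate_tokens(readline):
            if tok.type == tokenize.NAME and tok.string not in allowed:
                return False
    except (tokenize.TokenError, SyntaxError):
        # Text that cannot even be tokenized is rejected outright.
        return False
    return True
```

A check like this blocks calls such as the os.chmod example, since "os" and "chmod" are names other than 'n'. It does not by itself bound the cost of evaluating the expression, so it is best read as one layer of validation rather than a complete safety argument.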
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633547&group_id=5470 From noreply@sourceforge.net Wed Nov 6 22:41:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 14:41:46 -0800 Subject: [Patches] [ python-Patches-634557 ] Better inspect.BlockFinder fixes bug Message-ID: Patches item #634557, was opened at 2002-11-06 13:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Patrick K. O'Brien (pobrien) Assigned to: Nobody/Anonymous (nobody) Summary: Better inspect.BlockFinder fixes bug Initial Comment: inspect.BlockFinder didn't do a good enough job finding the end of code blocks. This can be observed by running: >>> import inspect >>> import tokenize >>> print inspect.getsource(tokenize.TokenError) class TokenError(Exception): pass class StopTokenizing(Exception): pass def printtoken(type, token, (srow, scol), (erow, ecol), line): # for testing print "%d,%d-%d,%d:\t%s\t%s" % \ (srow, scol, erow, ecol, tok_name[type], repr(token)) >>> Notice how it picks up extra source code lines. The attached patch fixes this problem. There should probably be some additional unit tests for this, but I ran out of time and energy. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-06 17:41 Message: Logged In: YES user_id=33168 Does this fix bug # 595018? There are a few other inspect module bugs I think. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 From noreply@sourceforge.net Wed Nov 6 22:55:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Nov 2002 14:55:11 -0800 Subject: [Patches] [ python-Patches-634557 ] Better inspect.BlockFinder fixes bug Message-ID: Patches item #634557, was opened at 2002-11-06 12:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Patrick K. O'Brien (pobrien) Assigned to: Nobody/Anonymous (nobody) Summary: Better inspect.BlockFinder fixes bug Initial Comment: inspect.BlockFinder didn't do a good enough job finding the end of code blocks. This can be observed by running: >>> import inspect >>> import tokenize >>> print inspect.getsource(tokenize.TokenError) class TokenError(Exception): pass class StopTokenizing(Exception): pass def printtoken(type, token, (srow, scol), (erow, ecol), line): # for testing print "%d,%d-%d,%d:\t%s\t%s" % \ (srow, scol, erow, ecol, tok_name[type], repr(token)) >>> Notice how it picks up extra source code lines. The attached patch fixes this problem. There should probably be some additional unit tests for this, but I ran out of time and energy. ---------------------------------------------------------------------- >Comment By: Patrick K. O'Brien (pobrien) Date: 2002-11-06 16:55 Message: Logged In: YES user_id=179604 Yes, this does fix bug # 595018. I'll look at the other inspect module bugs when I've got some spare time. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-06 16:41 Message: Logged In: YES user_id=33168 Does this fix bug # 595018? There are a few other inspect module bugs I think. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634557&group_id=5470 From noreply@sourceforge.net Thu Nov 7 08:39:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 00:39:41 -0800 Subject: [Patches] [ python-Patches-634866 ] general corrections to 2.2.2 refman, p.1 Message-ID: Patches item #634866, was opened at 2002-11-07 09:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 Category: Documentation Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: general corrections to 2.2.2 refman, p.1 Initial Comment: as per email exchanges with F. Drake, here's a first part of suggested corrections to the 2.2.2 reference manual, mostly to make it reflect a bit better the way Python currently works. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 From noreply@sourceforge.net Thu Nov 7 20:42:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 12:42:23 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 12:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Williams (johnw42) >Assigned to: Guido van Rossum (gvanrossum) Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the builtin dict class and a "pick" method to the BaseSet class. 
These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation. This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-07 15:42 Message: Logged In: YES user_id=31435 Assigned to Guido for Pronouncement. I've rarely had a need to select "an arbitrary" dict entry *without* also removing it from a dict. In the few cases that's come up, I've done k, v = dict.popitem(); dict[k] = v happily. We seem to be missing a compelling use case here. ---------------------------------------------------------------------- Comment By: John Williams (johnw42) Date: 2002-11-05 10:20 Message: Logged In: YES user_id=44174 There's no technical reason why this patch is necessary, but I think having it would make it easier to write clean and readable code. It seems very unintuitive to me that the nondestructive analog of "popitem()" would be "iteritems().next()". Even though I'm pretty familiar with iterators, this solution did not occur to me immediately, and I suspect a large portion of Python's users don't even know about using iterators this way. The "items()[0]" solution is even worse, IMHO, since it involves generating a whole list just to get a single item. I was also trying to preserve the similarity between dicts and sets, and both the list solution and the iterator solution look pretty different when used on sets. Also, using an iterator fails with a StopIteration exception when the dict/set is empty, but the methods in the patch raise KeyError with a helpful error string explaining the problem, just like pop and popitem. I wouldn't venture to guess how often others would use these methods; I just know I would have found them helpful recently.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:31 Message: Logged In: YES user_id=80475 dict.popitem() was added because it could retrieve and delete a key/value pair without hashing -- there were no existing methods which could achieve the same result. In contrast, the dict.pickitem() patch doesn't appear to offer a differential advantage over dict.iteritems().next() for retrieving an arbitrary (hash order) key/value pair. Also, since successive calls to pickitem() retrieve the same pair, it doesn't appear to be useful in a loop or warrant a C speed optimization. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:54 Message: Logged In: YES user_id=33168 Fixed the Summary, lest someone think you were making a personal problem public. :-) Seriously though...I haven't looked at the patch, but could you explain the rationale/benefit? Is this likely to be useful to many people or is it fairly limited? Couldn't you do dict.items()[0] if you wanted a random value?
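For readers following the thread, the behaviour John describes can be sketched roughly as below. This is not the attached patch: it is a free-function approximation written for modern Python 3 (where `iteritems().next()` is spelled `next(iter(d.items()))`), and the function names `pickitem` and `pick` are borrowed from the patch description purely for illustration.

```python
def pickitem(d):
    """Return an arbitrary (key, value) pair from d without removing it.

    Raises KeyError on an empty dict, mirroring what dict.popitem() does.
    """
    try:
        # Python 3 spelling of the thread's iteritems().next() idiom.
        return next(iter(d.items()))
    except StopIteration:
        raise KeyError("pickitem(): dictionary is empty") from None


def pick(s):
    """Return an arbitrary element of set s without removing it."""
    try:
        return next(iter(s))
    except StopIteration:
        raise KeyError("pick(): set is empty") from None
```

The only thing these wrappers add over the bare iterator idiom is the KeyError on an empty container, which is the consistency argument John makes above.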
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Thu Nov 7 20:46:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 12:46:10 -0800 Subject: [Patches] [ python-Patches-633374 ] nondestructive dict.popitem and Set.pop Message-ID: Patches item #633374, was opened at 2002-11-04 12:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: John Williams (johnw42) Assigned to: Guido van Rossum (gvanrossum) Summary: nondestructive dict.popitem and Set.pop Initial Comment: This patch (relative to the latest Python CVS tree) adds a "pickitem" method to the builtin dict class and a "pick" method to the BaseSet class. These methods are analogs of "dict.popitem" and "Set.pop", but they don't remove the item they return from the dict/set. This patch *does not* update the documentation. This is my system: Linux 2.4.2-2 #1 i686 unknown ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 15:46 Message: Logged In: YES user_id=6380 -1. John hasn't explained in what kind of algorithm he needs to pick a dict item. I find this exceedingly rare myself. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-07 15:42 Message: Logged In: YES user_id=31435 Assigned to Guido for Pronouncement. I've rarely had a need to select "an arbitrary" dict entry *without* also removing it from a dict. In the few cases that's come up, I've done k, v = dict.popitem() dict[k] = v happily. We seem to be missing a compelling use case here. 
---------------------------------------------------------------------- Comment By: John Williams (johnw42) Date: 2002-11-05 10:20 Message: Logged In: YES user_id=44174 There's no technical reason why this patch is necessary, but I think having it would make it easier to write clean and readable code. It seems very unintuitive to me that the nondestructive analog of "popitem()" would be "iteritems().next()". Even though I'm pretty familiar with iterators, this solution did not occur to me immediately, and I suspect a large portion of Python's users don't even know about using iterators this way. The "items()[0]" solution is even worse, IMHO, since it involves generating a whole list just to get a single item. I was also trying to preserve the similarity between dicts and sets, and both the list solution and the iterator solution look pretty different when used on sets. Also, using an iterator fails with a StopIteration exception when the dict/set is empty, but the methods in the patch raise KeyError with a helpful error string explaining the problem, just like pop and popitem. I wouldn't venture to guess how often others would use these methods; I just know I would have found them helpful recently. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:31 Message: Logged In: YES user_id=80475 dict.popitem() was added because it could retrieve and delete a key/value pair without hashing -- there were no existing methods which could achieve the same result. In contrast, the dict.pickitem() patch doesn't appear to offer a differential advantage over dict.iteritems().next() for retrieving an arbitrary (hash order) key/value pair.
Also, since successive calls to pickitem() retrieve the same pair, it doesn't appear to be useful in a loop or warrant a C speed optimization ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-04 18:54 Message: Logged In: YES user_id=33168 Fixed the Summary, lest someone think you were making a personal problem public. :-) Seriously though...I haven't looked at the patch, but could you explain the rationale/benefit? Is this likely to be useful to many people or is it fairly limited? Couldn't you do dict.items()[0] if you wanted a random value? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633374&group_id=5470 From noreply@sourceforge.net Thu Nov 7 21:15:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 13:15:52 -0800 Subject: [Patches] [ python-Patches-549037 ] ConfigParser: optional section header Message-ID: Patches item #549037, was opened at 2002-04-26 12:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Detlef Lannert (lannert) Assigned to: Nobody/Anonymous (nobody) Summary: ConfigParser: optional section header Initial Comment: Each configuration file parsed by ConfigParser.py must start with a section header line (a section name enclosed in [...]). In many cases where I just want to parse a file with some variable settings for a program this is IMO a nuisance: The user must know the expected section name and insert a redundant header line, even if there are no other sections possible. The "surprise factor" is even higher when RFC[2]822 syntax is used; the config file then looks more like a standard mail header which wouldn't start with a section title anyway. 
Since the config file is read and the case of missing section titles handled in the __read() method of the ConfigParser, there is no easy way to modify the parser's behaviour just by subclassing and overriding a method. The patch lets the caller specify a default section name; if the config file doesn't start with a [section] line but with option lines, a suitable section is automatically created and holds the option entries. In any other case, i.e., when the header line is present in the config file and/or when no default startsection name is specified, the parser's behaviour is unchanged. Thus there shouldn't be a compatibility issue. In case the upload of the patch file fails (which has happened to me and my browser before), please have a look at ; the diffs also add some lines to test_cfgparser.py. ---------------------------------------------------------------------- >Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-07 21:15 Message: Logged In: YES user_id=7887 Thanks for proposing that, Detlef. Having a configuration file without headers could indeed be interesting in some situations. I have a few comments about the implementation: The patch includes a new parameter in read functions, stating the name of the first section. It means that we could have other sections after the first unheaded section. IMO, that situation should still be considered an error. One possible way to implement it is to include a "noheaders" boolean parameter for the constructor. Then, the user would have to know the standard single-section name, to pass it to functions like get(). Another way would be to include something like a "singlesection" parameter in the constructor. This parameter would accept a string option, which would name the single section. As an argument against the whole issue, I'm not sure how uncomfortable it is to simply include a header in the file to satisfy the parser.
As an argument in favor, this could allow ConfigParser to parse simple (no escapes or variables) shell configuration files and other simple configurations using NAME=VALUE style. I'm attaching an alternative implementation of the singlesection algorithm, described above. Would it be enough for your needs? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 From noreply@sourceforge.net Thu Nov 7 21:16:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 13:16:03 -0800 Subject: [Patches] [ python-Patches-549037 ] ConfigParser: optional section header Message-ID: Patches item #549037, was opened at 2002-04-26 12:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Detlef Lannert (lannert) Assigned to: Nobody/Anonymous (nobody) Summary: ConfigParser: optional section header Initial Comment: Each configuration file parsed by ConfigParser.py must start with a section header line (a section name enclosed in [...]). In many cases where I just want to parse a file with some variable settings for a program this is IMO a nuisance: The user must know the expected section name and insert a redundant header line, even if there are no other sections possible. The "surprise factor" is even higher when RFC[2]822 syntax is used; the config file then looks more like a standard mail header which wouldn't start with a section title anyway. Since the config file is read and the case of missing section titles handled in the __read() method of the ConfigParser, there is no easy way to modify the parser's behaviour just by subclassing and overriding a method.
The patch lets the caller specify a default section name; if the config file doesn't start with a [section] line but with option lines, a suitable section is automatically created and holds the option entries. In any other case, i.e., when the header line is present in the config file and/or when no default startsection name is specified, the parser's behaviour is unchanged. Thus there shouldn't be a compatibility issue. In case the upload of the patch file fails (which has happened to me and my browser before), please have a look at ; the diffs also add some lines to test_cfgparser.py. ---------------------------------------------------------------------- >Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-07 21:16 Message: Logged In: YES user_id=7887 Thanks for proposing that, Detlef. Having a configuration file without headers could indeed be interesting in some situations. I have a few comments about the implementation: The patch includes a new parameter in read functions, stating the name of the first section. It means that we could have other sections after the first unheaded section. IMO, that situation should still be considered an error. One possible way to implement it is to include a "noheaders" boolean parameter for the constructor. Then, the user would have to know the standard single-section name, to pass it to functions like get(). Another way would be to include something like a "singlesection" parameter in the constructor. This parameter would accept a string option, which would name the single section. As an argument against the whole issue, I'm not sure how uncomfortable it is to simply include a header in the file to satisfy the parser. As an argument in favor, this could allow ConfigParser to parse simple (no escapes or variables) shell configuration files and other simple configurations using NAME=VALUE style. I'm attaching an alternative implementation of the singlesection algorithm, described above.
Would it be enough for your needs? ---------------------------------------------------------------------- Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-07 21:15 Message: Logged In: YES user_id=7887 Thanks for proposing that, Detlef. Having a configuration file without headers could indeed be interesting in some situations. I have a few comments about the implementation: The patch includes a new parameter in read functions, stating the name of the first section. It means that we could have other sections after the first unheaded section. IMO, that situation should still be considered an error. One possible way to implement it is to include a "noheaders" boolean parameter for the constructor. Then, the user would have to know the standard single-section name, to pass it to functions like get(). Another way would be to include something like a "singlesection" parameter in the constructor. This parameter would accept a string option, which would name the single section. As an argument against the whole issue, I'm not sure how uncomfortable it is to simply include a header in the file to satisfy the parser. As an argument in favor, this could allow ConfigParser to parse simple (no escapes or variables) shell configuration files and other simple configurations using NAME=VALUE style. I'm attaching an alternative implementation of the singlesection algorithm, described above. Would it be enough for your needs?
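The "single section" idea in the comment above can be approximated without patching the parser at all. The sketch below is an assumption-laden illustration for modern Python 3 (`configparser`, not the Python 2.3 `ConfigParser` module under discussion): the helper name `read_headerless` and the default section name "main" are hypothetical, and neither attached patch is reproduced here.

```python
import configparser


def read_headerless(text, section="main"):
    """Parse NAME=VALUE text that has no [section] header.

    `section` is a caller-chosen placeholder name (hypothetical default).
    """
    parser = configparser.ConfigParser()
    # Prepend a synthetic header so the stock parser accepts the input.
    parser.read_string("[%s]\n%s" % (section, text))
    # Follow the view that any further section is an error in this mode.
    if parser.sections() != [section]:
        raise ValueError("unexpected extra sections: %r" % parser.sections())
    return parser


cfg = read_headerless("name = demo\nport = 8080\n")
print(cfg.get("main", "port"))
```

As in the "singlesection" proposal, the caller still has to know the section name in order to call get(); the sketch only spares the file's author from writing the header.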
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 From noreply@sourceforge.net Thu Nov 7 23:30:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 15:30:26 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Guido van Rossum (gvanrossum) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). 
Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. 
Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. 
Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. 
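The two-algorithm selection discussed in this thread can be sketched as follows. This is not the submitted patch: the function name `sample_indices`, the use of a set instead of a dictionary, and the exact form of the n <= 6*k switchover test are assumptions chosen to match the description (shuffle when the sample is a large share of the range, rejection sampling otherwise), written for modern Python 3.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k distinct integers from range(n), in random order."""
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or negative")
    if n <= 6 * k:
        # Dense case: partial Fisher-Yates shuffle over an n-length pool.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Sparse case: draw with randrange() and reject repeats, so memory
    # stays proportional to k rather than n.
    seen = set()
    result = []
    while len(result) < k:
        x = rng.randrange(n)
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result
```

Under these assumptions a call such as sample_indices(1000000, 60) never builds a million-element list, and turning indices into elements is the list comprehension already shown in the thread.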
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 00:55:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 16:55:27 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. 
This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the optional random argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the core use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review?
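The acceptance-testing figure quoted in the comment above (60 clean samples out of 1000 giving 95% confidence of an error rate below 5%) can be checked with a few lines of arithmetic. The sketch assumes a population of exactly 1000 transactions of which 50 (5%) are bad; the variable names are illustrative only.

```python
from math import comb

# Chance of seeing zero errors in 60 draws if the true error rate were 5%.
# With replacement (binomial approximation):
p_binomial = 0.95 ** 60

# Without replacement, drawing 60 of 1000 where 50 are bad (hypergeometric):
p_hypergeometric = comb(950, 60) / comb(1000, 60)

print(round(p_binomial, 4), round(p_hypergeometric, 4))
```

Both probabilities come out below 0.05, which is where the "95% confidence" reading comes from: a clean sample of 60 would be that unlikely if the error rate were 5% or worse.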
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 06:36:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 22:36:03 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). 
Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful, b) whether to implement it as is or with sequence arguments, and c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange().
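As a rough illustration of what "a uniquified list of calls to randrange()" can look like, here is a minimal sketch. It is not the code from the patch: the name sample_indices and its rng parameter are hypothetical, and only the dictionary branch described in the thread is shown (the shuffle branch is omitted).

```python
import random

def sample_indices(n, k, rng=random):
    """Return k distinct integers from range(n), in the order they were drawn.

    Hypothetical helper for illustration only; not the patch's implementation.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    seen = {}      # dictionary used as a set of indices already drawn
    result = []    # keeps the indices in selection order
    while len(result) < k:
        i = rng.randrange(n)
        if i not in seen:      # a repeated index is discarded and redrawn
            seen[i] = True
            result.append(i)
    return result

# Pick a hundred indices out of ten million without building a long list.
picks = sample_indices(10_000_000, 100)
```

Memory use grows with k rather than n, which is the advantage claimed above; the cost is that the loop slows down sharply as k approaches n, which is presumably why the patch also has a shuffle branch.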
Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. 
Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. 
Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 06:52:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Nov 2002 22:52:24 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-28 03:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 07:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. 
I could be convinced if you find a user of this function (or the Cookbook recipe) who says "I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook". I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 01:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 00:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful, b) whether to implement it as is or with sequence arguments, and c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first.
The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 22:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). 
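One plausible reading of the 1.2*k figure (an assumption on my part; the thread does not spell it out) is that it bounds the expected number of draws when the dictionary branch is only used for n >= 6*k. Each draw then collides with an earlier pick with probability under 1/6, so the expected total stays below k * 6/5. The sketch below computes the exact expectation; expected_draws is a hypothetical helper, not part of the patch.

```python
def expected_draws(n, k):
    """Expected number of randrange(n) calls needed to collect k distinct values.

    With i values already chosen, a draw is new with probability (n - i) / n,
    so that step takes n / (n - i) draws on average.
    """
    return sum(n / (n - i) for i in range(k))

# Hypothetical sizes sitting exactly at the 6:1 switchover.
n, k = 360, 60
ratio_at_switchover = expected_draws(n, k) / k

# The sample(1000000, 60) case mentioned in the thread.
ratio_large_n = expected_draws(1000000, 60) / 60
```

Both ratios come out below 1.2. This is an average, not a hard ceiling: an unlucky run of repeated draws can always take longer.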
I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 19:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 18:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. 
Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 09:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 07:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 08:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-29 05:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-29 05:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 13:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 08:59:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 00:59:04 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP.
It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it.
Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says "I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook". I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. 
This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 12:37:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 04:37:46 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). 
Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick-and-dirty version or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list.
So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does.
What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says "I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook." I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful; b) whether to implement it as is or with sequence arguments; c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence.
The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). 
Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. 
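The two-branch selection discussed in this thread (shuffle when the sample is a large share of the range, dictionary otherwise) can be sketched roughly as below. This is an illustration only, not the code attached to the patch: the function name sample_indices and the rng parameter (taken here to be a random.Random instance) are assumptions made for the example, and the factor of six is simply the switchover ratio described above.

```python
import random


def sample_indices(n, k, rng=None):
    """Return k unique integers from range(n), in random order.

    Hypothetical sketch: the name and the rng parameter are not from the patch.
    """
    if rng is None:
        rng = random.Random()
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")

    if n < 6 * k:
        # Shuffle branch: build the full n-element list and run a partial
        # Fisher-Yates shuffle over its first k positions.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]

    # Dictionary branch: draw indices and reject repeats. Space grows with k
    # rather than n, and the result list keeps the selection order.
    seen = {}
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in seen:
            seen[j] = True
            result.append(j)
    return result
```

With this shape, sample_indices(1000000, 60) never builds a million-element list, while sample_indices(10, 8) takes the shuffle branch and avoids repeated rejection.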
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. 
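The acceptance-testing figure quoted above (60 clean transactions giving roughly 95% confidence of an error rate below 5%) can be checked with a short calculation. The sketch below assumes the usual reasoning: if the true error rate were 5%, the chance of drawing 60 items with no errors would be under 5%. The function names are made up for this example; the second one uses the exact without-replacement (hypergeometric) probability for a population of 1000.

```python
from math import comb


def p_clean_with_replacement(error_rate, k):
    # Probability that k independent draws all miss the defective items.
    return (1.0 - error_rate) ** k


def p_clean_without_replacement(population, defective, k):
    # Exact probability that a k-item sample drawn without replacement
    # contains none of the defective items.
    return comb(population - defective, k) / comb(population, k)


approx = p_clean_with_replacement(0.05, 60)
exact = p_clean_without_replacement(1000, 50, 60)
print(round(approx, 4), round(exact, 4))
```

Both probabilities come out below 0.05, which is where the "95% confidence" statement comes from; sampling without replacement makes a clean sample slightly less likely still.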
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 12:56:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 04:56:40 -0800 Subject: [Patches] [ python-Patches-617312 ] debugger-controlled jumps (Psyco #3) Message-ID: Patches item #617312, was opened at 2002-10-02 00:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617312&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Michael Hudson (mwh) Summary: debugger-controlled jumps (Psyco #3) Initial Comment: Psyco-friendly patch #3. Allows the C profile and trace functions (called upon function entry, line tracing, function exit, and exception) to alter some fields of the current PyFrameObject to influence the execution of the main loop. This is currently impossible because all the relevant fields are copied into local variables of eval_frame(), shadowing any subsequent change in the frame object.
This is designed for what I plan to do with Psyco, but could also be used by advanced debuggers to allow the execution point to be modified by the user. Besides, the patch is just a matter of swapping a few lines in eval_frame(), introducing almost no extra complexity. In ceval.c:eval_frame(): The calls to call_trace(...PyTrace_CALL...) at the beginning of the code have been moved above the initialization of the local variables, allowing the trace functions to fiddle with the frame object before the main loop sees it. There is also a bug fix here: if the profile or trace functions raise an error at this point, eval_frame() used to quit without restoring tstate->frame and tstate->recursion_depth. [XXX should have been a bug report on its own] Finally, the call_trace() on SET_LINENO has been slightly modified to allow the trace function to move the execution point elsewhere. This is done by saving a few local variables to the frame object before the call_trace() and restoring them from it afterwards. The variables are the current instruction pointer (which was already saved in f->f_lasti, but is now restored from f->f_lasti too), and the current stack_pointer. Compatibility: f->f_stacktop normally remains NULL for the whole execution of the frame. This patch sets it to a non-NULL value for the duration of the call to the line-by-line trace function. I expect this not to cause any incompatibility because f->f_stacktop is not visible from Python code. I do not expect extension modules to rely on this detail. Note however that this has an influence on the GC, which only visits the stack if f->f_stacktop is set. Again, I assume this is not an issue -- it cannot even cause dead cycles to be detected earlier by the GC because as long as we are in the call_trace() call, we have a live reference to f. Performance: when tracing is on, SET_LINENO is now marginally slower. I guess this is not considered an issue given the debugging nature of tracing.
In Python 2.3, line tracing is currently *much* heavier and nobody seems to complain ;-) ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 12:56 Message: Logged In: YES user_id=6656 I got lazy and checked in all the psyco patches at once: Include/pystate.h revision 2.21 Modules/pyexpat.c revision 2.76 Python/ceval.c revision 2.340 Python/pystate.c revision 2.22 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-07 16:35 Message: Logged In: YES user_id=4771 Uploaded patch for 2.3. I had to change the order of some things in the main loop -- namely, call maybe_call_trace_line() before the next opcode/oparg is loaded. To do so I had to simplify the signature of maybe_call_trace_line(): I removed its first argument, 'opcode', which wasn't used any more anyway. I carefully checked that I didn't break anything, and 'test_trace' says so, but you may as well double-check the patch. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-07 10:38 Message: Logged In: YES user_id=6656 OK done, in ceval.c revision 2.301.4.6. ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-07 01:28 Message: Logged In: YES user_id=4771 Here you are! ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-10-06 21:44 Message: Logged In: YES user_id=6380 I'd like to get this into 2.2.2, but the patch doesn't apply cleanly (courtesy of MWH fixing the bug reported herein :-). Armin, can you upload a fixed patch? Or if MWH reads this, can you check this and the other two in? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-03 09:50 Message: Logged In: YES user_id=6656 Fair enough.
Let's see what Guido thinks. ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-02 21:33 Message: Logged In: YES user_id=4771 The trace function can change f_lasti if it knows what it is doing and if it accordingly changes all other fields too, like f_stacktop, f_blockstack and f_iblock. In fact, f_lasti and f_stacktop are the only fields of the frame objects that are currently cached in local variables (with the exception of what concerns the f_code object itself) so that this patch is enough to let the trace function actually move the execution point elsewhere. This would be quite useful in Psyco, if the rest of the function has been emulated and we then want the main loop to exit with the proper value: just move f_lasti just before the RETURN_VALUE opcode, clean up the stack and block stack, and push the already-computed result value where RETURN_VALUE will find it. Psyco could also execute just a part of the function and give it back to Python at some later position. A "clean hack". ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-02 14:14 Message: Logged In: YES user_id=6656 Never mind, the bug offended me so much that I fixed it both on the trunk and the 22-maint branch. Python/ceval.c revisions 2.337 and 2.301.4.5 Lib/test/test_trace.py revisions 1.4 and 1.4.2.1 (probably a 2.1 bugfix candidate, if anyone cares...) ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-02 13:23 Message: Logged In: YES user_id=6656 What about the blockstack? Are you just relying on trace functions not moving f_lasti unwisely? Agree about the bug. Do you have a test case for it? Otherwise I'll cook one up.
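For a feel of what "moving the execution point" from a trace function looks like, here is a small Python-level sketch. It is not the C-level mechanism of this patch (which works on f_lasti and the stack pointer inside eval_frame()); it relies on a related facility of current CPython, where a line trace function may assign to frame.f_lineno to jump within the frame being traced. The function names are invented for the example.

```python
import sys


def target():
    steps = []
    steps.append("a")
    steps.append("b")  # the tracer below jumps over this line
    steps.append("c")
    return steps


def run_with_jump():
    first = target.__code__.co_firstlineno
    skip_line = first + 3     # the line appending "b"
    resume_line = first + 4   # the line appending "c"

    def tracer(frame, event, arg):
        # Only trace the frame of target(); ignore everything else.
        if frame.f_code is not target.__code__:
            return None
        if event == "line" and frame.f_lineno == skip_line:
            # Assigning f_lineno inside a line trace function moves the
            # execution point of the traced frame.
            frame.f_lineno = resume_line
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return target()
    finally:
        sys.settrace(previous)
```

Calling run_with_jump() runs target() with the second append skipped. The interpreter only permits jumps it can prove safe, which is the Python-level counterpart of the blockstack concern raised above.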
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617312&group_id=5470 From noreply@sourceforge.net Fri Nov 8 12:56:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 04:56:55 -0800 Subject: [Patches] [ python-Patches-617311 ] Tiny profiling info (Psyco #2) Message-ID: Patches item #617311, was opened at 2002-10-02 00:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617311&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Michael Hudson (mwh) Summary: Tiny profiling info (Psyco #2) Initial Comment: Psyco-friendly patch #2. A very very small statistic-collecting patch. pystate.h: added a field at the end of the PyThreadState struct: int tick_counter; ceval.c: eval_frame(): tstate->tick_counter is incremented whenever the check_interval ticker reaches zero. The purpose is to give a useful measure of the number of interpreted bytecode instructions in a given thread. This extremely lightweight statistic collector can be of interest to profilers (like psyco.jit()). We can safely guess that a single integer increment every 100 interpreted bytecode instructions will go entirely unnoticed in any performance measure. [This is true for pystone.py.] ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 12:56 Message: Logged In: YES user_id=6656 I got lazy and checked in all the psyco patches at once: Include/pystate.h revision 2.21 Modules/pyexpat.c revision 2.76 Python/ceval.c revision 2.340 Python/pystate.c revision 2.22 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-09 14:20 Message: Logged In: YES user_id=4771 Attached an updated diff for 2.3.
This one doesn't have Windows line endings and includes the initialization of tick_counter to 0 that was added by Guido in the latest 2.2.2. (I thought it was unnecessary to initialize it to anything because profilers would only be interested in differences.) ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-07 15:17 Message: Logged In: YES user_id=4771 Uploaded the 2.3 patch (this one cleanly generated -- for the other one I just cat'ed two patches in one). ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-07 10:42 Message: Logged In: YES user_id=6656 Done in Include/pystate.h revision 2.18.16.1 Python/ceval.c revision 2.301.4.7 Armin, I don't know how you generated this patch, but it would have been easier to apply if it had been rooted in the "src" directory, like e.g.: $ cvs diff Include/pystate.h Python/ceval.c > ~/diff ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-10-06 21:45 Message: Logged In: YES user_id=6380 I'd like to get this into 2.2.2. MWH, can you check it in? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-04 15:36 Message: Logged In: YES user_id=4771 It is the only way I could work out so far that can "predict" how much a function will be accelerated when run under Psyco. This is a very precious indication for an automatic Psyco-binder. The following table shows the results with the various test functions of the distribution's "test.py" file:

(fn name)   (speed-up)   (bytecode insns per second)
f1          106.00       2310545
f4           11.33       2819100
f5           12.08       2992445
f6            1.35        412022
f7            2.24       1331353
f7bis        10.29       1632296

The third column is '(tick_counter * check_interval) / execution_time'.
The correlation between the two columns is admittedly not perfect, but still we can see that it was not worthwhile to try and accelerate f6 because it didn't spend a lot of time actually interpreting bytecodes. Note that similar information could be obtained by setting a line-tracing hook, counting not instructions but lines (which is less precise but still a good approximation). However, line tracing is *much* too slow for anything but debugging usage. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-04 11:42 Message: Logged In: YES user_id=6656 I see no harm in this. Are you sure it's actually going to be useful, though? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617311&group_id=5470 From noreply@sourceforge.net Fri Nov 8 12:59:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 04:59:59 -0800 Subject: [Patches] [ python-Patches-617309 ] getframe hook (Psyco #1) Message-ID: Patches item #617309, was opened at 2002-10-02 00:17 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617309&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Michael Hudson (mwh) Summary: getframe hook (Psyco #1) Initial Comment: Psyco-friendly patch #1 Allow Psyco-like extension modules to quickly plug their own notion of frame call stack into Python, for the use of the interpreter's parts that rely on the call stack for context information. For example, the code that creates a new class implicitly reads the top frame's globals if the class does not explicitly define a __module__ attribute. With this new hook, Psyco can provide such code with the expected frame object.
pystate.h: The PyThreadState structure has a new field added at its end: Py_getframehook getframe; where typedef PyFrameObject *(*Py_getframehook)(PyThreadState *, int); This field points to a function that returns the nth frame object in the call stack. By default, it points to a standard function that starts with tstate->frame and walks the f_back fields, just like the implementation of sys._getframe(). The purpose of this is to allow Psyco to hook another function at this point, in order to lazily emulate the frame objects that correspond to frames executed by Psyco. sysmodule.c: sys_getframe() calls the hook. ceval.c: PyEval_GetFrame() calls the hook. various other places in ceval.c: replaced PyThreadState_Get()->frame with PyEval_GetFrame() so that the hook will be called. pyexpat.c: replaced a PyThreadState_Get()->frame->f_globals with PyEval_GetGlobals(). Note that there are other places using 'frame' and the 'f_back' pointers which have not been changed because they are concerned with actual (classical) bytecode interpretation. The hook is only used in places that are interested in obtaining contextual information (like what the previous frame's globals are), not in places that actually build frames in which bytecode will be interpreted. Compatibility: third-party extension modules directly reading frame, like Expat before this patch, will exhibit a marginally wrong behavior with Psyco until they are modified to call the hook (or better the "official" interpreter routines that are modified to do so). It does not break anything at all as long as we are not using Psyco. Performance overhead: one more indirect call isn't heavy. More importantly, I don't expect the concerned functions to be used more than occasionally in any code.
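The default behaviour described above (start at the current frame and walk the f_back links) and the class-creation example can both be observed from Python. The sketch below is only an illustration from the Python side, not the C hook itself; nth_frame and the helper functions are names made up for this example.

```python
import sys


def nth_frame(depth):
    """Return the frame `depth` levels above the caller by walking f_back."""
    frame = sys._getframe(1)  # the caller's frame
    for _ in range(depth):
        frame = frame.f_back
        if frame is None:
            raise ValueError("call stack is not deep enough")
    return frame


def inner():
    # depth 0 is inner() itself, depth 1 is whoever called inner()
    return nth_frame(0).f_code.co_name, nth_frame(1).f_code.co_name


def outer():
    return inner()


# A class statement without an explicit __module__ takes it from the
# __name__ entry of the globals it is executed in.
namespace = {"__name__": "fake_module"}
exec("class C:\n    pass\n", namespace)
module_of_c = namespace["C"].__module__
```

Here outer() returns the pair of function names found on the stack, and module_of_c shows why code that creates classes needs to see the right globals, which is the context information the hook lets Psyco supply.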
---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 12:59 Message: Logged In: YES user_id=6656 I got lazy and checked in all the psyco patches at once: Include/pystate.h revision 2.21 Modules/pyexpat.c revision 2.76 Python/ceval.c revision 2.340 Python/pystate.c revision 2.22 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-10-07 13:34 Message: Logged In: YES user_id=6380 Thanks much Michael for the three sets of Psyco checkins in 2.2.2! Armin, I think for Python 2.3 some patches must be different because there's no SET_LINENO opcode. Can you provide updated versions for those? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-07 10:48 Message: Logged In: YES user_id=6656 OK, checked in as: Include/pystate.h revision 2.18.16.2 Modules/pyexpat.c revision 2.57.6.4 Python/ceval.c revision 2.301.4.8 Python/pystate.c revision 2.20.16.1 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-10-06 21:46 Message: Logged In: YES user_id=6380 I'd like to get this into 2.2.2. MWH, can you check it in? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-10-04 16:46 Message: Logged In: YES user_id=4771 Here is a simpler patch doing only the one thing that I really cannot work around in Psyco (and I've tried, believe me!): a way to hook my own replacement function for PyEval_GetFrame(). No more sysmodule change. Just a few places here and there with 'PyThreadState_Get()->frame' replaced with 'PyEval_GetFrame()' so that my hook will trigger. Is the new patch clean enough? If so I'll assign it to Guido for review.
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-04 11:43 Message: Logged In: YES user_id=6656 I'd be uneasy about a change of this subtlety going into the 2.2 branch. Aren't there other ways you can do this? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=617309&group_id=5470 From noreply@sourceforge.net Fri Nov 8 13:11:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 05:11:45 -0800 Subject: [Patches] [ python-Patches-631276 ] Exceptions raised by line trace function Message-ID: Patches item #631276, was opened at 2002-10-30 22:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631276&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Richie Hindle (richiehindle) Assigned to: Michael Hudson (mwh) Summary: Exceptions raised by line trace function Initial Comment: Exceptions raised by line trace functions are not handled. I'm running the latest 2.3a0 (as of 30th October 2002). When a trace function called with an event of 'line' raises an exception, that exception is ignored by maybe_call_line_trace. This means that the program never sees the exception, and that the next genuine exception to be raised gets muddled up with the one raised by the trace function. See the attached script for a demo. The bug (as far as I can tell) is that maybe_call_line_trace is ignoring the return value of call_trace. This patch makes maybe_call_line_trace pass that return value back to eval_frame, which then sets why to WHY_EXCEPTION and jumps to on_error.
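The behaviour this patch aims for, where an exception raised by a line trace function surfaces in the traced program instead of being swallowed, can be demonstrated with a few lines of Python. This is not the attached trace_exception_bug.py script (which is not reproduced in this message); it is a separate minimal sketch with invented names, showing what an interpreter with the fix is expected to do.

```python
import sys


class TraceError(Exception):
    """Raised on purpose from inside the trace function."""


def traced():
    x = 1
    return x


def raising_tracer(frame, event, arg):
    # Only trace the frame of traced(); leave every other frame alone.
    if frame.f_code is not traced.__code__:
        return None
    if event == "line":
        raise TraceError("raised from the line trace function")
    return raising_tracer


def run():
    previous = sys.gettrace()
    sys.settrace(raising_tracer)
    try:
        traced()
    except TraceError as exc:
        # With the fix, the exception propagates out of the traced call.
        return str(exc)
    finally:
        sys.settrace(previous)
    return None
```

If run() returned None, the exception would have been lost, which is the symptom described in the report.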
---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 13:11 Message: Logged In: YES user_id=6656 Checked in as: Lib/test/test_trace.py revision 1.5 Python/ceval.c revision 2.341 after a little light massaging (I just checked in a conflicting patch). Cheers! ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-31 21:39 Message: Logged In: YES user_id=85414 Here's the patch to test_trace.py. RaisingTraceFuncTestCase now tests each of the four trace events independently. Note that this doesn't show up the problem of the lost exception cropping up when another exception is raised - use the trace_exception_bug.py script if you want to see that. ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-31 14:38 Message: Logged In: YES user_id=85414 Will do, later today. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-10-31 10:38 Message: Logged In: YES user_id=6656 Blame for this one is easy to find... Patch looks OK. Could I ask you to munge the test into a patch to test_trace? Otherwise I'll do it, but maybe not today. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631276&group_id=5470 From noreply@sourceforge.net Fri Nov 8 13:13:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 05:13:02 -0800 Subject: [Patches] [ python-Patches-631276 ] Exceptions raised by line trace function Message-ID: Patches item #631276, was opened at 2002-10-30 22:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631276&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Richie Hindle (richiehindle) Assigned to: Michael Hudson (mwh) Summary: Exceptions raised by line trace function Initial Comment: Exceptions raised by line trace functions are not handled. I'm running the latest 2.3a0 (as of 30th October 2002). When a trace function called with an event of 'line' raises an exception, that exception is ignored by maybe_call_line_trace. This means that the program never sees the exception, and that the next genuine exception to be raised gets muddled up with the one raised by the trace function. See the attached script for a demo. The bug (as far as I can tell) is that maybe_call_line_trace is ignoring the return value of call_trace. This patch makes maybe_call_line_trace pass that return value back to eval_frame, which then sets why to WHY_EXCEPTION and jumps to on_error. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 13:12 Message: Logged In: YES user_id=6656 Is this Accepted or Fixed? Does it matter? Have I had enough of sf's tracker yet...?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=631276&group_id=5470 From noreply@sourceforge.net Fri Nov 8 13:19:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 05:19:50 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. 
If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the test expands to all 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick-and-dirty version or his translated version.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order.
CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful, b) whether to implement it as is or with sequence arguments, and c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach stores, for each of the k elements, a hash code, a key pointer, and a value pointer, for a total of three words; multiplying by 1.5 (because dict loading is kept under 2/3) and rounding up gives five, plus one pointer for the 'inorder' return list, for a total of six.
Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with an expected count of no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection.
For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the core use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it?
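The acceptance-testing figure quoted in this thread (60 transactions out of 1000, zero errors, 95% confidence of an error rate under 5%) can be checked with a short calculation. The sketch below is an illustration in current Python 3, not part of the patch; the function names are made up, and it assumes the claim means "if the error rate were 5%, a clean sample of 60 would turn up less than 5% of the time".

```python
from math import comb


def prob_zero_errors_binomial(error_rate, sample_size):
    """Chance of seeing no errors when each draw is independent (with replacement)."""
    return (1.0 - error_rate) ** sample_size


def prob_zero_errors_hypergeometric(population, bad, sample_size):
    """Chance of seeing no errors when drawing without replacement."""
    return comb(population - bad, sample_size) / comb(population, sample_size)


if __name__ == "__main__":
    # 5% error rate, sample of 60: about 0.046 in the independent-draw approximation.
    print(prob_zero_errors_binomial(0.05, 60))
    # Exact figure for 50 bad items among 1000, sampled without replacement.
    print(prob_zero_errors_hypergeometric(1000, 50, 60))
```

Both probabilities come out below 0.05, which is consistent with the 95% figure; sampling without replacement gives the slightly smaller value.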
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
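The two-algorithm idea discussed in this thread can be sketched roughly as follows. This is an illustration in current Python 3, not the patch itself: the name sample_indices and the rng parameter are hypothetical, a set stands in for the dictionary mentioned in the comments, and only the switchover ratio of six is taken from the discussion.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k unique integers from range(n), in random order (a sketch)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n <= 6 * k:
        # Population is small relative to k: partially shuffle a full list
        # and keep the first k positions.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Population is large: draw with rejection, remembering earlier picks,
    # so no n-length list is ever built.
    seen = set()
    result = []
    while len(result) < k:
        candidate = rng.randrange(n)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


if __name__ == "__main__":
    colorlist = ["red", "green", "blue", "cyan", "magenta", "yellow"]
    print([colorlist[i] for i in sample_indices(len(colorlist), 5)])
    print(sample_indices(1000000, 60))
```

In the rejection branch n is more than six times k, so each draw collides with an earlier pick less than one time in six, which matches the "under 20% resampling" and "1.2*k calls" figures given in the comments as expected values.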
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 13:33:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 05:33:13 -0800 Subject: [Patches] [ python-Patches-549037 ] ConfigParser: optional section header Message-ID: Patches item #549037, was opened at 2002-04-26 12:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Wont Fix Priority: 5 Submitted By: Detlef Lannert (lannert) Assigned to: Nobody/Anonymous (nobody) Summary: ConfigParser: optional section header Initial Comment: Each configuration file parsed by ConfigParser.py must start with a section header line (a section name enclosed in [...]). In many cases where I just want to parse a file with some variable settings for a program this is IMO a nuisance: The user must know the expected section name and insert a redundant header line, even if there are no other sections possible. The "surprise factor" is even higher when RFC[2]822 syntax is used; the config file then looks more like a standard mail header which wouldn't start with a section title anyway. Since the config file is read and the case of missing section titles handled in the __read() method of the ConfigParser, there is no easy way to modify the parser's behaviour just by subclassing and overwriting a method. The patch lets the caller specify a default section name; if the config file doesn't start with a [section] line but with option lines, a suitable section is automatically created and holds the option entries. In any other case, i.e., when the header line is present in the config file and/or when no default startsection name is specified, the parser's behaviour is unchanged. 
Thus there shouldn't be a compatibility issue. In case the upload of the patch file fails (which has happened to me and my browser before), please have a look at ; the diffs also add some lines to test_cfgparser.py. ---------------------------------------------------------------------- >Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-08 13:33 Message: Logged In: YES user_id=7887 After a discussion in python-dev, it was decided that this feature is not worth having in the ConfigParser standard class, since it's easy to include a dummy header in the file, and ConfigParser can also be subclassed. The complete discussion can be checked at: http://mail.python.org/pipermail/python-dev/2002-November/029968.html Thank you! ---------------------------------------------------------------------- Comment By: Gustavo Niemeyer (niemeyer) Date: 2002-11-07 21:16 Message: Logged In: YES user_id=7887 Thanks for proposing that, Detlef. Having a configuration file without headers could indeed be interesting in some situations. I have a few comments about the implementation: The patch includes a new parameter in read functions, stating the name of the first section. It means that we could have other sections after the first unheaded section. IMO, that situation should still be considered an error. One possible way to implement it is to include a "noheaders" boolean parameter for the constructor. Then, the user would have to know the standard single-section name in order to pass it to functions like get(). Another way would be to include something like a "singlesection" parameter in the constructor. This parameter would accept a string option, which would name the single section. As an argument against the whole issue, I'm not sure how uncomfortable it is to simply include a header in the file to satisfy the parser.
As an argument in favor, this could allow ConfigParser to parse simple (no escapes or variables) shell configuration files and other simple configurations using NAME=VALUE style. I'm attaching an alternative implementation of the singlesection algorithm, described above. Would it be enough for your needs?
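The "dummy header" workaround mentioned in the closing comment can be sketched as below. This uses the configparser module of current Python 3 (the thread predates it and refers to the Python 2 ConfigParser module); the helper name parse_headerless and the default section name "main" are assumptions made for the example, not part of either patch.

```python
import configparser


def parse_headerless(text, section="main"):
    """Parse NAME=VALUE text that has no [section] line (a sketch).

    A dummy header is prepended so the stock parser accepts the input.
    """
    parser = configparser.ConfigParser()
    parser.read_string("[{}]\n{}".format(section, text))
    return dict(parser[section])


if __name__ == "__main__":
    print(parse_headerless("name = demo\nport = 8080\n"))
```

Note that the caller still has to agree on a section name, which is the same trade-off raised in the comments above.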
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=549037&group_id=5470 From noreply@sourceforge.net Fri Nov 8 14:01:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 06:01:08 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. 
If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. 
The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excrutiating tranlation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python- Dev or c.l.py or via PEP first, nor is it a function in wide use already, neither one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k- combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.. The handful (an exaggeration -- I only wish there were 5 ) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). 
Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k- combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a signficant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straight-forward, thoroughy tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. 
There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples i.e. Selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected.
While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
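The acceptance-testing figure quoted earlier in this thread (60 clean transactions giving about 95% confidence that the error rate is under 5%) can be checked with a few lines of arithmetic. The sketch below is only a worked example; the function names are hypothetical, and the population of 1000 with 50 bad items is an assumed worst case at exactly a 5% error rate.

```python
def prob_zero_errors_with_replacement(error_rate, k):
    """Chance that k independent draws all miss the bad items."""
    return (1.0 - error_rate) ** k


def prob_zero_errors_without_replacement(population, bad, k):
    """Chance that k draws without replacement all miss the bad items."""
    p = 1.0
    for i in range(k):
        # After i clean draws, population - i items remain, of which
        # population - bad - i are still clean.
        p *= (population - bad - i) / (population - i)
    return p


# Assumed scenario: exactly 5% of 1000 transactions are in error.
p_independent = prob_zero_errors_with_replacement(0.05, 60)
p_finite = prob_zero_errors_without_replacement(1000, 50, 60)
```

Under independent draws the chance of seeing no errors is 0.95**60, roughly 0.046, so a clean sample of 60 supports the claim at a little over 95% confidence. Sampling without replacement from a finite population makes a clean sample slightly less likely still, so the stated confidence is conservative.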
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 14:18:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 06:18:55 -0800 Subject: [Patches] [ python-Patches-629278 ] install lib-dynload .so files mode 555 Message-ID: Patches item #629278, was opened at 2002-10-27 01:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: J. Lewis Muir (jlmuir) >Assigned to: Michael Hudson (mwh) Summary: install lib-dynload .so files mode 555 Initial Comment: This is a patch to the setup.py file to set the permissions of the installed shared libraries to have mode 555 (r-xr-xr-x). This fixes bug #549338 "lib-dynload/*.so permissions wrong" and a duplicate bug #583206 "lib-dynload/*.so wrong permissions". The problem was that the shared libraries are installed by simply copying the tree of built shared libraries from the build directory to the installation location. This means that the permissions of the installed shared library files will be whatever the permissions were on these files in the build directory. The permissions are never set. If the shared libraries do not have the execute bit set, then on some platforms (Linux, in my case), python will be broken. For example, if one tries to import the time module, python will raise an ImportError saying "No module named time". To fix this, I've added a class PyBuildInstallLib(install_lib) which does exactly what install_lib does by invoking the super implementation of the install method, but then sets the permissions correctly for the installed shared library files.
In the setup call in the main function, I pass this PyBuildInstallLib class in the cmdclass dictionary as the class that should be used for the 'install_lib' command. Another approach would be to instead modify the Makefile to set the correct file modes of the installed shared library files in the 'sharedinstall' target right after running '... setup.py install ...'. I didn't do this because it seemed other file modes were being set by other commands in distutils so it seemed appropriate to do the same. Attached is a patch against the 2.2.2 release. This I have tested on my machine (x86, Mandrake 8.0 + updates, Linux 2.4.18). I've also looked at what's in CVS and my changes can be trivially made to the setup.py that's in CVS as of Sat 2002-10-26 5pm CDT. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-08 14:18 Message: Logged In: YES user_id=6656 Thanks for looking at this! This is a bit of a hack, but ne'er mind; I've been trying to think of a clean way of doing this for a while. I'd prefer to use sysconfig.get_config_vars("SO") than your hardcoded list of possible DSO extensions. Can you try the attached? (err, it's against HEAD but should apply to 2.2.2 with little difficulty). 
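The permission-fixing step discussed above can be sketched independently of distutils. The helper names below are hypothetical and this is not the attached patch. It only illustrates the idea of taking the shared-library suffix from sysconfig rather than a hardcoded list, and then forcing mode 555 on the matching files. "SO" is the variable named in the comment; newer Python releases expose the same information as "EXT_SUFFIX", so the sketch tries that first.

```python
import os
import sysconfig


def shared_lib_suffix():
    """Return the extension-module suffix reported by the build config.

    Falls back to ".so" if neither variable is defined (an assumption
    made for this sketch).
    """
    return (sysconfig.get_config_var("EXT_SUFFIX")
            or sysconfig.get_config_var("SO")
            or ".so")


def set_shared_lib_mode(paths, suffix, mode=0o555):
    """chmod every path ending in `suffix` to `mode`; return those paths."""
    changed = []
    for path in paths:
        if path.endswith(suffix):
            os.chmod(path, mode)          # r-xr-xr-x by default
            changed.append(path)
    return changed
```

In the approach described in the patch, a step like this would run after the stock install_lib command has copied the built files, using the list of installed files as `paths`.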
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 From noreply@sourceforge.net Fri Nov 8 17:06:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 09:06:01 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Raymond Hettinger (rhettinger) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements.
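The indices-versus-elements question can be made concrete with a short sketch. It uses the random.sample(population, k) call found in current Python releases, which differs from the sample(n, k) signature under review in this thread. The wrapper name sample_elements is hypothetical.

```python
import random


def sample_elements(population, k):
    """Pick k distinct elements by sampling indices first.

    Hypothetical wrapper showing that an index-based sampler and an
    element-based one are a list comprehension apart.
    """
    indices = random.sample(range(len(population)), k)
    return [population[i] for i in indices]


colorlist = ["red", "green", "blue", "cyan", "magenta"]
picked = sample_elements(colorlist, 2)

# Index sampling also works when no explicit list exists: range() is
# a lazy sequence, so this does not build a million-element list.
big = random.sample(range(1000000), 60)
```

Either interface can be layered on the other. The index form is the more basic building block, while the element form is what most callers would write by hand anyway.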
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. 
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? 
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples i.e. Selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 18:20:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 10:20:19 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Raymond Hettinger (rhettinger) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs.
If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)-word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful.
Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code.
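The trade-off between complement selection and random ordering can be illustrated with a small sketch. It is neither CombGen's code nor the patch's: the function name is hypothetical, and it leans on the random.sample(population, k) and random.shuffle() calls from current Python releases. When k exceeds n/2 it picks the n - k indices to leave out, then shuffles the survivors so the result is still in random order, which is the combination suggested earlier in the thread.

```python
import random


def sample_with_complement(n, k):
    """Return k unique integers from range(n), in random order.

    Hypothetical sketch: for k > n/2, choose the excluded indices
    instead of the included ones, then shuffle what remains.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if 2 * k > n:
        excluded = set(random.sample(range(n), n - k))
        result = [i for i in range(n) if i not in excluded]
        random.shuffle(result)   # restore random order for uses like the raffle
        return result
    return random.sample(range(n), k)
```

For 9999 out of 10000 this draws a single random index rather than 9999 of them, at the cost of an extra shuffle over the result. Whether that is a net gain depends on n and k, which matches the mixed timings reported above.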
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. 
I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear.
Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this; I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful, b) whether to implement it as is or with sequence arguments, c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics.
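The two call styles being weighed above can be compared in a short sketch. It assumes modern Python 3, where random.sample() takes a population sequence; sample_indices() and sample_elements() are hypothetical helper names used only for illustration and are not taken from the patch.

```python
import random


def sample_indices(n, k, rng=random):
    """Index style: k distinct integers drawn from range(n)."""
    # range(n) is lazy, so no n-element list is built here.
    return rng.sample(range(n), k)


def sample_elements(population, k, rng=random):
    """Sequence style: map the sampled indices back onto the population."""
    return [population[i] for i in sample_indices(len(population), k, rng)]


colors = ["red", "green", "blue", "cyan", "magenta"]
rng = random.Random(2002)  # arbitrary seed, only for repeatability
picked = sample_elements(colors, 2, rng)
```

The sequence style is a thin wrapper over the index style, which is roughly the "has-a" relationship described for combgen elsewhere in this thread.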
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three, which is multiplied by 1.5 and rounded up to five (because dict loading is kept under 2/3), plus one pointer for the 'inorder' return list, for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects.
There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected.
While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
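The contrast between the two kinds of sampling, and the two-algorithm approach described earlier in the thread, might look roughly like the sketch below. It is not the code attached to the patch: the function names are hypothetical, a set stands in for the dictionary, and only the switchover ratio of six is taken from the discussion.

```python
import random


def sample_with_replacement(n, k, rng=random):
    """The trivial case: a list comprehension, duplicates allowed."""
    return [rng.randrange(n) for _ in range(k)]


def sample_without_replacement(n, k, rng=random):
    """k distinct integers from range(n), returned in selection order."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if n <= 6 * k:
        # Dense case: an n-element pool is affordable, so run a partial
        # shuffle that settles only the first k positions.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Sparse case: remember what was picked and retry on collisions.
    # Space stays proportional to k, and retries are rare because k << n.
    seen = set()
    result = []
    while len(result) < k:
        candidate = rng.randrange(n)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


sparse = sample_without_replacement(1000000, 60, random.Random(1))
dense = sample_without_replacement(10, 8, random.Random(2))
```

Both branches return the picks in the order they were selected, so any leading slice of the result is itself a random sample.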
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 19:23:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 11:23:24 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Raymond Hettinger (rhettinger) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). 
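A minimal sketch of sampling from a lazy integer sequence, assuming Python 3, where range() plays the role that xrange() plays in the comment above. The seed and variable names are arbitrary and only for illustration.

```python
import random

rng = random.Random(629637)     # arbitrary seed, only for repeatability
population = range(10_000_000)  # lazy sequence; no ten-million-element list
picks = rng.sample(population, 60)
```

Because the population is only indexed, never copied, the memory cost depends on the sample size rather than on the population size.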
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k- combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. 
Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code.
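The tension between complement selection and random ordering can be seen in a small sketch. It assumes Python 3; sample_by_complement() is a hypothetical name, and this is neither CombGen's code nor the patch's. Complement selection naturally yields ascending order, so a shuffle() is applied afterwards, along the lines suggested above, before the raffle prizes are sliced off.

```python
import random


def sample_by_complement(n, k, rng=random):
    """k distinct integers from range(n), found by excluding n - k of them."""
    excluded = set(rng.sample(range(n), n - k))
    kept = [i for i in range(n) if i not in excluded]  # ascending order
    rng.shuffle(kept)  # restore a random order so position carries no bias
    return kept


rng = random.Random(7)  # arbitrary seed, only for repeatability
winners = sample_by_complement(12, 10, rng)

# Prize tiers are consecutive slices of the randomly ordered result.
grand = winners[:1]
second = winners[1:3]
third = winners[3:6]
fourth = winners[6:10]
```

Without the shuffle, the grand prize would always go to the lowest-numbered entrant among the ten winners, which is the bias the raffle example warns about.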
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 19:34:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 11:34:52 -0800 Subject: [Patches] [ python-Patches-635656 ] os.tempnam behavior in Windows Message-ID: Patches item #635656, was opened at 2002-11-08 16:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Roberto Lublinerman (rluble) Assigned to: Nobody/Anonymous (nobody) Summary: os.tempnam behavior in Windows Initial Comment: os.tempnam behaviour under Windows does not agree with the documentation. Under Windows, the temporary location takes precedence over the specified directory, so tempnam("mydir") returns a filename in the temporary location instead of in "mydir". Reason: tempnam is implemented under Windows as a call to _tempnam, which behaves as described above according to the MS documentation. Change: use GetTempFileName to get the desired behaviour.
File Modified: Modules/posixmodule.c Error detected in: python v2.2 Corrected for Python v: 2.3 File revision: 2.271 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 From noreply@sourceforge.net Fri Nov 8 19:43:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 11:43:09 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Raymond Hettinger (rhettinger) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. 
A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. 
combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order.
Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering.

If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it.

If Tim wants to send me some code to study, that's cool. I always learn something from reading his code.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-08 08:19

Message:
Logged In: YES
user_id=6380

Still, the question remains: why are all these functions so disconnected in their interface? Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random.

Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2).

It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library.

Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category.

Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it).
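The complement-selection idea raised in this exchange (for k > n/2, pick the elements to leave out, then shuffle so the result is still in random order) might look roughly like the sketch below. It is written for a current Python 3 rather than the Python 2.3 under discussion, and it is not the code from the patch or from CombGen.py; the names sample_with_complement and _pick_distinct are hypothetical.

```python
import random


def _pick_distinct(n, k):
    """Return k distinct integers from range(n), in the order they were drawn.

    Hypothetical helper: draws with randrange() and retries on collisions.
    """
    seen = set()
    picks = []
    while len(picks) < k:
        j = random.randrange(n)
        if j not in seen:
            seen.add(j)
            picks.append(j)
    return picks


def sample_with_complement(n, k):
    """Return k distinct integers from range(n) in random order.

    Assumption for illustration: when k is more than half of n, it is
    cheaper to choose the n - k values to exclude and keep the rest.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if 2 * k > n:
        excluded = set(_pick_distinct(n, n - k))
        kept = [i for i in range(n) if i not in excluded]
        # The kept values come out ascending, so shuffle to restore the
        # random ordering the raffle use case relies on.
        random.shuffle(kept)
        return kept
    return _pick_distinct(n, k)


# Raffle use case from the thread: slices of a randomly ordered result
# are themselves random samples.
winners = sample_with_complement(12, 10)
grand, second, third, fourth = winners[0], winners[1:3], winners[3:6], winners[6:10]
```

Whether the extra branch pays for itself is exactly what the thread disputes; the sketch only shows that the complement trick and a random result order are compatible if a shuffle is added.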
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 07:37

Message:
Logged In: YES
user_id=80475

I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I draw a sample of 60 and test the first 25; if an error is found, the test expands to all 60, and if more errors are found, the transaction set does not pass the audit.

The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook.

To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick-and-dirty version or his translated one.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-08 03:59

Message:
Logged In: YES
user_id=31435

Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.

Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time.

In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.
The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them).

Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order.

CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to.

In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-08 01:52

Message:
Logged In: YES
user_id=21627

Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me.

I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook.
I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things.

So I would propose that you withdraw the patch.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 01:36

Message:
Logged In: YES
user_id=80475

I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard.

What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-07 19:55

Message:
Logged In: YES
user_id=6380

I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-07 18:30

Message:
Logged In: YES
user_id=80475

Assigned to GvR for pronouncement on
a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments
c) whether to leave it in random or put in another module.

The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange().
Tim prefers an approach that parallels random.choice() where the call looks like:

    random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects

I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-05 16:38

Message:
Logged In: YES
user_id=80475

Thanks for the quick follow-ups.

The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%).

As requested, I'll add the random argument to the documentation.

Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)].

I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random().

I don't know why CombGen.py never made it to Tools/scripts.
Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-05 13:31

Message:
Logged In: YES
user_id=31435

I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations.

Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-05 12:34

Message:
Logged In: YES
user_id=21627

Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list?

The documentation should mention the random optional argument.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-05 10:36

Message:
Logged In: YES
user_id=80475

Like shuffle() and choice(), random sampling without replacement is one of the core use cases for random numbers.

Acceptance testing often requires a fixed number of non-overlapping samples, e.g.
selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate.

Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected.

While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-05 03:27

Message:
Logged In: YES
user_id=21627

Can you explain why this needs to be in the standard library? I.e. what typical application would use it?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-05 01:33

Message:
Logged In: YES
user_id=80475

Martin, do you have time to give this patch a second review?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-10-31 02:29

Message:
Logged In: YES
user_id=80475

Added new version with local variable optimization and with the dictionary results returned in selection order.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-10-28 23:42

Message:
Logged In: YES
user_id=80475

Renamed to random.sample(n,k) to show that it is used for sampling without replacement.
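The speed/space trade-off described in the 2002-11-05 16:38 comment above (shuffle an n-length list when k is large relative to n, otherwise track picks in a dictionary, switching over at a ratio of six) can be sketched as follows. This is an illustration under assumptions, not the patch itself: it targets a current Python 3, uses a set where the thread describes a dictionary, and the names sample_indices and ratio are hypothetical.

```python
import random


def sample_indices(n, k, ratio=6):
    """Return k distinct integers from range(n), in selection order.

    `ratio` is the assumed switchover point: at or below n == ratio * k
    an n-length pool is partially shuffled; above it, picks are tracked
    in a set so memory stays proportional to k rather than n.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")

    if n <= ratio * k:
        # Partial Fisher-Yates shuffle: only the first k slots are settled.
        pool = list(range(n))
        for i in range(k):
            j = random.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]

    # Sparse case: fewer than 1/ratio of the values are ever taken, so a
    # draw collides with an earlier pick less than about 17% of the time.
    chosen = set()
    result = []
    while len(result) < k:
        j = random.randrange(n)
        if j not in chosen:
            chosen.add(j)
            result.append(j)
    return result


# Converting indices to elements, as in the thread's colorlist example.
colorlist = ["red", "green", "blue", "cyan", "magenta", "yellow", "black"]
picked = [colorlist[i] for i in sample_indices(len(colorlist), 5)]
```

Both branches return the values in the order they were selected, which is the property the audit and raffle use cases in this thread depend on.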
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-10-28 07:54

Message:
Logged In: YES
user_id=80475

Added full patch with news item and docs.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470

From noreply@sourceforge.net Fri Nov 8 20:05:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 08 Nov 2002 12:05:18 -0800
Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py
Message-ID:

Patches item #629637, was opened at 2002-10-27 21:03
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Add a sample selection method to random.py

Initial Comment:
random.randset(n, k) returns a k length list of unique integers in the range [0,n).

Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm.

I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs.

If approved, will add docs and a news item.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-11-08 15:05

Message:
Logged In: YES
user_id=31435

I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme.

If random order is a promise of this method, then that must be documented.
As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument.

The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input).

Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments.

You surely don't want to hear this next one, but the patch appears to be missing test cases.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 14:43

Message:
Logged In: YES
user_id=80475

P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument.

By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 14:23

Message:
Logged In: YES
user_id=80475

As requested, revised patch to accept a population sequence instead of an index range.

Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange(10000000), 60).
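The population-sequence form described in the two comments above (select by index internally, so the pool is left untouched and repeated elements are allowed) could be sketched like this. It is an assumption-laden illustration for a current Python 3, where range() plays the role xrange() had in 2002; the name sample_from is hypothetical and the body is not taken from the patch.

```python
import random


def sample_from(population, k):
    """Return a list of k elements taken from distinct positions of population.

    Only len() and indexing are used, so a lazy sequence such as range()
    works without being expanded into a list, and the population itself
    is never modified.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= len(population)")
    picked = set()   # positions already used, not the elements themselves
    result = []
    while len(result) < k:
        j = random.randrange(n)
        if j not in picked:
            picked.add(j)
            result.append(population[j])
    return result


# Weighted population: 'red' occupies three of the five positions.
colors = 'red red red blue blue'.split()
draw = sample_from(colors, 3)

# Large integer sequence, never materialized as a list.
big = sample_from(range(10000000), 60)
```

Because uniqueness is tracked by position rather than by value, the elements need not be hashable or distinct. The result is always a list, which is the return-type question Tim raises in the latest comment.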
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k- combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. 
Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returing a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. 
I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excrutiating tranlation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python- Dev or c.l.py or via PEP first, nor is it a function in wide use already, neither one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k- combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.. The handful (an exaggeration -- I only wish there were 5 ) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. 
Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k- combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a signficant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straight-forward, thoroughy tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. 
There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. 
While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e., what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Fri Nov 8 20:32:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 12:32:21 -0800 Subject: [Patches] [ python-Patches-635656 ] os.tempnam behavior in Windows Message-ID: Patches item #635656, was opened at 2002-11-08 14:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Roberto Lublinerman (rluble) >Assigned to: Tim Peters (tim_one) Summary: os.tempnam behavior in Windows Initial Comment: os.tempnam behaviour under Windows does not agree with the documentation. Under Windows, the temporary location takes precedence over the specified directory, so tempnam("mydir") returns a filename in the temporary location instead of in "mydir". Reason: tempnam is implemented under Windows as a call to _tempnam, which behaves as described above according to the MS documentation. Change: use GetTempFileName to get the desired behaviour. File Modified: Modules/posixmodule.c Error detected in: python v2.2 Corrected for Python v: 2.3 File revision: 2.271 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:32 Message: Logged In: YES user_id=31435 Assigned to me.
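For code that simply needs a temporary file inside a chosen directory, whatever the TMP or TMPDIR environment variables say, the following is a minimal sketch built on the tempfile module. It is not the GetTempFileName change proposed in this patch; the helper name make_temp_file_in and its prefix default are illustrative assumptions.

```python
import os
import tempfile


def make_temp_file_in(directory, prefix="tmp"):
    """Create an empty temporary file inside `directory` and return its path.

    Hypothetical helper: tempfile.mkstemp() is given an explicit `dir`
    argument, so the file is created in the requested directory.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    os.close(fd)  # mkstemp returns an open descriptor; close it here
    return path
```

The caller remains responsible for deleting the file once it is no longer needed.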
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 From noreply@sourceforge.net Fri Nov 8 20:54:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 12:54:44 -0800 Subject: [Patches] [ python-Patches-635656 ] os.tempnam behavior in Windows Message-ID: Patches item #635656, was opened at 2002-11-08 14:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Roberto Lublinerman (rluble) >Assigned to: Fred L. Drake, Jr. (fdrake) Summary: os.tempnam behavior in Windows Initial Comment: os.tempnam behaviour under Windows does not agree with the documentation. Under Windows, the temporary location takes precedence over the specified directory, so tempnam("mydir") returns a filename in the temporary location instead of in "mydir". Reason: tempnam is implemented under Windows as a call to _tempnam, which behaves as described above according to the MS documentation. Change: use GetTempFileName to get the desired behaviour. File Modified: Modules/posixmodule.c Error detected in: python v2.2 Corrected for Python v: 2.3 File revision: 2.271 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:54 Message: Logged In: YES user_id=31435 Reassigned to Fred for pondering. As far as I can tell, the Windows _tempnam is trying to emulate more-or-less standard Unix tempnam: the first six man pages I found for tempnam on the web say that the envar TMPDIR takes precedence over the dir argument, if TMPDIR is writable. That's what Windows does too, except the name of the envar is TMP on Windows.
If that's so, the implementation of os.tempnam is entirely unsurprising, but the Python docs need more words, to clarify that the behavior depends on the platform C library. Roberto, in no case do I expect to apply the patch: changing *behavior* here is dangerous to working code, and all signs say the function is working as intended, although not as documented. Years of reality take precedence over missing docs. If you need to force a particular directory, see the docs for the tempfile module and its tempdir variable. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:32 Message: Logged In: YES user_id=31435 Assigned to me. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 From noreply@sourceforge.net Fri Nov 8 22:21:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 14:21:19 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item.
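The two-strategy idea in the initial comment can be sketched roughly as follows. This is not the code from the attached patch: the name sample_indices, the rng parameter, and the use of a set in place of a dictionary are illustrative assumptions, and the factor of 6 is simply the switchover ratio mentioned elsewhere in this thread.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k distinct integers from range(n), in selection order.

    Illustrative sketch only. When k is large relative to n, a pool of
    n indices is partially shuffled; otherwise candidates are drawn and
    duplicates rejected, so memory stays proportional to k.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if n <= 6 * k:
        # Pool strategy: each pick removes one index from the pool by
        # overwriting it with the last index that is still unused.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = rng.randrange(n - i)
            result.append(pool[j])
            pool[j] = pool[n - i - 1]
        return result
    # Rejection strategy: remember what has been chosen and redraw on a
    # collision; collisions are rare because k is small relative to n.
    chosen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in chosen:
            chosen.add(j)
            result.append(j)
    return result
```

With this sketch, sample_indices(1000000, 60) never builds a million-element list, while sample_indices(10, 8) takes the pool branch and avoids long runs of rejected draws.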
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S.
The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange(10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)-word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently.
Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements.
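As a rough illustration of combining the complement-selection trick for k > n/2 with a final shuffle() to restore random order, here is a minimal sketch. It is not taken from CombGen.py or from the patch; the name sample_by_complement and the rng parameter are assumptions made for the example.

```python
import random


def sample_by_complement(n, k, rng=random):
    """Return k distinct integers from range(n) in random order.

    Illustrative sketch for k close to n: pick the n - k indices to
    leave out, keep the rest (which come out ascending), then shuffle
    so that the order of the result is random as well.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    excluded = set()
    while len(excluded) < n - k:
        excluded.add(rng.randrange(n))
    kept = [i for i in range(n) if i not in excluded]
    rng.shuffle(kept)
    return kept
```

For 9999 out of 10000 this draws a single excluded index instead of 9999 included ones, at the cost of one pass over range(n) and one shuffle of the result.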
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. 
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? 
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. 
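The "60 clean transactions gives 95% confidence" figure quoted in the acceptance-testing comment above can be checked with a short calculation. This is a sketch under stated assumptions: a population of 1000 with exactly 50 defective items (a 5% error rate); the function name prob_clean_sample is illustrative.

```python
def prob_clean_sample(population, defective, sample_size):
    """Probability that a sample drawn without replacement has no defects.

    Multiplies, draw by draw, the chance that the next item taken is one
    of the remaining non-defective items.
    """
    p = 1.0
    for i in range(sample_size):
        p *= (population - defective - i) / (population - i)
    return p


# Exact value for sampling without replacement: 60 of 1000, 50 defective.
p_without_replacement = prob_clean_sample(1000, 50, 60)

# Simpler with-replacement approximation: each draw is clean with
# probability 0.95, so 60 clean draws occur with probability 0.95 ** 60.
p_with_replacement = 0.95 ** 60
```

The approximation 0.95 ** 60 is about 0.046, and the without-replacement value is smaller still. Both are under 0.05, which is the sense in which a clean sample of 60 supports the claim at the 95% level.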
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Sat Nov 9 03:02:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 19:02:32 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs.
If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc.
-- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange(10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)-word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy.
combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached.
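On the point that counting k-combinations efficiently is not obvious, one common approach is the multiplicative formula, which avoids computing large factorials. The sketch below is an illustration only and is not taken from combgen; the function name count_combinations is an assumption.

```python
def count_combinations(n, k):
    """Return the number of ways to choose k items out of n.

    Builds the binomial coefficient one factor at a time. After step i
    the running value equals C(n - k + i, i), which is always a whole
    number, so the integer division is exact.
    """
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k); use the smaller count
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result


poker_hands = count_combinations(52, 5)    # 2598960
bridge_hands = count_combinations(52, 13)  # 635013559600
```

Intermediate values never exceed the final answer, which keeps the arithmetic cheap even for large n.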
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface? Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random module is destined to be a random collection of useful statistical hacks?
It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the test expands to the full 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick-and-dirty version or his translated one.
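The audit workflow described above (draw 60, test the first 25, fall back on the rest) can be sketched as follows. This assumes random.sample() as it exists in current Python 3, which returns its selection in random order; the helper name audit_sample and the stand-in transaction list are invented for illustration.

```python
import random


def audit_sample(transactions, full_size=60, first_pass=25, rng=random):
    """Draw one random-order sample and split it into a first pass and a reserve."""
    picked = rng.sample(transactions, full_size)
    # Because the sample is in random order, any prefix of it is itself
    # a valid random sample of the population.
    return picked[:first_pass], picked[first_pass:]


transactions = list(range(1000))   # stand-in for real transaction records
first, reserve = audit_sample(transactions)
print(len(first), len(reserve))    # 25 35
```

If the sample came back sorted, the first 25 entries would be biased toward the low end of the population, which is the concern behind the request for random order.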
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order.
CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on: a) whether he agrees that a sampling function is useful; b) whether to implement it as is or with sequence arguments; c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six.
Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist), 5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection.
For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the core use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g., selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g., a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it?
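Tim's remark above about complement selection for k > n/2 can be illustrated with a small sketch. This is not the CombGen code he refers to; it is a hypothetical helper (sample_indices is an invented name) written for current Python 3. It picks whichever of the two sets is smaller, the chosen indices or the rejected ones.

```python
import random


def sample_indices(n, k, rng=random):
    """Return a set of k distinct indices drawn from range(n)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # Draw the smaller of the two sets: the selection or its complement.
    target = min(k, n - k)
    picked = set()
    while len(picked) < target:
        picked.add(rng.randrange(n))
    if target == k:
        return picked
    # 'picked' holds the rejected indices; everything else is the sample.
    return set(range(n)) - picked


chosen = sample_indices(10000, 9999)
print(len(chosen))   # 9999
```

The result here is an unordered set, so this sketch also shows the drawback raised elsewhere in the thread: complement selection does not by itself give the sample in random order.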
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
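The two-algorithm idea discussed in this thread (a shuffle when the population is small relative to the sample, a lookup structure when it is large, with a switchover ratio of six) might look roughly like the sketch below. This is not the attached patch. It is an illustration for current Python 3, the name sample_range is invented, and it uses a set where the thread describes a dictionary.

```python
import random


def sample_range(n, k, rng=random):
    """Return k distinct integers from range(n), in random (selection) order."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n <= 6 * k:
        # Small population: an n-element pool costs less than the lookup
        # structure would, so do a partial shuffle and keep the first k.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Large population: remember what was already picked and redraw on a
    # collision, appending results in the order they were selected.
    seen = set()
    result = []
    while len(result) < k:
        x = rng.randrange(n)
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


print(len(sample_range(1000000, 60)))   # 60
```

Neither branch ever builds more than the smaller of an n-element list or a k-element lookup, which is the space trade-off the ratio of six is meant to balance.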
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Sat Nov 9 03:57:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 19:57:27 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. 
I think this one is ready for prime time and would be a happy fellow if it got blessed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases.
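The standard deviation formula quoted above, sqrt((n**2 - 1) / 12), applies to the integers 0 through n-1 taken with equal weight. A quick check, written for current Python 3 with the standard math and statistics modules (n = 10 is an arbitrary choice for the example):

```python
import math
import statistics

n = 10
expected = math.sqrt((n**2 - 1) / 12)      # formula quoted in the thread
actual = statistics.pstdev(range(n))       # population std dev of 0..n-1
print(expected, actual)
```

The two values agree up to floating-point rounding, so comparing the observed spread of many sampled values against this figure is a reasonable sanity test for uniformity.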
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. 
Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k- combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returing a list of indices less intuitive than a list of elements. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excrutiating tranlation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python- Dev or c.l.py or via PEP first, nor is it a function in wide use already, neither one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. 
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k- combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.. The handful (an exaggeration -- I only wish there were 5 ) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k- combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a signficant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straight-forward, thoroughy tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. 
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. 
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? 
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. 
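[Editor's sketch] The two-branch approach discussed in this thread (shuffle when k is a large fraction of n, otherwise collect unique randrange() results in selection order) might look roughly like the following. This is not the code from any of the attached patches. The function name sample_indices and the ratio parameter are made up for illustration; the default of 6 is only the switchover ratio mentioned in the comments, and a set stands in for the dictionary the patch is described as using.

```python
import random

def sample_indices(n, k, ratio=6):
    """Return k unique integers from range(n), in random order (hypothetical helper)."""
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    if n < ratio * k:
        # k is a large fraction of n: partially shuffle an explicit list.
        pool = list(range(n))
        for i in range(k):
            j = random.randrange(i, n)      # pick from the not-yet-fixed tail
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # k is small relative to n: keep drawing until k distinct values are seen.
    seen = set()
    result = []                             # preserves selection order
    while len(result) < k:
        i = random.randrange(n)
        if i not in seen:
            seen.add(i)
            result.append(i)
    return result
```

Converting the indices back to elements is then a list comprehension over the result, e.g. [colors[i] for i in sample_indices(len(colors), 2)] for some list colors.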
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Sat Nov 9 05:49:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Nov 2002 21:49:53 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. 
If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 00:49 Message: Logged In: YES user_id=80475 Tim, which do you prefer? Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite. Still, rand5.diff is also ready to go. It documents how to convert from indices to elements. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened up the patch for random.sample(population,k). Sped up the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12).
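[Editor's sketch] The formula quoted above can be checked numerically, assuming it refers to a uniform draw over n consecutive integers such as range(n) (the comment does not spell this out). The helper name uniform_int_stddev is invented for this example; only standard-library calls are used.

```python
import math
import statistics

def uniform_int_stddev(n):
    """Standard deviation of a uniform draw from n consecutive integers."""
    return math.sqrt((n ** 2 - 1) / 12)

n = 10
# Population standard deviation of 0, 1, ..., n-1, computed directly.
direct = statistics.pstdev(range(n))
matches = math.isclose(uniform_int_stddev(n), direct)
```

For n = 10 the variance is (100 - 1) / 12 = 8.25, so both expressions give the square root of 8.25.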
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list.
A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. 
combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order.
Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). 
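[Editor's sketch] For readers on a current Python: the standard library's random.sample(population, k) takes a sequence and returns a new list of k elements in selection order, so the raffle, large-range, and repeated-element cases raised in this thread can be tried directly. The entrant names and prize slices below are invented for illustration, and range() stands in for the xrange() of the Python 2 era.

```python
import random

# Hypothetical raffle: 200 tickets, 10 winners drawn without replacement.
entrants = ["ticket-%03d" % i for i in range(200)]
winners = random.sample(entrants, 10)

# Because the result is in random order, slices of it are themselves
# unbiased samples: 1 grand prize, 2 second, 3 third, 4 fourth prizes.
grand, second, third, fourth = (
    winners[:1], winners[1:3], winners[3:6], winners[6:10])

# Sampling from a large range without materialising a list first.
big = random.sample(range(10000000), 60)

# Repeated elements act as weights; each occurrence is a separate candidate.
weighted = random.sample("red red red blue blue".split(), 3)
```

The same slicing idea covers the audit case described in the thread: draw 60, test the first 25, and fall back to the full 60 only if needed.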
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.
The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook.
I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange().
Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. 
Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. 
Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Sat Nov 9 14:59:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Nov 2002 06:59:25 -0800 Subject: [Patches] [ python-Patches-635933 ] make some type attrs writable Message-ID: Patches item #635933, was opened at 2002-11-09 14:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Guido van Rossum (gvanrossum) Summary: make some type attrs writable Initial Comment: As per discussion on python-dev, this patch makes the following attributes of type objects writable from Python: - __name__ - __bases__ - __mro__ It also relaxes the restriction on not returning __module__ when that's been set to a non-string. This (tiny) part is a 2.2.3 candidate IMHO. It lets the following work: class C(object): pass class D(C): pass class E(object): def meth(self): print 1 d = D() D.__bases__ = (C, E) d.meth() but that's the extent of my testing so far. Needs a test and docs -- if the current behaviour is documented anywhere. Currently, if an assignment to __bases__ would change __base__, it complains (was easiest). Assigned to Guido so he sees it, but anyone else is encouraged to review it! 
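[Editor's sketch] The example from the initial comment, laid out as runnable code. It is written for a current Python 3 interpreter rather than the Python 2.3 the patch targets, and meth() returns 1 instead of printing it so the result can be checked; those two changes are assumptions of this sketch, not part of the patch.

```python
class C(object):
    pass

class D(C):
    pass

class E(object):
    def meth(self):
        return 1

d = D()                  # instance created before the bases change
D.__bases__ = (C, E)     # add E as a second base of D after the fact
result = d.meth()        # the existing instance now finds E.meth
mro = D.__mro__          # recomputed method resolution order
```

Assigning to __bases__ recomputes the method resolution order, which is why the already-created instance d picks up meth() from E.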
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 From noreply@sourceforge.net Sat Nov 9 15:01:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Nov 2002 07:01:56 -0800 Subject: [Patches] [ python-Patches-635933 ] make some type attrs writable Message-ID: Patches item #635933, was opened at 2002-11-09 14:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Guido van Rossum (gvanrossum) Summary: make some type attrs writable Initial Comment: As per discussion on python-dev, this patch makes the following attributes of type objects writable from Python: - __name__ - __bases__ - __mro__ It also relaxes the restriction on not returning __module__ when that's been set to a non-string. This (tiny) part is a 2.2.3 candidate IMHO. It lets the following work: class C(object): pass class D(C): pass class E(object): def meth(self): print 1 d = D() D.__bases__ = (C, E) d.meth() but that's the extent of my testing so far. Needs a test and docs -- if the current behaviour is documented anywhere. Currently, if an assignment to __bases__ would change __base__, it complains (was easiest). Assigned to Guido so he sees it, but anyone else is encouraged to review it! ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-09 15:01 Message: Logged In: YES user_id=6656 Hmm, I misunderstood __base__. It's the base class that *leads* to the solid base, not the solid base. So an assignment to __bases__ may justifiably change it. Oops. Will try again later...
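[Editor's sketch] The difference between __bases__ and __base__ mentioned above can be seen on a current Python 3 interpreter with a class that defines __slots__, since slots change the instance layout. The class names here are invented for this example, and the behaviour shown is that of CPython today, which may differ in detail from the 2.3-era code the patch touches.

```python
class Plain(object):
    pass                      # adds nothing to the instance layout

class Slotted(object):
    __slots__ = ("x",)        # adds a slot, so it defines its own layout

class Child(Plain, Slotted):
    pass

bases = Child.__bases__       # the bases exactly as listed in the class statement
base = Child.__base__         # the one base that determines the instance layout
```

Here Child.__base__ is Slotted even though Plain is listed first, which illustrates why a change to __bases__ can legitimately change __base__ as well.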
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 From noreply@sourceforge.net Sat Nov 9 20:15:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Nov 2002 12:15:28 -0800 Subject: [Patches] [ python-Patches-636005 ] Filter unicode into unicode Message-ID: Patches item #636005, was opened at 2002-11-09 21:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636005&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Filter unicode into unicode Initial Comment: Currently, filter(None, "abc") gives "abc", but filter(None, u"abc") gives [u'a', u'b', u'c']. This patch corrects this, adding a Unicode special case for filter. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636005&group_id=5470 From noreply@sourceforge.net Sat Nov 9 21:06:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Nov 2002 13:06:52 -0800 Subject: [Patches] [ python-Patches-629278 ] install lib-dynload .so files mode 555 Message-ID: Patches item #629278, was opened at 2002-10-26 19:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: J. Lewis Muir (jlmuir) Assigned to: Michael Hudson (mwh) Summary: install lib-dynload .so files mode 555 Initial Comment: This is a patch to the setup.py file to set the permissions of the installed shared libraries to have mode 555 (r-xr-xr-x).
This fixes bug #549338 "lib-dynload/*.so permissions wrong" and a duplicate bug #583206 "lib-dynload/*.so wrong permissions". The problem was that the shared libraries are installed by simply copying the tree of built shared libraries from the build directory to the installation location. This means that the permissions of the installed shared library files will be whatever the permissions were on these files in the build directory. The permissions are never set. If the shared libraries do not have the execute bit set, then on some platforms (Linux, in my case), python will be broken. For example, if one tries to import the time module, python will raise an ImportError saying "No module named time". To fix this, I've added a class PyBuildInstallLib(install_lib) which does exactly what install_lib does by invoking the super implementation of the install method, but then sets the permissions correctly for the installed shared library files. In the setup call in the main function, I pass this PyBuildInstallLib class in the cmdclass dictionary as the class that should be used for the 'install_lib' command. Another approach would be to instead modify the Makefile to set the correct file modes of the installed shared library files in the 'sharedinstall' target right after running '... setup.py install ...'. I didn't do this because it seemed other file modes were being set by other commands in distutils so it seemed appropriate to do the same. Attached is a patch against the 2.2.2 release. This I have tested on my machine (x86, Mandrake 8.0 + updates, Linux 2.4.18). I've also looked at what's in CVS and my changes can be trivially made to the setup.py that's in CVS as of Sat 2002-10-26 5pm CDT. ---------------------------------------------------------------------- >Comment By: J. Lewis Muir (jlmuir) Date: 2002-11-09 15:06 Message: Logged In: YES user_id=527708 I didn't know about the sysconfig thing; I like that much better. 
I tried your patch against the 2.2.2 release and it works fine. After revisiting this, I had a few more ideas for improvement. I think it would be even better if the access modes for all installed files and directories were actually deterministic whereas with our existing changes, only the modes of the shared library files are guaranteed to be set correctly. We are still left with all files (other than the shared libraries) and all directories having modes that are based upon whatever the modes were in the build tree that gets copied into the install dir. I've attached two patches: setup.py-2.2.2-jlmuir-v2.diff (against the 2.2.2 release) setup.py-HEAD-jlmuir-v2.diff (against HEAD) These patches incorporate your changes plus my new changes to make all installed file and directory modes get set correctly. I've tested the patches with their corresponding 2.2.2 release and HEAD. My new changes cause the mode of all installed files to be set to 644 unless they are shared libraries in which case they will get mode 755 and all directories will get mode 755. Note that I've tweaked the mode of shared libraries to 755 instead of 555. I only did this because it seemed more standard. However, the INSTALL_SHARED variable defined in Makefile uses mode 555 (so if you'd rather stick w/ 555, I'm fine with that). In the HEAD patch, I changed from using self.announce to log.info because self.announce wasn't printing anything to stdout and it seems that other commands are now using log.info (e.g. clean.py). There is one issue that might not be worth worrying about, but I'll state it anyway. My changes will set the mode of the install dir and all descendant dirs in the tree. If the destination install dir already exists or has other dirs in it that already exist, the mode will be set for all of those dirs even if they were not part of the directory tree copied from the build dir. In practice, I don't think this will be an issue, but who knows.
Alternatively, I could have extracted directory names from the list of installed files but if a directory was created that had no files in it, that directory would not get discovered and hence its mode would not get set. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-08 08:18 Message: Logged In: YES user_id=6656 Thanks for looking at this! This is a bit of a hack, but ne'er mind; I've been trying to think of a clean way of doing this for a while. I'd prefer to use sysconfig.get_config_vars("SO") than your hardcoded list of possible DSO extensions. Can you try the attached? (err, it's against HEAD but should apply to 2.2.2 with little difficulty). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 From noreply@sourceforge.net Sun Nov 10 02:48:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Nov 2002 18:48:11 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. 
If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 21:48 Message: Logged In: YES user_id=80475 Can I commit rand6.diff and be done with this one? At one time, sample(n,k) looked better because the code was simpler, faster, and the use of xrange(n) in sample(population,k) wasn't obvious. As of rand6.diff, sample(population,k) is equally fast and simple. The use of xrange(n) is thoroughly documented and has no performance penalty. It's now faster and easier to express sample(n,k) in terms of sample(population,k) than vice-versa. Also, sample(population,k) has the friendlier interface. So, it is the one I recommend. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 00:49 Message: Logged In: YES user_id=80475 Tim, which do you prefer? Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite. Still, rand5.diff is also ready to go. It documents how to convert from indices to elements. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed.
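Illustration only (not text from the tracker, and not code from rand6.diff): a minimal sketch of the sample(population, k) interface discussed above, written for present-day Python 3, where range() plays the role that xrange() plays in this thread. The variable names are made up for the example.

```python
import random

# range() is lazy, so choosing 60 of ten million integers does not
# build a ten-million-element list first.
picks = random.sample(range(10_000_000), 60)

# Elements are chosen by position, so repeated values in the
# population simply act as weights.
colors = random.sample('red red red blue blue'.split(), 3)

# sample(n, k) expressed in terms of sample(population, k):
# sample indices, then map them back to elements.
colorlist = ['red', 'green', 'blue', 'cyan', 'magenta', 'yellow']
chosen = [colorlist[i] for i in random.sample(range(len(colorlist)), 5)]
```

The results are random, so no expected output is shown; the only guarantees relied on here are the sample size and the absence of repeated positions.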
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, than that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument . The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one , but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. 
The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. 
Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, neither one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straight-forward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. 
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? 
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order.
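Illustration only (not text from the tracker, and not the code attached to this item): a rough sketch of the two-algorithm idea described in this thread, choosing between a shuffle-style pool and a "seen" lookup depending on how large n is relative to k. The function name sample_indices, the switch_ratio parameter, and the use of a set instead of a dictionary are assumptions made for this example; the factor of six is the ratio mentioned in the discussion.

```python
import random

def sample_indices(n, k, switch_ratio=6, rng=random):
    """Return k distinct integers from range(n), in selection order."""
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    if n <= switch_ratio * k:
        # Pool approach: k is a large share of n, so an n-element
        # list is affordable and no draw is ever rejected.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = rng.randrange(n - i)
            result.append(pool[j])
            pool[j] = pool[n - i - 1]  # move last unpicked item into the gap
        return result
    # Lookup approach: k is small relative to n, so remember what has
    # been picked and redraw on the occasional collision.
    seen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in seen:
            seen.add(j)
            result.append(j)
    return result
```

Both branches return the picks in the order they were drawn, which is the property the audit and raffle use cases in this thread depend on.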
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Sun Nov 10 11:16:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 03:16:01 -0800 Subject: [Patches] [ python-Patches-636159 ] Typo in PEP249 Message-ID: Patches item #636159, was opened at 2002-11-10 14:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Denis S. Otkidach (ods) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Typo in PEP249 Initial Comment: There is a typo in the exception inheritance layout: DatabaseError must not be subclass of InterfaceError.
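Illustration only (not text from the tracker): a minimal sketch of the exception layout PEP 249 intends, with InterfaceError and DatabaseError as siblings under Error. It is written for present-day Python, so Error derives from Exception where the PEP of this era names StandardError; the Warning class is left out to avoid shadowing the built-in.

```python
# Sketch of the PEP 249 exception hierarchy (Warning omitted).
class Error(Exception):
    """Base class for all other DB-API errors."""

class InterfaceError(Error):
    """Errors related to the database interface, not the database."""

class DatabaseError(Error):
    """Errors related to the database itself."""

# The remaining exceptions all derive from DatabaseError.
class DataError(DatabaseError): pass
class OperationalError(DatabaseError): pass
class IntegrityError(DatabaseError): pass
class InternalError(DatabaseError): pass
class ProgrammingError(DatabaseError): pass
class NotSupportedError(DatabaseError): pass
```

With this layout, an "except InterfaceError" clause does not catch database-side failures such as IntegrityError, which is the distinction the report is about.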
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 From noreply@sourceforge.net Sun Nov 10 11:19:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 03:19:03 -0800 Subject: [Patches] [ python-Patches-636159 ] Typo in PEP249 Message-ID: Patches item #636159, was opened at 2002-11-10 14:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Denis S. Otkidach (ods) >Assigned to: M.-A. Lemburg (lemburg) Summary: Typo in PEP249 Initial Comment: There is a typo in the exception inheritance layout: DatabaseError must not be subclass of InterfaceError. ---------------------------------------------------------------------- >Comment By: Denis S. Otkidach (ods) Date: 2002-11-10 14:19 Message: Logged In: YES user_id=63454 Assign to editor of the PEP ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 From noreply@sourceforge.net Sun Nov 10 13:08:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 05:08:27 -0800 Subject: [Patches] [ python-Patches-629278 ] install lib-dynload .so files mode 555 Message-ID: Patches item #629278, was opened at 2002-10-27 01:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: J.
Lewis Muir (jlmuir) Assigned to: Michael Hudson (mwh) Summary: install lib-dynload .so files mode 555 Initial Comment: This is a patch to the setup.py file to set the permissions of the installed shared libraries to have mode 555 (r-xr-xr-x). This fixes bug #549338 "lib-dynload/*.so permissions wrong" and a duplicate bug #583206 "lib-dynload/*.so wrong permissions". The problem was that the shared libraries are installed by simply copying the tree of built shared libraries from the build directory to the installation location. This means that the permissions of the installed shared library files will be whatever the permissions were on these files in the build directory. The permissions are never set. If the shared libraries do not have the execute bit set, then on some platforms (Linux, in my case), python will be broken. For example, if one tries to import the time module, python will raise an ImportError saying "No module named time". To fix this, I've added a class PyBuildInstallLib(install_lib) which does exactly what install_lib does by invoking the super implementation of the install method, but then sets the permissions correctly for the installed shared library files. In the setup call in the main function, I pass this PyBuildInstallLib class in the cmdclass dictionary as the class that should be used for the 'install_lib' command. Another approach would be to instead modify the Makefile to set the correct file modes of the installed shared library files in the 'sharedinstall' target right after running '... setup.py install ...'. I didn't do this because it seemed other file modes were being set by other commands in distutils so it seemed appropriate to do the same. Attached is a patch against the 2.2.2 release. This I have tested on my machine (x86, Mandrake 8.0 + updates, Linux 2.4.18). I've also looked at what's in CVS and my changes can be trivially made to the setup.py that's in CVS as of Sat 2002-10-26 5pm CDT.
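Illustration only (not text from the tracker, and not the attached patch): a minimal standalone sketch of the core step described above, setting an explicit mode on installed shared libraries instead of inheriting whatever the build tree had. The function name set_shared_lib_modes and its parameters are hypothetical; the real patch does this inside an install_lib subclass in setup.py, which is not reproduced here. The ".so" default is an assumption for the example.

```python
import os

def set_shared_lib_modes(install_dir, so_suffix=".so", mode=0o555):
    """chmod every file under install_dir ending in so_suffix.

    Returns the sorted list of paths whose mode was set.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(install_dir):
        for name in filenames:
            if name.endswith(so_suffix):
                path = os.path.join(dirpath, name)
                os.chmod(path, mode)  # r-xr-xr-x by default
                changed.append(path)
    return sorted(changed)
```

A caller would run this once on the lib-dynload directory after the copy step; files with other suffixes are left untouched.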
---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-10 13:08 Message: Logged In: YES user_id=6656 This is looking promising! Wrt 0555 vs 0755: *I* don't care, but I may not be a sufficiently hairy-chested unix admin to see why one is preferable over the other. Wrt logging: I haven't tried your patch yet, but will note that there's been a general effort to get distutils to shut up in 2.3 - so if the announcements don't appear unless you run "python setup.py -v", that's probably a good thing. I have faint concerns that chmod-ing the directories might run afoul of permissions in some settings. Perhaps I should go see what install_lib.install actually returns, but given that we've never had a problem report about directories it might be easier to drop that bit. Now, if you're feeling *really* ambitious, it would be nice to have this fix as part of distutils proper, not some band-aid fix just for Python's setup.py... but don't feel obliged to look at this. ---------------------------------------------------------------------- Comment By: J. Lewis Muir (jlmuir) Date: 2002-11-09 21:06 Message: Logged In: YES user_id=527708 I didn't know about the sysconfig thing; I like that much better. I tried your patch against the 2.2.2 release and it works fine. After revisiting this, I had a few more ideas for improvement. I think it would be even better if the access modes for all installed files and directories were actually deterministic whereas with our existing changes, only the modes of the shared library files are guaranteed to be set correctly. We are still left with all files (other than the shared libraries) and all directories having modes that are based upon whatever the modes were in the build tree that gets copied into the install dir.
I've attached two patches: setup.py-2.2.2-jlmuir-v2.diff (against the 2.2.2 release) setup.py-HEAD-jlmuir-v2.diff (against HEAD) These patches incorporate your changes plus my new changes to make all installed file and directory modes get set correctly. I've tested the patches with their corresponding 2.2.2 release and HEAD. My new changes cause the mode of all installed files to be set to 644 unless they are shared libraries in which case they will get mode 755 and all directories will get mode 755. Note that I've tweaked the mode of shared libraries to 755 instead of 555. I only did this because it seemed more standard. However, the INSTALL_SHARED variable defined in Makefile uses mode 555 (so if you'd rather stick w/ 555, I'm fine with that). In the HEAD patch, I changed from using self.announce to log.info because self.announce wasn't printing anything to stdout and it seems that other commands are now using log.info (e.g. clean.py). There is one issue that might not be worth worrying about, but I'll state it anyway. My changes will set the mode of the install dir and all descendant dirs in the tree. If the destination install dir already exists or has other dirs in it that already exist, the mode will be set for all of those dirs even if they were not part of the directory tree copied from the build dir. In practice, I don't think this will be an issue, but who knows. Alternatively, I could have extracted directory names from the list of installed files but if a directory was created that had no files in it, that directory would not get discovered and hence its mode would not get set. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-08 14:18 Message: Logged In: YES user_id=6656 Thanks for looking at this! This is a bit of a hack, but ne'er mind; I've been trying to think of a clean way of doing this for a while.
I'd prefer to use sysconfig.get_config_vars("SO") than your hardcoded list of possible DSO extensions. Can you try the attached? (err, it's against HEAD but should apply to 2.2.2 with little difficulty). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 From noreply@sourceforge.net Sun Nov 10 14:50:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 06:50:54 -0800 Subject: [Patches] [ python-Patches-633359 ] Patch for sre bug 610299 Message-ID: Patches item #633359, was opened at 2002-11-04 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open >Resolution: Accepted >Priority: 7 Submitted By: Greg Chapman (glchapman) >Assigned to: Fredrik Lundh (effbot) Summary: Patch for sre bug 610299 Initial Comment: Bug report 610299 points out this discrepancy: >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXXX' >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXX\xe9' The problem is in sre_compile.py: the call to _compile_charset near the end of _compile_info forgets to pass in the flags, so that the info charset is not compiled with re.U. (The info charset is used when searching to find the first character at which a match could start; it is not generated for patterns beginning with a repeat like '\w{1}'.) The attached patch changes this call to pass in the flags; it is against the 2.2.2 version of sre_compile.py. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-11-04 19:28 Message: Logged In: YES user_id=86307 Sorry, I thought I marked the checkbox (I know I went through the browse button to find the file). Anyway, here's the file.
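The discrepancy from bug 610299 can be turned into a small regression check. This is a sketch assuming a modern Python 3 interpreter, where str patterns are Unicode-aware and the u'' prefix is unnecessary; with the flags passed through correctly, both patterns should replace every word character, including the accented one.

```python
import re

text = "hello caf\xe9"

# Pattern that starts with a repeat: no info charset is generated for it.
with_repeat = re.compile(r"\w{1}", re.U).sub("X", text)

# Plain pattern: this is the case that used the info charset in the bug report.
without_repeat = re.compile(r"\w", re.U).sub("X", text)

print(with_repeat)
print(without_repeat)
```

Both substitutions are expected to agree; a mismatch would indicate the behaviour the bug report describes.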
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-04 18:43 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633359&group_id=5470 From noreply@sourceforge.net Sun Nov 10 19:56:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 11:56:30 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Sun Nov 10 20:21:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 12:21:35 -0800 Subject: [Patches] [ python-Patches-629278 ] install lib-dynload .so files mode 555 Message-ID: Patches item #629278, was opened at 2002-10-26 19:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: J. 
Lewis Muir (jlmuir) Assigned to: Michael Hudson (mwh) Summary: install lib-dynload .so files mode 555 Initial Comment: This is a patch to the setup.py file to set the permissions of the installed shared libraries to have mode 555 (r-xr-xr-x). This fixes bug #549338 "lib-dynload/*.so permissions wrong" and a duplicate bug #583206 "lib-dynload/*.so wrong permissions". The problem was that the shared libraries are installed by simply copying the tree of built shared libraries from the build directory to the installation location. This means that the permissions of the installed shared library files will be whatever the permissions were on these files in the build directory. The permissions are never set. If the shared libraries do not have the execute bit set, then on some platforms (Linux, in my case), python will be broken. For example, if one tries to import the time module, python will raise an ImportError saying "No module named time". To fix this, I've added a class PyBuildInstallLib(install_lib) which does exactly what install_lib does by invoking the super implementation of the install method, but then sets the permissions correctly for the installed shared library files. In the setup call in the main function, I pass this PyBuildInstallLib class in the cmdclass dictionary as the class that should be used for the 'install_lib' command. Another approach would be to instead modify the Makefile to set the correct file modes of the installed shared library files in the 'sharedinstall' target right after running '... setup.py install ...'. I didn't do this because it seemed other file modes were being set by other commands in distutils so it seemed appropriate to do the same. Attached is a patch against the 2.2.2 release. This I have tested on my machine (x86, Mandrake 8.0 + updates, Linux 2.4.18). I've also looked at what's in CVS and my changes can be trivially made to the setup.py that's in CVS as of Sat 2002-10-26 5pm CDT.
---------------------------------------------------------------------- >Comment By: J. Lewis Muir (jlmuir) Date: 2002-11-10 14:21 Message: Logged In: YES user_id=527708 WRT logging: Argh! In that case, it may need to be changed back. :-) I just want it to be consistent w/ the rest of the install process. Right now the "copying SOURCE -> DESTINATION" messages print to stdout as setup.py is invoked from the sharedinstall Makefile target, but the "changing mode of FILENAME to MODE" messages that this patch adds do not print to stdout when using the announce method. WRT directory modes: If at all possible, I'd like to keep this in. In the testing I did, the shared lib directory (in the build directory) that gets copied has a mode controlled by the umask of the user compiling python. So if I set my umask to 077 and compile python, then the shared lib directory in the build directory will have mode 700 and will have the same mode when it gets copied into the install location. I think it is cleaner to explicitly set all file and directory modes instead of depending on the umask setting of the user. WRT putting this in distutils: I'm possibly willing to do it; I'd have to look into it. I was intentionally avoiding making any changes to distutils proper as I didn't want to break anyone else who may be using install_lib and I don't know what the policies are WRT API changes (e.g. at what version is it ok to break backward compatibility, etc). ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-10 07:08 Message: Logged In: YES user_id=6656 This is looking promising! Wrt 0555 vs 0755: *I* don't care, but I may not be a sufficiently hairy-chested unix admin to see why one is preferable over the other. 
Wrt logging: I haven't tried your patch yet, but will note that there's been a general effort to get distutils to shut up in 2.3 - so if the announcements don't appear unless you run "python setup.py -v", that's probably a good thing. I have faint concerns that chmod-ing the directories might run afoul of permissions in some settings. Perhaps I should go see what install_lib.install actually returns, but given that we've never had a problem report about directories it might be easier to drop that bit. Now, if you're feeling *really* ambitious, it would be nice to have this fix as part of distutils proper, not some band-aid fix just for Python's setup.py... but don't feel obliged to look at this. ---------------------------------------------------------------------- Comment By: J. Lewis Muir (jlmuir) Date: 2002-11-09 15:06 Message: Logged In: YES user_id=527708 I didn't know about the sysconfig thing; I like that much better. I tried your patch against the 2.2.2 release and it works fine. After revisiting this, I had a few more ideas for improvement. I think it would be even better if the access modes for all installed files and directories were actually deterministic whereas with our existing changes, only the modes of the shared library files are guaranteed to be set correctly. We are still left with all files (other than the shared libraries) and all directories having modes that are based upon whatever the modes were in the build tree that gets copied into the install dir. I've attached two patches: setup.py-2.2.2-jlmuir-v2.diff (against the 2.2.2 release) setup.py-HEAD-jlmuir-v2.diff (against HEAD) These patches incorporate your changes plus my new changes to make all installed file and directory modes get set correctly. I've tested the patches with their corresponding 2.2.2 release and HEAD.
My new changes cause the mode of all installed files to be set to 644 unless they are shared libraries in which case they will get mode 755 and all directories will get mode 755. Note that I've tweaked the mode of shared libraries to 755 instead of 555. I only did this because it seemed more standard. However, the INSTALL_SHARED variable defined in Makefile uses mode 555 (so if you'd rather stick w/ 555, I'm fine with that). In the HEAD patch, I changed from using self.announce to log.info because self.announce wasn't printing anything to stdout and it seems that other commands are now using log.info (e.g. clean.py). There is one issue that might not be worth worrying about, but I'll state it anyway. My changes will set the mode of the install dir and all descendant dirs in the tree. If the destination install dir already exists or has other dirs in it that already exist, the mode will be set for all of those dirs even if they were not part of the directory tree copied from the build dir. In practice, I don't think this will be an issue, but who knows. Alternatively, I could have extracted directory names from the list of installed files but if a directory was created that had no files in it, that directory would not get discovered and hence its mode would not get set. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-08 08:18 Message: Logged In: YES user_id=6656 Thanks for looking at this! This is a bit of a hack, but ne'er mind; I've been trying to think of a clean way of doing this for a while. I'd prefer to use sysconfig.get_config_vars("SO") than your hardcoded list of possible DSO extensions. Can you try the attached? (err, it's against HEAD but should apply to 2.2.2 with little difficulty).
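A rough sketch of the v2 behaviour described above (directories 755, shared libraries 755, everything else 644), with the shared-library suffix taken from the build configuration rather than a hardcoded list. It uses the standalone sysconfig module instead of distutils.sysconfig, and prefers the newer EXT_SUFFIX variable over "SO"; both choices are assumptions for a current Python, as are the function names. As noted in the thread, walking the tree also touches directories that already existed in the install location.

```python
import os
import sysconfig


def shared_lib_suffix():
    """Best-effort suffix for extension modules on this build."""
    # EXT_SUFFIX is the newer name; "SO" is the one mentioned in the thread.
    # ".so" is only a last-resort guess.
    return (sysconfig.get_config_var("EXT_SUFFIX")
            or sysconfig.get_config_var("SO")
            or ".so")


def set_tree_modes(root, suffix=None):
    """Give every directory and file under `root` a deterministic mode."""
    suffix = suffix or shared_lib_suffix()
    for dirpath, dirnames, filenames in os.walk(root):
        os.chmod(dirpath, 0o755)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Shared libraries need the execute bit; other files do not.
            os.chmod(path, 0o755 if name.endswith(suffix) else 0o644)
```

On recent interpreters EXT_SUFFIX is a long, platform-tagged string, so a caller who also wants plain ".so" files treated as shared libraries would pass suffix=".so" explicitly.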
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629278&group_id=5470 From noreply@sourceforge.net Sun Nov 10 21:53:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Nov 2002 13:53:19 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None >Priority: 4 Submitted By: Marc Recht (marc) >Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 09:11:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 01:11:51 -0800 Subject: [Patches] [ python-Patches-636005 ] Filter unicode into unicode Message-ID: Patches item #636005, was opened at 2002-11-09 21:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636005&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. 
Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Filter unicode into unicode Initial Comment: Currently, filter(None, "abc") gives "abc", but filter(None, u"abc") gives [u'a', u'b', u'c']. This patch corrects this, adding a Unicode special case for filter. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-11-11 10:11 Message: Logged In: YES user_id=38388 There's no patch attached to this SF item, but you're right, filter() should behave in the same way for 8-bit strings as for Unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636005&group_id=5470 From noreply@sourceforge.net Mon Nov 11 14:24:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 06:24:22 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted.
I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 14:39:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 06:39:18 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patch contains two parts. The first is a work-around for the FreeBSD 5.0-current build problems that we have been discussing on python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS if Python is built with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 14:39:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 06:39:35 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-11 09:39 Message: Logged In: YES user_id=6380 Tim, are you still hesitant? I think this is fine. 
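The two-algorithm idea from the initial comment can be sketched roughly as follows. This is not the code from any of the attached rand*.diff patches: the cutoff between the two strategies is a hypothetical placeholder, and a set stands in for the dictionary mentioned in the description.

```python
import random


def randset(n, k, rng=random):
    """Return k unique integers from range(n), in random order."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if 3 * k >= n:  # hypothetical cutoff: k is large relative to n
        # Partial shuffle: O(n) space, never has to retry.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = rng.randrange(n - i)     # pick among the n - i unused slots
            result.append(pool[j])
            pool[j] = pool[n - i - 1]    # move the last unused value into the hole
        return result
    # Selection tracking: O(k) space, retries on the rare collision.
    chosen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in chosen:
            chosen.add(j)
            result.append(j)
    return result
```

The shuffle branch avoids repeated retries when most of the range is wanted, while the tracking branch avoids materializing a huge list when only a few values are drawn from a very large range.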
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 21:48 Message: Logged In: YES user_id=80475 Can I commit rand6.diff and be done with this one? At one time, sample(n,k) looked better because the code was simpler, faster, and the use of xrange(n) in sample(population,k) wasn't obvious. As of rand6.diff, sample(population,k) is equally fast and simple. The use of xrange(n) is thoroughly documented and has no performance penalty. It's now faster and easier to express sample(n,k) in terms of sample(population,k) than vice-versa. Also, sample(population,k) has the friendlier interface. So, it is the one I recommend. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 00:49 Message: Logged In: YES user_id=80475 Tim, which do you prefer? Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite. Still, rand5.diff is also ready to go. It documents how to convert from indices to elements. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed.
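For readers on a current Python, the xrange(n) idiom discussed above looks like this with random.sample as it exists in the standard library today, where range plays the role of xrange. The helper name sample_indices is hypothetical and only illustrates expressing sample(n,k) in terms of sample(population,k).

```python
import random


def sample_indices(n, k):
    """sample(n, k) expressed via sample(population, k)."""
    # range(n) is lazy, so no list of n integers is ever built.
    return random.sample(range(n), k)


# Choose 60 values from a large integer range without materializing it.
picks = random.sample(range(10_000_000), 60)
print(len(picks), len(set(picks)))
```

Going the other way, from indices back to elements, only needs an indexing step such as [population[i] for i in sample_indices(len(population), k)].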
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. 
The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange(10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)-word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently.
Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements.
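The counting question raised in Tim's comment above (how many poker or bridge hands there are) can be answered with the usual multiplicative formula, which keeps every intermediate value an exact integer. This is a generic sketch, not the algorithm from combgen; the function name n_choose_k is made up for the example.

```python
import math


def n_choose_k(n, k):
    """Number of k-combinations of n things, using only integer arithmetic."""
    if not 0 <= k <= n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k); use the smaller loop
    result = 1
    for i in range(1, k + 1):
        # After step i, result equals C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


print(n_choose_k(52, 5))   # 5-card poker hands
print(n_choose_k(52, 13))  # 13-card bridge hands
```

On Python 3.8 and later, math.comb(n, k) gives the same values and can serve as a cross-check.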
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60, test the first 25, if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?
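The ordering question and the complement-selection trick raised in this thread can be combined in one routine. What follows is a minimal sketch under stated assumptions, not code from the patch or from CombGen.py: the name sample_indices and its rng parameter are hypothetical, and the k > n/2 cutoff is simply the one mentioned in the discussion.

```python
import random

def sample_indices(n, k, rng=random):
    """Return k distinct integers from range(n), in random order.

    Hypothetical helper for illustration only.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # When k > n/2 it is cheaper to pick the n - k indices to leave out.
    use_complement = k > n // 2
    wanted = n - k if use_complement else k
    picked = {}  # dicts keep insertion order, i.e. selection order
    while len(picked) < wanted:
        picked.setdefault(rng.randrange(n), None)
    if not use_complement:
        return list(picked)  # already in random (selection) order
    chosen = [i for i in range(n) if i not in picked]  # ascending order
    rng.shuffle(chosen)  # shuffle afterwards to restore random order
    return chosen
```

With a result in random order, a prefix such as sample_indices(1000, 60)[:25] is itself a random sample, which is the property the audit use case relies on; a caller who wants ascending order can sort the result.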
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)].
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list?
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order.
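The acceptance-testing figure quoted above (60 clean transactions out of 1000 giving roughly 95% confidence of an error rate below 5%) can be checked with a few lines of arithmetic. This is a sketch only: the population is a stand-in list of integers, the variable names are hypothetical, and math.comb assumes Python 3.8 or later.

```python
import math
import random

population = list(range(1000))  # stand-in for 1000 transactions

# Sampling with replacement: a list comprehension, duplicates possible.
with_replacement = [random.choice(population) for _ in range(60)]

# Sampling without replacement: 60 distinct transactions.
without_replacement = random.sample(population, 60)

# If 5% of the transactions were bad, how likely is a clean sample of 60?
p_binomial = 0.95 ** 60  # treats the 60 draws as independent
# Exact figure for drawing without replacement from 950 good + 50 bad.
p_exact = math.comb(950, 60) / math.comb(1000, 60)

print(f"independent-draw approximation: {p_binomial:.4f}")
print(f"without replacement:            {p_exact:.4f}")
```

Both probabilities come out below 0.05 (the independent-draw approximation is about 0.046), which is where the 95% confidence statement comes from.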
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Mon Nov 11 14:45:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 06:45:29 -0800 Subject: [Patches] [ python-Patches-626548 ] Support Hangul Syllable names Message-ID: Patches item #626548, was opened at 2002-10-21 23:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=626548&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Support Hangul Syllable names Initial Comment: This patch implements section 25.2 of ISO 10646 (Character names and annotations for Hangul syllables). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:45 Message: Logged In: YES user_id=21627 I have now updated the patch to use 4-space indents, and added a NEWS entry. Any further changes needed? ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-10-23 13:41 Message: Logged In: YES user_id=21627 As for docs: I'd add a NEWS entry only. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-23 13:31 Message: Logged In: YES user_id=21627 Can you please elaborate this position? Is it not important to follow the established and agreed style guide? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-23 13:02 Message: Logged In: YES user_id=38388 Apart from that the patch looks ok. Do you have some docs to go with it ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-23 13:00 Message: Logged In: YES user_id=38388 No, I'd rather leave things as they are w/r to indentation. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-23 12:42 Message: Logged In: YES user_id=21627 Perhaps the entire file should be formatted to conform with PEP 7 (single-tab indents, where a tab is worth 8 spaces). Should I submit a separate patch for this reformatting? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-23 12:32 Message: Logged In: YES user_id=38388 One more minor nit: the indentation in the C file is 4 chars, please reindent your code accordingly. 
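For readers unfamiliar with the Hangul naming rule that patch #626548 refers to: syllable names are derived arithmetically from the code point rather than stored in a table. The sketch below follows the name-generation algorithm as published in the Unicode Standard; it is not the patch's C code, and the function name hangul_syllable_name is hypothetical.

```python
import unicodedata

S_BASE = 0xAC00                      # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT          # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT          # 11172 syllables in total

# Short names of the leading consonants, vowels and trailing consonants.
LEAD = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
        "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
VOWEL = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
         "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
TRAIL = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
         "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
         "K", "T", "P", "H"]

def hangul_syllable_name(code_point):
    """Compose the character name of a precomposed Hangul syllable."""
    index = code_point - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, N_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    return "HANGUL SYLLABLE " + LEAD[lead] + VOWEL[vowel] + TRAIL[trail]

print(hangul_syllable_name(0xD55C))  # HANGUL SYLLABLE HAN
```

In a current Python, unicodedata.name() already returns these names, so the sketch is mainly useful for seeing why no per-character name table is needed.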
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=626548&group_id=5470 From noreply@sourceforge.net Mon Nov 11 14:48:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 06:48:54 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: This fixes the build problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked about on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patch contains two parts. The first is a work-around for the FreeBSD 5.0-current build problems we have been talking about on python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS, if Python is built with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 15:09:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 07:09:50 -0800 Subject: [Patches] [ python-Patches-614770 ] MSVC 7.0 compiler support Message-ID: Patches item #614770, was opened at 2002-09-26 05:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Anderson (djohnanderson) Assigned to: Nobody/Anonymous (nobody) Summary: MSVC 7.0 compiler support Initial Comment: Distutils doesn't work with the current shipping version of the Microsoft compiler (7.0). I've got a patch that fixes it (context diffs of msvccompiler.py against the latest code in CVS). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 16:09 Message: Logged In: YES user_id=21627 It seems clear that the patch is unacceptable in its current form: distutils *must* use the compiler that was used to compile Python. 
Would you be willing to revise your patch in that direction? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-16 10:50 Message: Logged In: YES user_id=11105 It seems Martin is correct. I made the following experiment: Compile pythoncore, python and pythonw with MSVC7, and the remaining extension modules with MSVC6 (all, except bsddb and _tkinter). Now the test-suite crashes hard in test_parser. When everything is built with MSVC7, the test-suite runs fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-16 10:20 Message: Logged In: YES user_id=21627 The biggest problem is that VC.NET uses a new C library, msvcr70[d].dll. I have not fully studied all issues, but it appears that, for purposes of Python, you cannot mix extensions, at least not in the general case. In particular, you cannot pass FILE* between both C libraries. This is particularly annoying, since MS has managed that struct _iobuf (aka FILE) has identical layout in both compilers. Nevertheless, it crashes in the following scenario: VC7:fopen VC: fputs The problem is that fputs wants to lock the file. For that, it tests whether the pointer comes from its own _iob (_file.c:_lock_file). If the pointer comes from the _iob of the other C library, it concludes that this must be a _FILEX (which it isn't), and crashes :-( So it appears that one *must* build extension modules with the Visual Studio that also has built Python. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-10-15 23:01 Message: Logged In: YES user_id=31435 Thomas, I'm just curious here: if we were to create the Windows Python distribution with VC7, would the DLLs be compatible with extensions compiled by VC6?
In return for your answer to that, here's how to determine your MSVC6 service pack level: HKLM\SOFTWARE\Microsoft\VisualStudio\6.0\ServicePacks and look at key "latest". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-15 19:13 Message: Logged In: YES user_id=11105 I've tried out the patch, and it works for me, at least with simple extensions - both with MSVC 6 and with MSVC 7, freshly installed. I don't know the service pack level of my MSVC6, Help->About doesn't show it. I'm not sure if using 7.0 (if available) should be the default. Python itself is still built with MSVC6 AFAIK. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-08 17:51 Message: Logged In: YES user_id=618290 Thanks Martin for taking the time to review my proposed changes. I think your comment about looking in Software\Microsoft\VisualStudio instead of Software\Microsoft\DevStudio is a good point. I decided to look only in VisualStudio because that works for both the version 6 and 7 compilers I tested with. I don't know if the version 5 compiler also stores the version in DevStudio. Another alternative, which would of course complicate the code, would be to look in both places and choose the highest version. Let me know if that's what you'd prefer and I'll upload a new patch. The problem I'm having with the version 6 compiler (latest service Pack 5) is that SOFTWARE\Microsoft\Devstudio\6.0\Build System\ doesn't exist. Instead it looks like it's been moved to SOFTWARE\Microsoft\Shared Tools\Build System\, but in that new location, SOFTWARE\Microsoft\Shared Tools\Build System\Components\Platforms\Win32 (x86)\Directories doesn't exist. This has the effect of not getting the correct include directories for builds.
This also points out a serious flaw in looking at undocumented registry entries to find information for the build -- there's no guarantee that the registry information won't change even within the same version of the compiler. I don't have a good solution for this problem, but I'd rather distutils reported an error when it couldn't find the registry entries it expected -- rather than silently ignoring it as it does now. In a few places I added code to report unexpected missing registry entries, but not all. I could if you're interested add error code in all cases. Fixing the problem I'm having with the version 6 compiler seems difficult, since it seems to work for you and doesn't work for me -- apparently our registries are different. Personally I'm content with leaving the version 6 compiler broken since it isn't obvious how to fix it and it apparently works for some people and I only intend to use the version 7 or newer compilers. I added three new functions: convert_mbcs, read_key, and by far the largest: expand_macros. The first two make the code simpler, easier to read, avoid unnecessary duplications, and minimize the risk that someone would forget to deal with mbcs. It would be difficult to understand the bug fix without these two functions. My hope was that these changes would make it easier for the next person who needs to learn the code. The last, expand_macros, is necessary because the version 7 compiler introduces macros which didn't exist in previous versions of the compiler. It would be awkward to implement the macros without adding a new function. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-08 11:21 Message: Logged In: YES user_id=21627 I'm asking because you are not looking into Software\Microsoft\Devstudio anymore to find the most recent version. Not supporting MSVC5 anymore is probably acceptable.
I never noticed that support for MSVC6 is broken - it works fine for me... However, if you think you can improve that somehow, please do - please elaborate what changes solve what problems, though. It seems that a number of changes are not strictly necessary (e.g. creation of new functions); to really evaluate the patch, I have to know why you propose these changes. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-07 17:27 Message: Logged In: YES user_id=618290 It's been so long since I had a copy of MSVC 5 -- I think it became obsolete about 6 or 7 years ago. None of my changes should have any impact on the operation of MSVC 5, but of course you never know unless you try it. Also, the MSVC 6 support in distutils is currently broken -- although it finds the compiler, the code to find the include paths is totally broken. I have MSVC 6 (latest service pack) and 7 and would be willing to make both those work. Anyone who's using 5 is going to have lots of other problems to deal with besides distutils. Wouldn't surprise me if the MSVC 6 code for finding paths would differ in each service pack -- since it depends upon unsupported registry entries. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-07 13:43 Message: Logged In: YES user_id=21627 Can you report whether this patch works with MSVC 5? ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 23:38 Message: Logged In: YES user_id=618290 Oops, I guess I forgot to check that little box. Sorry about that. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 23:38 Message: Logged In: YES user_id=618290 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file.
Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-26 06:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 From noreply@sourceforge.net Mon Nov 11 15:37:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 07:37:13 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. 
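The initial comment describes choosing between a shuffle algorithm and a dictionary algorithm. A minimal sketch of that idea follows. It is not the code from the patch: the function body is an illustration, the rng parameter is hypothetical, and the cutoff of n <= 6*k is only an assumption based on the "switchover ratio of six" explained elsewhere in the thread.

```python
import random

def randset(n, k, rng=random):
    """Return k unique integers from range(n), in selection order.

    Illustrative sketch only; the actual patch may differ in detail.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n <= 6 * k:
        # Dense case: a partial Fisher-Yates shuffle of an explicit pool.
        # Costs n list slots but needs exactly k random numbers.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Sparse case: draw indices and retry on collisions. Space is
    # proportional to k, and since k/n < 1/6 retries stay infrequent.
    selected = {}  # insertion-ordered, so keys stay in selection order
    while len(selected) < k:
        selected.setdefault(rng.randrange(n), None)
    return list(selected)
```

The dense branch avoids the growing number of retries that the sparse branch would need as k approaches n, while the sparse branch makes a call such as randset(10_000_000, 100) possible without building a ten-million-element list.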
---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-11 10:37 Message: Logged In: YES user_id=31435 Sorry, I have a lot on my plate, and this one overshot its budget by an hour already. I'll get back to it later today. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-11 09:39 Message: Logged In: YES user_id=6380 Tim, are you still hesitant? I think this is fine. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 21:48 Message: Logged In: YES user_id=80475 Can I commit rand6.diff and be done with this one? At one time, sample(n,k) looked better because the code was simpler, faster, and the use of xrange(n) in sample(population,k) wasn't obvious. As of rand6.diff, sample(population,k) is equally fast and simple. The use of xrange(n) is thoroughly documented and has no performance penalty. It's now faster and easier to express sample(n,k) in terms of sample(population,k) than vice-versa. Also, sample(population,k) has the friendlier interface. So, it is the one I recommend. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 00:49 Message: Logged In: YES user_id=80475 Tim, which do you prefer? Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite. Still, rand5.diff is also ready to go. It documents how to convert from indices to elements. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)?
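The two call styles being weighed here are interchangeable in practice. The sketch below uses the random.sample(population, k) form as it exists in current Python 3, where range() fills the role that xrange() plays in this 2002 thread; colorlist and the other variable names are made up for illustration.

```python
import random

# sample(n, k) expressed through sample(population, k): range() is a
# lazy sequence, so no ten-million-element list is ever built.
picks = random.sample(range(10_000_000), 60)

colorlist = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# Index style: sample positions, then convert indices to elements.
indices = random.sample(range(len(colorlist)), 3)
by_index = [colorlist[i] for i in indices]

# Population style: sample the elements directly.
direct = random.sample(colorlist, 3)

# Repeated elements act as weights, because selection is by position.
weighted = random.sample("red red red blue blue".split(), 3)
```

Either style yields three distinct positions from colorlist; the population form simply hides the index conversion from the caller.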
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc.
-- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange(10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)-word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy.
combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached.
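The counting question raised in the comment above (how many k-combinations of n things there are, for example poker or bridge hands) can be sketched as follows. This is a minimal illustration in present-day Python, not code from combgen or from the patch under review; the name count_combinations is an assumption made for this example.

```python
def count_combinations(n, k):
    """Count the k-element subsets of an n-element set (hypothetical helper)."""
    if not 0 <= k <= n:
        return 0
    # C(n, k) == C(n, n - k); use the smaller k to shorten the loop.
    k = min(k, n - k)
    total = 1
    for i in range(1, k + 1):
        # After step i, total equals C(n - k + i, i), so the division is exact
        # and intermediate values never grow beyond the final answer.
        total = total * (n - k + i) // i
    return total


print(count_combinations(52, 5))   # poker hands: 2598960
print(count_combinations(52, 13))  # bridge hands: 635013559600
```

Multiplying and dividing step by step keeps every intermediate value an integer, which is one way to avoid computing three large factorials.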
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks?
It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version.
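The audit workflow described in the comment above relies on the sample being in random order, so that a leading slice is itself a random sample. A minimal sketch, assuming present-day Python and the sample(population, k) form; the function name audit_sample, its parameters, and the integer transaction ids are hypothetical and not part of the patch:

```python
import random


def audit_sample(transactions, full_size=60, first_pass_size=25, seed=None):
    """Draw the full sample once and return (first_pass, full_sample).

    Because random.sample() returns its selections in random order, the
    first `first_pass_size` items are themselves a random sample.
    """
    rng = random.Random(seed)
    full_sample = rng.sample(transactions, full_size)
    return full_sample[:first_pass_size], full_sample


# Hypothetical population: 1000 transaction ids.
first_pass, full_sample = audit_sample(list(range(1000)), seed=2002)
print(len(first_pass), len(full_sample))  # 25 60
```

If the first pass finds an error, the remaining 35 items of the same draw extend the test without re-sampling.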
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order.
CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook. I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- Take it as is, convert the n argument to choice() style population list, or withdraw the patch?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful. b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six.
Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection.
For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples, e.g. selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it?
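The acceptance-testing figure quoted above (60 transactions, zero errors, 95% confidence of an error rate under 5%) can be checked with a short calculation. This is a rough sketch that assumes independent draws, which is an approximation: sampling without replacement from a finite population of 1000 makes a clean sample slightly less likely still. The function name chance_of_clean_sample is hypothetical.

```python
def chance_of_clean_sample(error_rate, sample_size):
    """Probability that every sampled item is error-free (independent draws)."""
    return (1.0 - error_rate) ** sample_size


# If the true error rate were as high as 5%, a clean sample of 60
# would occur less than 5% of the time.
p = chance_of_clean_sample(0.05, 60)
print(round(p, 3))  # 0.046
```

Since that probability is below 0.05, observing zero errors in 60 supports the stated 95% confidence level under this approximation.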
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs.
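The index-based scheme discussed earlier in this thread (a "uniquified" series of randrange() calls, with a switch to shuffling when the pool is small relative to k) might look roughly like the sketch below. It is an illustration in present-day Python built only from the description in the comments, not the attached patch; the name sample_indices and the use of a set instead of a dictionary are assumptions made for this example.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k distinct integers from range(n), in random order (sketch)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n <= 6 * k:
        # Small pool: an n-element list costs no more memory than the
        # bookkeeping for k selections, so do a partial shuffle.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Large pool: draw with randrange() and reject repeats. With n > 6*k,
    # fewer than one draw in six can collide with an earlier selection.
    seen = set()
    result = []
    while len(result) < k:
        candidate = rng.randrange(n)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)  # kept in selection order
    return result


picks = sample_indices(1000000, 60)
print(len(picks), len(set(picks)))  # 60 60
```

Converting indices to elements is then a list comprehension, as the thread notes: [colorlist[i] for i in sample_indices(len(colorlist), 5)].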
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Mon Nov 11 16:19:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 08:19:45 -0800 Subject: [Patches] [ python-Patches-614770 ] MSVC 7.0 compiler support Message-ID: Patches item #614770, was opened at 2002-09-25 20:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Anderson (djohnanderson) Assigned to: Nobody/Anonymous (nobody) Summary: MSVC 7.0 compiler support Initial Comment: Distutils doesn't work with the current shipping version of the Microsoft compiler (7.0). I've got a patch that fixes it (context diffs of msvccompiler.py against the latest code in CVS). ---------------------------------------------------------------------- >Comment By: John Anderson (djohnanderson) Date: 2002-11-11 08:19 Message: Logged In: YES user_id=618290 I'm a bit confused by the question. On my computer, I installed the current Microsoft compiler (7.0), compiled Python, installed distutils with my patch, and built several extensions. So distutils did use the same compiler as was used to build Python. If you want to use the same compiler that someone else built Python with, then you'll need to get the same compiler they used and make sure it's the only version that's installed; then distutils works fine. Perhaps you mean that there is some way to determine which compiler was used by Python from examining something in Python itself. If this is possible, please let me know how to do it. In this case I'd propose modifying the patch to write out a warning when a different compiler was used to run distutils than was used to compile Python.
In the case of the Microsoft compiler not all versions have been incompatible. Or perhaps you mean distutils should have a generalized way to specify which version of compiler to use when more than one is installed on your computer. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 07:09 Message: Logged In: YES user_id=21627 It seems clear that the patch is unacceptable in its current form: distutils *must* use the compiler that was used to compile Python. Would you be willing to revise your patch in that direction? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-16 01:50 Message: Logged In: YES user_id=11105 It seems Martin is correct. I made the following experiment: Compile pythoncore, python and pythonw with MSVC7, and the remaining extension modules with MSVC6 (all, except bsddb and _tkinter). Now the test-suite crashes hard in test_parser. When everything is built with MSVC7, the test-suite runs fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-16 01:20 Message: Logged In: YES user_id=21627 The biggest problem is that VC.NET uses a new C library, msvcr70[d].dll. I have not fully studied all issues, but it appears that, for purposes of Python, you cannot mix extensions, at least not in the general case. In particular, you cannot pass FILE* between both C libraries. This is particularly annoying, since MS has managed that struct _iobuf (aka FILE) has identical layout in both compilers. Nevertheless, it crashes in the following scenario: VC7:fopen VC: fputs The problem is that fputs wants to lock the file. For that, it tests whether the pointer comes from its own _iob (_file.c:_lock_file).
If the pointer comes from the _iob of the other C library, it concludes that this must be a _FILEX (which it isn't), and crashes :-( So it appears that one *must* build extension modules with the Visual Studio that also has built Python. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-10-15 14:01 Message: Logged In: YES user_id=31435 Thomas, I'm just curious here: if we were to create the Windows Python distribution with VC7, would the DLLs be compatible with extensions compiled by VC6? In return for your answer to that, here's how to determine your MSVC6 service pack level: HKLM\SOFTWARE\Microsoft\VisualStudio\6.0\ServicePacks and look at key "latest". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-15 10:13 Message: Logged In: YES user_id=11105 I've tried out the patch, and it works for me, at least with simple extensions - both with MSVC 6 and with MSVC 7, freshly installed. I don't know the service pack level of my MSVC6, Help->About doesn't show it. I'm not sure if using 7.0 (if available) should be the default. Python itself is still built with MSVC6 AFAIK. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-08 08:51 Message: Logged In: YES user_id=618290 Thanks Martin for taking the time to review my proposed changes. I think your comment about looking in Software\Microsoft\VisualStudio instead of Software\Microsoft\DevStudio is a good point. I decided to look only in VisualStudio because that works for both the version 6 and 7 compiler I tested with. I don't know if the version 5 compiler also stores the version in DevStudio. Another alternative, which would of course complicate code, would be to look in both places and choose the highest version. Let me know if that's what you'd prefer and I'll upload a new patch.
The problem I'm having with the version 6 compiler (latest Service Pack 5) is that SOFTWARE\Microsoft\Devstudio\6.0\Build System\ doesn't exist. Instead it looks like it's been moved to SOFTWARE\Microsoft\Shared Tools\Build System\, but in that new location, SOFTWARE\Microsoft\Shared Tools\Build System\Components\Platforms\Win32 (x86)\Directories doesn't exist. This has the effect of not getting the correct include directories for builds. This also points out a serious flaw in looking at undocumented registry entries to find information for the build -- there's no guarantee that the registry information won't change even within the same version of the compiler. I don't have a good solution for this problem, but I'd rather distutils reported an error when it couldn't find the registry entries it expected -- rather than silently ignoring it as it does now. In a few places I added code to report unexpected missing registry entries, but not all. If you're interested, I could add error code in all cases. Fixing the problem I'm having with the version 6 compiler seems difficult, since it seems to work for you and doesn't work for me -- apparently our registries are different. Personally I'm content with leaving the version 6 compiler broken since it isn't obvious how to fix it and it apparently works for some people and I only intend to use the version 7 or newer compilers. I added three new functions: convert_mbcs, read_key, and by far the largest: expand_macros. The first two make the code simpler, easier to read, avoid unnecessary duplications, and minimize the risk that someone would forget to deal with mbcs. It would be difficult to understand the bug fix without these two functions. My hope was that these changes would make it easier for the next person who needs to learn the code. The last, expand_macros, is necessary because the version 7 compiler introduces macros which didn't exist in previous versions of the compiler.
It would be awkward to implement the macros without adding a new function. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-08 02:21 Message: Logged In: YES user_id=21627 I'm asking because you are not looking into Software\Microsoft\Devstudio anymore to find the most recent version. Not supporting MSVC5 anymore is probably acceptable. I never noticed that support for MSVC6 is broken - it works fine for me... However, if you think you can improve that somehow, please do - please elaborate what changes solve what problems, though. It seems that a number of changes are not strictly necessary (e.g. creation of new functions); to really evaluate the patch, I have to know why you propose these changes. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-07 08:27 Message: Logged In: YES user_id=618290 It's been so long since I had a copy of MSVC 5 -- I think it became obsolete about 6 or 7 years ago. None of my changes should have any impact on the operation of MSVC 5, but of course you never know unless you try it. Also, the MSVC 6 support in distutils is currently broken -- although it finds the compiler, the code to find the include paths is totally broken. I have MSVC 6 (latest service pack) and 7 and would be willing to make both those work. Anyone who's using 5 is going to have lots of other problems to deal with besides distutils. Wouldn't surprise me if the MSVC 6 code for finding paths would differ in each service pack -- since it depends upon unsupported registry entries. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-07 04:43 Message: Logged In: YES user_id=21627 Can you report whether this patch works with MSVC 5?
---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 14:38 Message: Logged In: YES user_id=618290 Oops, I guess I forgot to check that little box. Sorry about that. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 14:38 Message: Logged In: YES user_id=618290 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-25 21:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 From noreply@sourceforge.net Mon Nov 11 17:07:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 09:07:38 -0800 Subject: [Patches] [ python-Patches-614770 ] MSVC 7.0 compiler support Message-ID: Patches item #614770, was opened at 2002-09-26 05:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Anderson (djohnanderson) Assigned to: Nobody/Anonymous (nobody) Summary: MSVC 7.0 compiler support Initial Comment: Distutils doesn't work with the current shipping version of the Microsoft compiler (7.0). I've got a patch that fixes it (context diffs of msvccompiler.py against the latest code in CVS).
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:07 Message: Logged In: YES user_id=21627 The typical distutils user does use a Python binary compiled by somebody else, and may have multiple MSVC versions installed, or may have the wrong version installed. I was hoping that you can find the MSVC version from sys.version, but that only tells you that it is MSC (or that it isn't). I would suggest modifying the compiler identification in sys.version to include the MSC version. Perhaps we could eliminate the identification of Win32-Alpha at the same time. IOW, the compiler string should read [MSC v.12 32 bit] or perhaps v.1200 if that makes it easier. Then, distutils should look into sys.version for the compiler revision, and assume v.1200 (i.e. MSVC6) if no version indication is found. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-11-11 17:19 Message: Logged In: YES user_id=618290 I'm a bit confused by the question. On my computer, I installed the current Microsoft compiler (7.0), compiled Python, installed distutils with my patch, and built several extensions. So distutils did use the same compiler as was used to build Python. If you want to use the same compiler that someone else built Python with, then you'll need to get the same compiler they used and make sure it's the only version that's installed; then distutils works fine. Perhaps you mean that there is some way to determine which compiler was used by Python from examining something in Python itself. If this is possible, please let me know how to do it. In this case I'd propose modifying the patch to write out a warning when a different compiler was used to run distutils than was used to compile Python. In the case of the Microsoft compiler not all versions have been incompatible.
Or perhaps you mean distutils should have a generalized way to specify which version of compiler to use when more than one is installed on your computer. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 16:09 Message: Logged In: YES user_id=21627 It seems clear that the patch is unacceptable in its current form: distutils *must* use the compiler that was used to compile Python. Would you be willing to revise your patch in that direction? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-16 10:50 Message: Logged In: YES user_id=11105 It seems Martin is correct. I made the following experiment: Compile pythoncore, python and pythonw with MSVC7, and the remaining extension modules with MSVC6 (all, except bsddb and _tkinter). Now the test-suite crashes hard in test_parser. When everything is built with MSVC7, the test-suite runs fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-16 10:20 Message: Logged In: YES user_id=21627 The biggest problem is that VC.NET uses a new C library, msvcr70[d].dll. I have not fully studied all issues, but it appears that, for purposes of Python, you cannot mix extensions, at least not in the general case. In particular, you cannot pass FILE* between both C libraries. This is particularly annoying, since MS has managed that struct _iobuf (aka FILE) has identical layout in both compilers. Nevertheless, it crashes in the following scenario: VC7:fopen VC: fputs The problem is that fputs wants to lock the file. For that, it tests whether the pointer comes from its own _iob (_file.c:_lock_file).
If the pointer comes from the _iob of the other C library, it concludes that this must be a _FILEX (which it isn't), and crashes :-( So it appears that one *must* build extension modules with the Visual Studio that also has built Python. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-10-15 23:01 Message: Logged In: YES user_id=31435 Thomas, I'm just curious here: if we were to create the Windows Python distribution with VC7, would the DLLs be compatible with extensions compiled by VC6? In return for your answer to that, here's how to determine your MSVC6 service pack level: HKLM\SOFTWARE\Microsoft\VisualStudio\6.0\ServicePacks and look at key "latest". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-15 19:13 Message: Logged In: YES user_id=11105 I've tried out the patch, and it works for me, at least with simple extensions - both with MSVC 6 and with MSVC 7, freshly installed. I don't know the service pack level of my MSVC6, Help->About doesn't show it. I'm not sure if using 7.0 (if available) should be the default. Python itself is still built with MSVC6 AFAIK. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-08 17:51 Message: Logged In: YES user_id=618290 Thanks Martin for taking the time to review my proposed changes. I think your comment about looking in Software\Microsoft\VisualStudio instead of Software\Microsoft\DevStudio is a good point. I decided to look only in VisualStudio because that works for both the version 6 and 7 compiler I tested with. I don't know if the version 5 compiler also stores the version in DevStudio. Another alternative, which would of course complicate code, would be to look in both places and choose the highest version. Let me know if that's what you'd prefer and I'll upload a new patch.
The problem I'm having with the version 6 compiler (latest Service Pack 5) is that SOFTWARE\Microsoft\Devstudio\6.0\Build System\ doesn't exist. Instead it looks like it's been moved to SOFTWARE\Microsoft\Shared Tools\Build System\, but in that new location, SOFTWARE\Microsoft\Shared Tools\Build System\Components\Platforms\Win32 (x86)\Directories doesn't exist. This has the effect of not getting the correct include directories for builds. This also points out a serious flaw in looking at undocumented registry entries to find information for the build -- there's no guarantee that the registry information won't change even within the same version of the compiler. I don't have a good solution for this problem, but I'd rather distutils reported an error when it couldn't find the registry entries it expected -- rather than silently ignoring it as it does now. In a few places I added code to report unexpected missing registry entries, but not all. I could, if you're interested, add error code in all cases. Fixing the problem I'm having with the version 6 compiler seems difficult, since it seems to work for you and doesn't work for me -- apparently our registries are different. Personally I'm content with leaving the version 6 compiler broken since it isn't obvious how to fix it and it apparently works for some people and I only intend to use the version 7 or newer compilers. I added three new functions: convert_mbcs, read_key, and by far the largest: expand_macros. The first two make the code simpler, easier to read, avoid unnecessary duplications, and minimize the risk that someone would forget to deal with mbcs. It would be difficult to understand the bug fix without these two functions. My hope was that these changes would make it easier for the next person who needs to learn the code. The last, expand_macros, is necessary because the version 7 compiler introduces macros which didn't exist in previous versions of the compiler.
It would be awkward to implement the macros without adding a new function. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-08 11:21 Message: Logged In: YES user_id=21627 I'm asking because you are not looking into Software\Microsoft\Devstudio anymore to find the most recent version. Not supporting MSVC5 anymore is probably acceptable. I never noticed that support for MSVC6 is broken - it works fine for me... However, if you think you can improve that somehow, please do - please elaborate what changes solve what problems, though. It seems that a number of changes are not strictly necessary (e.g. creation of new functions); to really evaluate the patch, I have to know why you propose these changes. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-07 17:27 Message: Logged In: YES user_id=618290 It's been so long since I had a copy of MSVC 5 -- I think it became obsolete about 6 or 7 years ago. None of my changes should have any impact on the operation of MSVC 5, but of course you never know unless you try it. Also, the MSVC 6 support in distutils is currently broken -- although it finds the compiler, the code to find the include paths is totally broken. I have MSVC 6 (latest service pack) and 7 and would be willing to make both those work. Anyone who's using 5 is going to have lots of other problems to deal with besides distutils. Wouldn't surprise me if the MSVC 6 code for finding paths would differ in each service pack -- since it depends upon unsupported registry entries. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-07 13:43 Message: Logged In: YES user_id=21627 Can you report whether this patch works with MSVC 5?
---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 23:38 Message: Logged In: YES user_id=618290 Oops, I guess I forgot to check that little box. Sorry about that. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 23:38 Message: Logged In: YES user_id=618290 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-26 06:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 From noreply@sourceforge.net Mon Nov 11 17:24:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 09:24:39 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: This fixes the build problems on FreeBSD 5.0 (-current).
---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set, __BSD_VISIBLE isn't defined. Because of that, certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than Python expects. The cleanest way to solve these issues on FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. This also means _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked about on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. The first is a work-around for the FreeBSD 5.0-current build problems we have been talking about on python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS, if Python is built with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes?
They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 17:47:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 09:47:55 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:47 Message: Logged In: YES user_id=21627 The usage in rpc/clnt.h is irrelevant. It only wraps rpccreateerr (or some such), which is not used in Python. Since this patch is for FreeBSD -current, configuration of FreeBSD 4.x should not concern us at the moment; on the SF compile farm, Python compiles fine on FreeBSD 4.7, so I can't see a problem there, either. 
Can you name a few function which Python seems to rely on, along with the specific compiler error message that you get? For ftello, I would suggest to raise _XOPEN_SOURCE to 600, on all systems. Systems that only offer XPG/5 should not be affected. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set __BSD_VISIBLE isn't defined. Because of that certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than python expects. The cleanest way for FreeBSD to solve this issues for FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. This means also _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. A work-around for the FreeBSD 5.0-current build problems, we talked/talking about at python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS, if Python is build with threads. 
_THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 18:33:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 10:33:42 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-11 19:33 Message: Logged In: YES user_id=205 I had a quick look at FreeBSD 4.x's includes (couldn't resist..). 
If _THREAD_SAFE is set then feof(p), ferror(p), clearerr(p), fileno(p) are not available there. And then a thread-safe version of _FLOCKFILE(x). fseeko, ftello, vsnprintf, ctermid_r, seteuid, setegid, setgroups, u_char, u_short, u_int, u_long, ushort, uint, .. (I've attached a build log.) The u_* typedefs are only defined if __BSD_VISIBLE is defined. Raising _XOPEN_SOURCE to 600 isn't a real option, because chroot and friends are only visible in the __BSD_VISIBLE or _XOPEN_SOURCE <= 500 case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:47 Message: Logged In: YES user_id=21627 The usage in rpc/clnt.h is irrelevant. It only wraps rpccreateerr (or some such), which is not used in Python. Since this patch is for FreeBSD -current, configuration of FreeBSD 4.x should not concern us at the moment; on the SF compile farm, Python compiles fine on FreeBSD 4.7, so I can't see a problem there, either. Can you name a few function which Python seems to rely on, along with the specific compiler error message that you get? For ftello, I would suggest to raise _XOPEN_SOURCE to 600, on all systems. Systems that only offer XPG/5 should not be affected. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set __BSD_VISIBLE isn't defined. Because of that certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than python expects. The cleanest way for FreeBSD to solve this issues for FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. 
This means also _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. A work-around for the FreeBSD 5.0-current build problems, we talked/talking about at python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS, if Python is build with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 19:38:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 11:38:22 -0800 Subject: [Patches] [ python-Patches-614770 ] MSVC 7.0 compiler support Message-ID: Patches item #614770, was opened at 2002-09-25 23:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Anderson (djohnanderson) Assigned to: Nobody/Anonymous (nobody) Summary: MSVC 7.0 compiler support Initial Comment: Distutils doesn't work with the current shipping version of the Microsoft compiler (7.0). I've got a patch that fixes it (context diffs of msvccompiler.py against the latest code in CVS). ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-11 14:38 Message: Logged In: YES user_id=31435 Martin, assuming the tests pass, I'm about to check in a change so that, under 2.3, >>> import sys >>> sys.version '2.3a0 (#29, Nov 11 2002, 14:23:17) [MSC v.1200 32 bit (Intel)] >>> under MSVC 6. Also gets rid of the _M_ALPHA code. Those are good changes regardless of the fate of the issue here, but at least poor John won't have to figure out how to trick the preprocessor into stringizing the value of _MSC_VER (that's not easy, alas). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 12:07 Message: Logged In: YES user_id=21627 The typical distutils user does use a Python binary compiled by somebody else, and may have multiple MSVC versions installed, or may have the wrong version installed. 
I was hoping that you can find the MSVC version from sys.version, but that only tells you that it is MSC (or that it isn't). I would suggest to modify the compiler identification in sys.version to include the MSC version. Perhaps we could eliminate the identification of Win32-Alpha at the same time. IOW, the compiler string should read [MSC v.12 32 bit] or perhaps v.1200 if that makes it easier. Then, distutils should look into sys.version for the compiler revision, and assume v.1200 (i.e. MSVC6) if no version indication is found. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-11-11 11:19 Message: Logged In: YES user_id=618290 I'm a bit confused by the question. On my computer, I installed the current Microsoft compiler (7.0), compiled Python, installed distutils with my patch, and built several extensions. So distutils did use the same compiler as was used to build Python. If you want to use the same compiler that someone else built Python with, then you'll need to get the same compiler they used, make sure it's the only version that's installed, and distutils works fine. Perhaps you mean that there is some way to determine which compiler was used by Python from examining something in Python itself. If this is possible, please let me know how to do it. In this case I'd propose modifying the patch to write out a warning when a different compiler was used to run distutils than was used to compile Python. In the case of the Microsoft compiler not all versions have been incompatible. Or perhaps you mean distutils should have a generalized way to specify which version of compiler to use when more than one is installed on your computer. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-11-11 10:09 Message: Logged In: YES user_id=21627 It seems clear that the patch is unacceptable in its current form: distutils *must* use the compiler that was used to compile Python. Would you be willing to revise your patch in that direction? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-16 04:50 Message: Logged In: YES user_id=11105 It seems Martin is correct. I made the following experiment: Compile pythoncore, python and pythonw with MSVC7, and the remaining extension modules with MSVC6 (all, except bsddb and _tkinter). Now the test-suite crashes hard in test_parser. When everything is built with MSVC7, the test-suite runs fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-16 04:20 Message: Logged In: YES user_id=21627 The biggest problem is that VC.NET uses a new C library, msvcr70[d].dll. I have not fully studied all issues, but it appears that, for purposes of Python, you cannot mix extensions, atleast not in the general case. In particular, you cannot pass FILE* between both C libraries. This is particularly annoying, since MS has managed that struct _iobuf (aka FILE) has identical layout in both compilers. Nevertheless, it crashes in the following scenario: VC7:fopen VC: fputs The problem is that fputs wants to lock the file. For that, it tests whether the pointer comes from its own _iob (_file.c:_lock_file). If the pointer comes from the _iob of the other C library, it concludes that this must be a _FILEX (which it isn't), and crashes :-( So it appears that one *must* build extension modules with the Visual Studio that also has built Python. 
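Editorial sketch (not part of the original thread): the 2002-11-11 12:07 suggestion above, that distutils read the compiler revision out of sys.version and assume v.1200 (MSVC6) when no version is present, could look roughly like the following. The function name msc_version and the regular expression are assumptions made for illustration; this is not the code from the patch or from distutils.

```python
import re

def msc_version(version_string):
    """Return the MSC revision embedded in a sys.version-style string.

    Hypothetical helper: returns an int such as 1200 for "[MSC v.1200 ...]",
    falls back to 1200 (MSVC6) when the string says "[MSC" but carries no
    revision, and returns None when the build was not made with MSC at all.
    """
    match = re.search(r"\[MSC v\.(\d+)", version_string)
    if match:
        return int(match.group(1))
    if "[MSC" in version_string:
        # Older builds only say "MSC"; assume MSVC6 as suggested in the thread.
        return 1200
    return None
```

For example, passing the string Tim quotes, '2.3a0 (#29, Nov 11 2002, 14:23:17) [MSC v.1200 32 bit (Intel)]', yields 1200, while a string with no MSC marker yields None, so a caller could then compare the result against the compiler it found installed.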
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-10-15 17:01 Message: Logged In: YES user_id=31435 Thomas, I'm just curious here: if we were to create the Windows Python distribution with VC7, would the DLLs be compatible with extensions compiled by VC6? In return for your answer to that , here's how to determine your MSVC6 service pack level: HKLM\ SOFTWARE\ Microsoft\ VisualStudio\ 6.0\ ServicePacks and look at key "latest". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-10-15 13:13 Message: Logged In: YES user_id=11105 I've tried out the patch, and it works for me, at least with simple extensions - both with MSVC 6 and with MSVC 7, freshly installed. I don't know the service pack level of my MSVC6, Help- >About doesn't show it. I'm not sure if using 7.0 (if available) should be the default. Python itself is still built with MSVC6 AFAIK. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-08 11:51 Message: Logged In: YES user_id=618290 Thanks Martin for taking the time to review my proposed changes. I think your comments about looking in Software\Microsoft\VisualStudio instead of Software\Microsoft\DevStudio is a good point. I decided to two look only in VisualStudio because that works for both the version 6 and 7 compiler I tested with. I don't know if the version 5 compiler also stores the version in DevStudio. Another alternative, which would of course complicate code, would be two look in both places and choose the highest version. Let me know if that's what you'd prefer and I'll upload a new patch. The problem I'm having with the version 6 compiler (latest service Pack 5) is that SOFTWARE\Microsoft\Devstudio\6.0\Build System\ doesn't exist. 
Instead it looks like it's been moved to SOFTWARE\Microsoft\Shared Tools\Build System\, but in that new location, SOFTWARE\Microsoft\Shared Tools\Build System\Components\Platforms\Win32 (x86)\Directories doesn't exist. This has the effect of not getting the correct include directories for builds. This also points out a serious flaw in looking at undocumented registry entries to find information for the build -- there's no guarantee that the registry information won't change even within the same version of the compiler. I don't have a good solution for this problem, but I'd rather distutils reported an error when it couldn't find the registry entries it expected -- rather than silently ignoring it as it does now. In a few places I added code to report unexpected missing registry entries, but not all. I could if you're interested add error code in all cases. Fixing the problem I'm having with the version 6 compiler seems difficult, since it seems to work for you and doesn't work for me -- apparently are registries are different. Personally I'm content with leaving the version 6 compiler broken since it isn't obvious how to fix it and it apparently works for some people and I only intend to use the version 7 or newer compilers. I added three new functions: convert_mbcs, read_key, and by far the largest: expand_macros. The first two make the code simpler, easier to read, avoid unnecessary duplications, and minimize the risk that someone would forget to deal with mbcs. It would be difficult to understand the bug fix without these two functions. My hope was that these changes would make it easier for the next person who needs to learn the code. The last, expand_macros, is necessary because the version 7 compiler introduces macros which didn't exist in previous versions of the compiler. It would be awkward to implement the macros without having adding a new function. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-10-08 05:21 Message: Logged In: YES user_id=21627 I'm asking because you are not looking into Software\Microsoft\Devstudio anymore to find the most recent version. Not supporting MSVC5 anymore is probably acceptable. I never noticed that support for MSVC6 is broken - it works fine for me... However, if you think you can improve that somehow, please do - please elaborate what changes solve what problems, though. It seems that a number of changes are not strictly necessary (e.g. creation of new functions), to really evaluate the patch, I have to know why you propose these changes. ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-10-07 11:27 Message: Logged In: YES user_id=618290 It's been so long since I had a copy of MSVC 5 -- I think it became obsolete about 6 or 7 years ago. None of my changes should have any impact on the operation of MSVC 5, but of course you never know unless you try it. Also, the MSVC 6 support in distutils is currently broken -- although it finds the compiler, the code to find the include paths is totally broken. I have MSVC 6 (latest service pack) and 7 and would be willing to make both those work. Anyone who's using 5 is going to have lots of other problems to deal with besides distutils. Wouldn't surprise me if the MSVC 6 code for finding paths would differ in each service pack -- since it depends upon unsupported registry entries. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-07 07:43 Message: Logged In: YES user_id=21627 Can you report whether this patch works with MSVC 5? ---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 17:38 Message: Logged In: YES user_id=618290 Opps, I guess I forgot to check that little box. Sorry about that. 
---------------------------------------------------------------------- Comment By: John Anderson (djohnanderson) Date: 2002-09-26 17:38 Message: Logged In: YES user_id=618290 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-26 00:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=614770&group_id=5470 From noreply@sourceforge.net Mon Nov 11 19:41:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 11:41:49 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 20:41 Message: Logged In: YES user_id=21627 Ok, I see that we need to define _THREAD_SAFE for FreeBSD 4.x. Since this will go away, please provide a separate patch, one that checks for 4.x. For 5.x, I still don't see the need. Thanks for the build log. 
Apart from the XDR problems, Python compiles and works just fine even though _XOPEN_SOURCE is defined, right? These problems look like FreeBSD bugs to me: there should be a way to use XDR even if _XOPEN_SOURCE is defined. (there is also a bug with the conflicting wchar_t types, which also looks like a FreeBSD bug). If we bump _XOPEN_SOURCE to 600, your list shrinks to just ctermid_r, and setgroups, right? Absence of ctermid_r is no problem, since ctermid is already thread-safe, and ctermid_r not needed at all. Please notice that chroot does not cause a compile-time error anymore. Which friends of chroot were you talking about? In any case, please try the attached patch. This should fix this issue. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 19:33 Message: Logged In: YES user_id=205 I had a quick look at FreeBSD 4.x's includes (couldn't resist..). If _THREAD_SAFE is set then feof(p), ferror(p), clearerr(p), fileno(p) are not available there. And then a thread-safe version of _FLOCKFILE(x). fseeko, ftello, vsnprintf, ctermid_r, seteuid, setegid, setgroups, u_char, u_short, u_int, u_long, ushort, uint, .. (I've attached a build log.) The u_* typedefs are only defined if __BSD_VISIBLE is defined. Raising _XOPEN_SOURCE to 600 isn't a real option, because chroot and friends are only visible in the __BSD_VISIBLE or _XOPEN_SOURCE <= 500 case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:47 Message: Logged In: YES user_id=21627 The usage in rpc/clnt.h is irrelevant. It only wraps rpccreateerr (or some such), which is not used in Python. Since this patch is for FreeBSD -current, configuration of FreeBSD 4.x should not concern us at the moment; on the SF compile farm, Python compiles fine on FreeBSD 4.7, so I can't see a problem there, either. 
Can you name a few functions that Python seems to rely on, along with the specific compiler error message that you get? For ftello, I would suggest raising _XOPEN_SOURCE to 600, on all systems. Systems that only offer XPG/5 should not be affected. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set, __BSD_VISIBLE isn't defined. Because of that, certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than Python expects. The cleanest way to solve these issues for FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. This also means _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked about on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. The first is a work-around for the FreeBSD 5.0-current build problems that we have been discussing on python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS if Python is built with threads.
_THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Mon Nov 11 21:35:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 13:35:48 -0800 Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py Message-ID: Patches item #629637, was opened at 2002-10-27 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Raymond Hettinger (rhettinger) Summary: Add a sample selection method to random.py Initial Comment: random.randset(n, k) returns a k length list of unique integers in the range [0,n). Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm. 
I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs. If approved, will add docs and a news item. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-11 16:35 Message: Logged In: YES user_id=31435 Accepted, and back to Raymond. rand6.diff it is. Comments: + I doubt the xrange trick will work in Python 3, but time enough for that in coming years. + It's awfully obscure why "return pool[-k:]" can't do a wrong thing when k is 0. + Vertical space isn't at a premium here -- no need to squash if and if-controlled code onto the same line. + The test code will never be run (nobody runs random.py). That's not your fault. We should think about making a real test out of random.py's quarter-attempt at testing itself. + It looks to be a pleasant new facility. Good job! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-11 10:37 Message: Logged In: YES user_id=31435 Sorry, I have a lot on my plate, and this one overshot its budget by an hour already. I'll get back to it later today. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-11 09:39 Message: Logged In: YES user_id=6380 Tim, are you still hesitant? I think this is fine. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 21:48 Message: Logged In: YES user_id=80475 Can I commit rand6.diff and be done with this one? At one time, sample(n,k) looked better because the code was simpler, faster, and the use of xrange(n) in sample(population,k) wasn't obvious. As of rand6.diff, sample(population,k) is equally fast and simple.
The use of xrange(n) is thoroughly documented and has no performance penalty. It's now faster and easier to express sample(n,k) in terms of sample(population,k) than vice-versa. Also, sample(population,k) has the friendlier interface. So, it is the one I recommend. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-09 00:49 Message: Logged In: YES user_id=80475 Tim, which do you prefer? Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite. Still, rand5.diff is also ready to go. It documents how to convert from indices to elements. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 22:57 Message: Logged In: YES user_id=6380 But I thought Tim recommends sample(n, k)? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 22:02 Message: Logged In: YES user_id=80475 Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original. I think this one is ready for prime time and would be a happy fellow if it got blessed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 17:21 Message: Logged In: YES user_id=80475 Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick. FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12).
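[Editor's illustration, not part of the tracker thread.] The formula quoted above, sqrt((n**2 - 1) / 12), is the standard deviation of a discrete uniform distribution over n consecutive integers. A minimal sketch that checks it against a direct computation follows; the helper name expected_uniform_stdev is hypothetical, and the code targets current Python rather than the 2.3 sources under discussion.

```python
import math
import statistics


def expected_uniform_stdev(n):
    """Standard deviation of the discrete uniform distribution on n consecutive integers."""
    return math.sqrt((n ** 2 - 1) / 12)


n = 1000
formula = expected_uniform_stdev(n)
# Population standard deviation of 0, 1, ..., n-1 computed directly.
direct = statistics.pstdev(range(n))
print(formula, direct)
```

The two printed values should agree to floating-point precision, which is presumably what the patch's generator test relies on when it compares observed spread against the expected one.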
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:05 Message: Logged In: YES user_id=31435 I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme. If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument. The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input). Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments. You surely don't want to hear this next one, but the patch appears to be missing test cases. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:43 Message: Logged In: YES user_id=80475 P.S. The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list.
A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class. It's always a puzzle figuring out where things belong. 
combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful. Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too. So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest. As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 12:06 Message: Logged In: YES user_id=6380 Tim's code is at http://mail.python.org/pipermail/python-dev/2002-August/028399.html If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.) I do find returning a list of indices less intuitive than a list of elements. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order.
Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat. Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole.
The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook.
I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful, b) whether to implement it as is or with sequence arguments, c) whether to leave it in random or put it in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange().
Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choice()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n-length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choice() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len(colorlist),5)]. I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k-sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts.
Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful. For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? The documentation should mention the optional random argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choice(), random sampling without replacement is one of the principal use cases for random numbers. Acceptance testing often requires a fixed number of non-overlapping samples i.e.
Selecting 60 transactions out of 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples, e.g. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that make it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement.
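[Editor's illustration, not part of the tracker thread.] The sample(n, k) behaviour discussed in this thread (k unique integers from range(n), in random order, switching between a list-based and a dictionary-based strategy) might look roughly like the sketch below. It is written for current Python and is not the code from any of the attached diffs; the function name sample_indices, the use of a set in place of a dictionary, and the exact form of the n <= 6*k switchover test are assumptions based only on the "switchover ratio of six" mentioned in the comments.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k unique integers from range(n), in random selection order."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if n <= 6 * k:
        # Pool strategy: partial shuffle of an explicit n-element list.
        # Memory is proportional to n, but no draw is ever repeated.
        pool = list(range(n))
        result = []
        for i in range(k):
            j = rng.randrange(n - i)
            result.append(pool[j])
            pool[j] = pool[n - i - 1]  # move an unselected item into the vacancy
        return result
    # Tracking strategy: draw from range(n) and retry on repeats.
    # Memory is proportional to k, which suits a small k and a huge n.
    seen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in seen:
            seen.add(j)
            result.append(j)
    return result


print(sample_indices(10, 4))
print(len(sample_indices(10000000, 60)))
```

Converting the indices to elements is then the list comprehension already mentioned in the thread, for example [colors[i] for i in sample_indices(len(colors), 2)] for some hypothetical list colors.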
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Mon Nov 11 21:42:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 13:42:57 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-11 22:42 Message: Logged In: YES user_id=205 Ok, I'll prepare a patch for 4.x. Your patch works great. Thanks! I've attached a build log. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 20:41 Message: Logged In: YES user_id=21627 Ok, I see that we need to define _THREAD_SAFE for FreeBSD 4.x. Since this will go away, please provide a separate patch, one that checks for 4.x. For 5.x, I still don't see the need. Thanks for the build log. Apart from the XDR problems, Python compiles and works just fine even though _XOPEN_SOURCE is defined, right? These problems look like FreeBSD bugs to me: there should be a way to use XDR even if _XOPEN_SOURCE is defined. 
(there is also a bug with the conflicting wchar_t types, which also looks like a FreeBSD bug). If we bump _XOPEN_SOURCE to 600, your list shrinks to just ctermid_r, and setgroups, right? Absence of ctermid_r is no problem, since ctermid is already thread-safe, and ctermid_r not needed at all. Please notice that chroot does not cause a compile-time error anymore. Which friends of chroot were you talking about? In any case, please try the attached patch. This should fix this issue. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 19:33 Message: Logged In: YES user_id=205 I had a quick look at FreeBSD 4.x's includes (couldn't resist..). If _THREAD_SAFE is set then feof(p), ferror(p), clearerr(p), fileno(p) are not available there. And then a thread-safe version of _FLOCKFILE(x). fseeko, ftello, vsnprintf, ctermid_r, seteuid, setegid, setgroups, u_char, u_short, u_int, u_long, ushort, uint, .. (I've attached a build log.) The u_* typedefs are only defined if __BSD_VISIBLE is defined. Raising _XOPEN_SOURCE to 600 isn't a real option, because chroot and friends are only visible in the __BSD_VISIBLE or _XOPEN_SOURCE <= 500 case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:47 Message: Logged In: YES user_id=21627 The usage in rpc/clnt.h is irrelevant. It only wraps rpccreateerr (or some such), which is not used in Python. Since this patch is for FreeBSD -current, configuration of FreeBSD 4.x should not concern us at the moment; on the SF compile farm, Python compiles fine on FreeBSD 4.7, so I can't see a problem there, either. Can you name a few functions that Python seems to rely on, along with the specific compiler error message that you get? For ftello, I would suggest raising _XOPEN_SOURCE to 600, on all systems. Systems that only offer XPG/5 should not be affected.
---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set, __BSD_VISIBLE isn't defined. Because of that, certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than Python expects. The cleanest way to solve these issues for FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. This also means _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked about on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. The first is a work-around for the FreeBSD 5.0-current build problems that we have been discussing on python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS if Python is built with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes?
They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-10 22:53

Message:
Logged In: YES
user_id=21627

In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470

From noreply@sourceforge.net Mon Nov 11 21:51:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 11 Nov 2002 13:51:18 -0800
Subject: [Patches] [ python-Patches-636769 ] Fix for major rexec bugs
Message-ID:

Patches item #636769, was opened at 2002-11-11 21:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636769&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 8
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Martin v. Löwis (loewis)
Summary: Fix for major rexec bugs

Initial Comment:
This patch fixes many flavours of the same major problem:

class S(str):
    def __eq__(self, obj):
        return 1

>>> file("/tmp/foo", S("w"))
>>> __import__(S("dl"))
>>> import os
>>> os.__name__ = S("dl")
>>> reload(os)

Additionally, it removes the self.f reference of "FileWrapper", includes 'xreadlines' and '__iter__' in FileBase.ok_file_methods, and includes 'xreadlines' and '_weakref' in RExec.ok_builtin_modules.
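
The problem reported above comes down to equality-based checks being fooled by a str subclass whose __eq__ always answers "yes". The following is a minimal sketch in present-day Python 3, not the patch itself: ALLOWED_MODES, is_allowed_naive and is_allowed_strict are hypothetical names, and the exact-type test is only one possible defence, assumed here for illustration.

```python
class S(str):
    """A str subclass that claims to be equal to everything."""

    def __eq__(self, other):
        return True

    # Defining __eq__ clears __hash__ in Python 3; restore it so S stays hashable.
    __hash__ = str.__hash__


ALLOWED_MODES = ("r", "rb")  # hypothetical whitelist of read-only modes


def is_allowed_naive(mode):
    # Tuple membership falls back on ==, so S("w") "equals" the entry "r".
    return mode in ALLOWED_MODES


def is_allowed_strict(mode):
    # Trust exact str instances only; any subclass is rejected outright.
    return type(mode) is str and mode in ALLOWED_MODES
```

With these definitions the naive check accepts S("w") even though "w" is not whitelisted, while the strict check rejects it and still accepts a plain "r".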
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636769&group_id=5470 From noreply@sourceforge.net Tue Nov 12 06:05:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Nov 2002 22:05:08 -0800 Subject: [Patches] [ python-Patches-636318 ] Build fixes for FreeBSD 5.0 (-current) Message-ID: Patches item #636318, was opened at 2002-11-10 20:56 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 4 Submitted By: Marc Recht (marc) Assigned to: Martin v. Löwis (loewis) Summary: Build fixes for FreeBSD 5.0 (-current) Initial Comment: The fixes the building problems on FreeBSD 5.0 (-current). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-12 07:05 Message: Logged In: YES user_id=21627 Committed as configure 1.358 configure.in 1.369 pyconfig.h.in 1.61 ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 22:42 Message: Logged In: YES user_id=205 Ok, I'll prepare a patch for 4.x. Your patch works great. Thanks! I've attached a build log. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 20:41 Message: Logged In: YES user_id=21627 Ok, I see that we need to define _THREAD_SAFE for FreeBSD 4.x. Since this will go away, please provide a separate patch, one that checks for 4.x. For 5.x, I still don't see the need. Thanks for the build log. Apart from the XDR problems, Python compiles and works just fine even though _XOPEN_SOURCE is defined, right? These problems look like FreeBSD bugs to me: there should be a way to use XDR even if _XOPEN_SOURCE is defined. 
(there is also a bug with the conflicting wchar_t types, which also looks like a FreeBSD bug). If we bump _XOPEN_SOURCE to 600, your list shrinks to just ctermid_r, and setgroups, right? Absence of ctermid_r is no problem, since ctermid is already thread-safe, and ctermid_r not needed at all. Please notice that chroot does not cause a compile-time error anymore. Which friends of chroot were you talking about? In any case, please try the attached patch. This should fix this issue. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 19:33 Message: Logged In: YES user_id=205 I had a quick look at FreeBSD 4.x's includes (couldn't resist..). If _THREAD_SAFE is set then feof(p), ferror(p), clearerr(p), fileno(p) are not available there. And then a thread-safe version of _FLOCKFILE(x). fseeko, ftello, vsnprintf, ctermid_r, seteuid, setegid, setgroups, u_char, u_short, u_int, u_long, ushort, uint, .. (I've attached a build log.) The u_* typedefs are only defined if __BSD_VISIBLE is defined. Raising _XOPEN_SOURCE to 600 isn't a real option, because chroot and friends are only visible in the __BSD_VISIBLE or _XOPEN_SOURCE <= 500 case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 18:47 Message: Logged In: YES user_id=21627 The usage in rpc/clnt.h is irrelevant. It only wraps rpccreateerr (or some such), which is not used in Python. Since this patch is for FreeBSD -current, configuration of FreeBSD 4.x should not concern us at the moment; on the SF compile farm, Python compiles fine on FreeBSD 4.7, so I can't see a problem there, either. Can you name a few function which Python seems to rely on, along with the specific compiler error message that you get? For ftello, I would suggest to raise _XOPEN_SOURCE to 600, on all systems. Systems that only offer XPG/5 should not be affected. 
---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 18:24 Message: Logged In: YES user_id=205 rpc/clnt.h is used by the nismodule so it should be set. (And IIRC it's used more often in FreeBSD 4.x.) The problem is that if _POSIX_SOURCE or _POSIX_C_SOURCE is set __BSD_VISIBLE isn't defined. Because of that certain functions and defines which Python seems to rely on aren't defined. Another problem is that some functions like ftello are defined at a higher POSIX level than python expects. The cleanest way for FreeBSD to solve this issues for FreeBSD is to not define _POSIX_SOURCE or _POSIX_C_SOURCE. This means also _XOPEN_SOURCE can't be defined, because then _POSIX_C_SOURCE will be defined in cdefs.h. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:48 Message: Logged In: YES user_id=21627 Can you please explain *why* _THREAD_SAFE is needed for threaded programs on FreeBSD? Looking at the -CURRENT sources, I find but a single occurrence of _THREAD_SAFE (in rpc/clnt.h), which is not relevant to Python. As for the build problems we talked on python-dev: Can you please re-iterate what those problems are, if you take the current Python CVS as a starting point? ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-11 15:39 Message: Logged In: YES user_id=205 The patches contain two parts. A work-around for the FreeBSD 5.0-current build problems, we talked/talking about at python-dev@. The second part is the addition of _THREAD_SAFE to the CFLAGS, if Python is build with threads. _THREAD_SAFE is needed for threaded programs on FreeBSD. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-11 15:24 Message: Logged In: YES user_id=21627 Can you please list the problems that this patch fixes? 
They might be fixed in the current CVS. In particular, what is the effect of defining _THREAD_SAFE? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-10 22:53 Message: Logged In: YES user_id=21627 In this form, the patch likely won't be accepted. I see it is not urgent, since the system it applies to has not been released, yet. So I would like to resolve #635034 first, and would propose that the _XOPEN_SOURCE issue then integrates with the framework established there. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636318&group_id=5470 From noreply@sourceforge.net Tue Nov 12 13:25:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 05:25:26 -0800 Subject: [Patches] [ python-Patches-479615 ] Fast-path for interned string compares Message-ID: Patches item #479615, was opened at 2001-11-08 15:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: M.-A. Lemburg (lemburg) Assigned to: M.-A. Lemburg (lemburg) Summary: Fast-path for interned string compares Initial Comment: This patch adds a fast-path for comparing equality of interned strings. The patch boosts performance for comparing identical string objects by some 20% on my machine while not causing any noticable slow-down for other operations (according to tests done with pybench). More infos and benchmarks later... ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-11-12 13:25 Message: Logged In: YES user_id=4771 It seems to me that the whole status of interned strings is not clear from the user's perspective. Maybe we should avoid putting more emphasis on it. 
Deprecating intern() in favor of sys.intern() would even look like a good thing to do.

Besides, in the use case you describe, you can compare tokens with "is" instead of "==" as you know for sure that you are comparing two explicitly interned strings. That's a hack, but calling intern() in the first place already looks like a hack.

I'd vote against it, but if the patch is accepted don't forget to change the constants EQ, LE,... into PyCmp_EQ, PyCmp_LE,...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-08-08 21:16

Message:
Logged In: YES
user_id=38388

I still consider the patch worth adding. The application space where it helps may be small, but also important: it can massively speed up parsers which use interned strings as tokens.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-08-08 21:07

Message:
Logged In: YES
user_id=21627

Is there any progress on this patch, or should it be considered withdrawn?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 23:35

Message:
Logged In: YES
user_id=35752

Attached is an updated version of this patch. I'm -0 on it since it doesn't seem to help much except for artificial benchmarks.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-11-08 15:26

Message:
Logged In: YES
user_id=38388

Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

PYBENCH 1.0

Benchmark: eqpython.bench (rounds=10, warp=20)

Tests:                       per run  per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:      125.55 ms    0.98 us    -1.68%
BuiltinMethodLookup:       180.10 ms    0.34 us    +1.75%
CompareFloats:             107.30 ms    0.24 us    +2.04%
CompareFloatsIntegers:     185.15 ms    0.41 us    -0.05%
CompareIntegers:           163.50 ms    0.18 us    -1.77%
CompareInternedStrings:     79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^
CompareLongs:              110.25 ms    0.24 us    +0.09%
CompareStrings:            143.40 ms    0.29 us    +2.14%
CompareUnicode:            118.00 ms    0.31 us    +1.68%
ConcatStrings:             189.55 ms    1.26 us    -1.61%
ConcatUnicode:             226.55 ms    1.51 us    +1.34%
CreateInstances:           202.35 ms    4.82 us    -1.87%
CreateStringsWithConcat:   221.00 ms    1.11 us    +0.45%
CreateUnicodeWithConcat:   240.00 ms    1.20 us    +1.27%
DictCreation:              213.25 ms    1.42 us    +0.47%
DictWithFloatKeys:         263.50 ms    0.44 us    +1.15%
DictWithIntegerKeys:       158.50 ms    0.26 us    -1.86%
DictWithStringKeys:        147.60 ms    0.25 us    +0.75%
ForLoops:                  144.90 ms   14.49 us    -4.64%
IfThenElse:                174.15 ms    0.26 us    -0.00%
ListSlicing:                88.80 ms   25.37 us    -1.11%
NestedForLoops:            136.95 ms    0.39 us    +3.01%
NormalClassAttribute:      177.80 ms    0.30 us    -2.68%
NormalInstanceAttribute:   166.85 ms    0.28 us    -0.54%
PythonFunctionCalls:       152.20 ms    0.92 us    +1.40%
PythonMethodCalls:         133.70 ms    1.78 us    +1.60%
Recursion:                 119.45 ms    9.56 us    +0.04%
SecondImport:              124.65 ms    4.99 us    -6.03%
SecondPackageImport:       130.70 ms    5.23 us    -5.73%
SecondSubmoduleImport:     161.65 ms    6.47 us    -5.88%
SimpleComplexArithmetic:   245.50 ms    1.12 us    +2.08%
SimpleDictManipulation:    108.50 ms    0.36 us    +0.05%
SimpleFloatArithmetic:     125.80 ms    0.23 us    +0.84%
SimpleIntFloatArithmetic:  128.50 ms    0.19 us    -1.46%
SimpleIntegerArithmetic:   128.45 ms    0.19 us    -0.77%
SimpleListManipulation:    159.15 ms    0.59 us    -5.32%
SimpleLongArithmetic:      189.55 ms    1.15 us    +2.65%
SmallLists:                293.70 ms    1.15 us    -5.26%
SmallTuples:               230.00 ms    0.96 us    +0.44%
SpecialClassAttribute:     175.70 ms    0.29 us    -2.79%
SpecialInstanceAttribute:  199.70 ms    0.33 us    -1.55%
StringMappings:            196.85 ms    1.56 us    -2.48%
StringPredicates:          133.00 ms    0.48 us    -8.28%
StringSlicing:             165.45 ms    0.95 us    -3.47%
TryExcept:                 193.60 ms    0.13 us    +0.57%
TryRaiseExcept:            175.40 ms   11.69 us    +0.69%
TupleSlicing:              156.85 ms    1.49 us    -0.00%
UnicodeMappings:           175.90 ms    9.77 us    +1.76%
UnicodePredicates:         141.35 ms    0.63 us    +0.78%
UnicodeProperties:         184.35 ms    0.92 us    -2.10%
UnicodeSlicing:            179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
Average round time:       9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much and the ones that do indicate some additional benefit gained by the patch. All slow-downs are way below the noise limit of around 5-10% (depending on the platform/machine/compiler).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470

From noreply@sourceforge.net Tue Nov 12 13:26:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 12 Nov 2002 05:26:00 -0800
Subject: [Patches] [ python-Patches-510695 ] cycle profiler for VM opcodes
Message-ID:

Patches item #510695, was opened at 2002-01-30 13:21
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=510695&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 1
Submitted By: Jeremy Hylton (jhylton)
Assigned to: Jeremy Hylton (jhylton)
Summary: cycle profiler for VM opcodes

Initial Comment:
This is just some code I'm noodling around with. It counts the number of cycles each Python VM opcode takes to execute, using the Pentium timestamp counter.
----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2002-11-12 13:26

Message:
Logged In: YES
user_id=4771

I think that you should try and convince people that it is generally useful information to have, and that the fact that it is highly non-portable does not hurt. Right now it looks more like a change that a core developer would temporarily want to do to tune the VM.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-10-09 15:16

Message:
Logged In: YES
user_id=31392

sure

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-10-07 21:12

Message:
Logged In: YES
user_id=21627

Is this still of relevance?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=510695&group_id=5470

From noreply@sourceforge.net Tue Nov 12 13:27:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 12 Nov 2002 05:27:10 -0800
Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError
Message-ID:

Patches item #504714, was opened at 2002-01-17 02:52
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470

Category: Core (C code)
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Quinn Dunkan (quinn_dunkan)
Assigned to: Nobody/Anonymous (nobody)
Summary: hasattr catches only AttributeError

Initial Comment:
Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr.

hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError.

Example:

>>> class Foo:
...     def __getattr__(self, attr):
...         assert 0
...
>>> f = Foo()
>>> hasattr(f, 'bar')
0
# should have gotten an AssertionError
>>>

This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2002-11-12 13:27

Message:
Logged In: YES
user_id=4771

This looks like a possibly worthwhile semantic change. To help this patch progress, I would recommend raising the question on python-dev. The library documentation should also be patched to reflect the change.

----------------------------------------------------------------------

Comment By: Quinn Dunkan (quinn_dunkan)
Date: 2002-03-16 08:55

Message:
Logged In: YES
user_id=429749

That's true, but the current behavior can mask bugs unexpectedly. For example, if you ask someone if the brakes are engaged, and they discover that the brakes have crumbled to dust and fallen off, you probably want a different answer than "no". :)

getattr() (now) only catches AttributeErrors, so there's a consistency thing too.

Anyway, it's your call :)

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-03-16 08:24

Message:
Logged In: YES
user_id=92689

(The patch seems to be reversed.)

The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct.
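
The two positions in this thread can be put side by side in a short sketch. It is written in pure Python rather than as the C patch under discussion; hasattr_catch_all and hasattr_strict are hypothetical names, and the explicit raise stands in for the report's "assert 0" so that the example does not depend on assertions being enabled.

```python
class Foo:
    def __getattr__(self, attr):
        # A bug inside attribute lookup, not a genuinely missing attribute.
        raise AssertionError("bug inside __getattr__")


def hasattr_catch_all(obj, name):
    """Behaviour the report complains about: any exception means 'no'."""
    try:
        getattr(obj, name)
    except Exception:
        return False
    return True


def hasattr_strict(obj, name):
    """Proposed behaviour: only AttributeError means 'no'."""
    try:
        getattr(obj, name)
    except AttributeError:
        return False
    return True
```

With the catch-all version the bug in Foo.__getattr__ is silently reported as a missing attribute; with the strict version the AssertionError reaches the caller. The built-in hasattr() in current Python 3 releases behaves like the strict version.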
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Tue Nov 12 15:32:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 07:32:16 -0800 Subject: [Patches] [ python-Patches-637176 ] list.sort crasher Message-ID: Patches item #637176, was opened at 2002-11-12 15:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Nobody/Anonymous (nobody) Summary: list.sort crasher Initial Comment: Solves the list.sort() crash of http://www.python.org/sf/453523. Removes the immutable list trick. Makes the list empty during sort. Raises ValueError if the (temporarily empty) list is detected to have been modified at the end of the sort. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 From noreply@sourceforge.net Tue Nov 12 16:41:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 08:41:27 -0800 Subject: [Patches] [ python-Patches-633633 ] Cleanup of test_strptime.py Message-ID: Patches item #633633, was opened at 2002-11-04 23:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: Cleanup of test_strptime.py Initial Comment: I finally got around to cleaning up test_strptime.py . Basically all I did was break all the lines that went over 80 characters (although there a few that go over by a char or two). 
I also removed the __version__ variable. If whoever applies this patch wishes to, they can go ahead and also remove the __version__ variable from _strptime.py; it's a relic and not needed, let alone updated, since I never remember to.

And yes, the testing suite still runs and passes all the tests.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-11-12 11:41

Message:
Logged In: YES
user_id=12800

I'll take this one.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470

From noreply@sourceforge.net Tue Nov 12 17:52:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 12 Nov 2002 09:52:46 -0800
Subject: [Patches] [ python-Patches-629637 ] Add a sample selection method to random.py
Message-ID:

Patches item #629637, was opened at 2002-10-27 21:03
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Add a sample selection method to random.py

Initial Comment:
random.randset(n, k) returns a k length list of unique integers in the range [0,n).

Improves on a Cookbook submission by using the parameters to select between a shuffle algorithm and a dictionary algorithm.

I want to add this to the library because it is a simple, robust solution to a general selection problem and because it isn't obvious that two different algorithms are needed to balance speed/space trade-offs.

If approved, will add docs and a news item.
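
The idea of choosing between a shuffle algorithm and a dictionary algorithm based on the parameters can be sketched as follows. This is an illustration under assumptions, not the submitted patch: sample_indices is a hypothetical name, a set plays the role of the dictionary, and the cut-over test (4 * k >= n) is invented for the example.

```python
import random


def sample_indices(n, k, rng=random):
    """Return k unique integers from range(n), in random order."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if 4 * k >= n:  # hypothetical cut-over point, chosen only for illustration
        # Shuffle-style: partial Fisher-Yates over an explicit pool of size n.
        pool = list(range(n))
        for i in range(k):
            j = rng.randrange(i, n)
            pool[i], pool[j] = pool[j], pool[i]
        return pool[:k]
    # Dictionary-style: remember picks in a set and retry on collisions.
    chosen = set()
    result = []
    while len(result) < k:
        j = rng.randrange(n)
        if j not in chosen:
            chosen.add(j)
            result.append(j)
    return result
```

The shuffle branch costs O(n) space but never retries, which suits a k that is large relative to n. The set branch needs only O(k) space and rarely collides when k is small, which is the speed/space trade-off the submission refers to.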
----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-12 12:52

Message:
Logged In: YES
user_id=80475

Committed as random.py 1.36 and librandom.tex 1.31.

Thanks Tim, Martin, and GvR for the reviews.

Tim's comments:
+ If xrange disappears in Py3, objects defining __getitem__ will live on.
+ When k is zero, the block with "return pool[:k]" is never executed; the dictionary block runs instead.
+ Expanded single line ifs into multi-line
+ The tests in random.py get run only when we are maintaining the module. When I put in the Mersenne Twister, will make separate unittests that get run all the time and that test the module thoroughly.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-11 16:35

Message:
Logged In: YES
user_id=31435

Accepted, and back to Raymond. rand6.diff it is. Comments:

+ I doubt the xrange trick will work in Python 3, but time enough for that in coming years.
+ It's awfully obscure why "return pool[-k:]" can't do a wrong thing when k is 0.
+ Vertical space isn't at a premium here -- no need to squash if and if-controlled code onto the same line.
+ The test code will never be run (nobody runs random.py). That's not your fault. We should think about making a real test out of random.py's quarter-attempt at testing itself.
+ It looks to be a pleasant new facility. Good job!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-11 10:37

Message:
Logged In: YES
user_id=31435

Sorry, I have a lot on my plate, and this one overshot its budget by an hour already. I'll get back to it later today.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-11 09:39

Message:
Logged In: YES
user_id=6380

Tim, are you still hesitant? I think this is fine.
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-09 21:48

Message:
Logged In: YES
user_id=80475

Can I commit rand6.diff and be done with this one?

At one time, sample(n,k) looked better because the code was simpler, faster, and the use of xrange(n) in sample(population,k) wasn't obvious.

As of rand6.diff, sample(population,k) is equally fast and simple. The use of xrange(n) is thoroughly documented and has no performance penalty.

It's now faster and easier to express sample(n,k) in terms of sample(population,k) than vice-versa. Also, sample(population,k) has the friendlier interface. So, it is the one I recommend.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-09 00:49

Message:
Logged In: YES
user_id=80475

Tim, which do you prefer?

Rand6.diff is on the launch pad, ready to go. random.sample(population,k) is now as lean and mean as sample(n,k); the xrange() idiom is thoroughly documented and tested; and the sample(population,k) approach is now my favorite.

Still, rand5.diff is also ready to go. It documents how to convert from indices to elements.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-08 22:57

Message:
Logged In: YES
user_id=6380

But I thought Tim recommends sample(n, k)?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 22:02

Message:
Logged In: YES
user_id=80475

Neatened-up the patch for random.sample(population,k). Sped the tests, eliminated the final map, and clarified the docs. Using xrange(n) as an argument is shown in both the docs and docstring so that people won't have to be clever or original.

I think this one is ready for prime time and would be a happy fellow if it got blessed.
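
For readers on a current interpreter, the xrange(n) idiom described above corresponds to passing a range object to random.sample(). A minimal sketch, assuming Python 3 (where range is the lazy sequence that xrange used to be); sample_from_range is a hypothetical helper name:

```python
import random


def sample_from_range(n, k):
    """sample(n, k) expressed in terms of sample(population, k)."""
    # range(n) is lazy, so no n-element list is built before sampling.
    return random.sample(range(n), k)


picks = sample_from_range(10_000_000, 60)
```

Here picks holds 60 distinct integers drawn from a ten-million-element population without that population ever being materialized.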
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 17:21

Message:
Logged In: YES
user_id=80475

Done. Added revised patches for sample(population,k) and for sample(n,k). Take your pick.

FYI, to interpret the generator test, the expected standard deviation for a uniform distribution is sqrt(((n**2)-1) / 12).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-08 15:05

Message:
Logged In: YES
user_id=31435

I'd rather you went back to the original scheme -- as a "speed-freak basic building block", sticking to implicit range(n) was clear, and nobody who wants that behavior is going to guess that passing xrange(n) might work in the new scheme.

If random order is a promise of this method, then that must be documented. As is, the docs are silent about order, so any order meets the spec. If it's important that it be random, then the docs have to constrain implementations; if it's not important, you can't use it as an argument.

The return type isn't documented and should be, esp. if you want to stick to the new scheme. That it always returns a list will be surprising (if I pass, e.g., a string, I *expect* a string of length k to come back; or if a tuple, a tuple of length k, etc. -- this became clear from combgen's users, and is another reason sticking to the basic building block function is better -- we put this in, and next thing is a feature request to return a sequence of the same type as the input).

Comments about use case subtleties, and algorithm obscurities, belong in the docs and in code comments more than in patch comments.

You surely don't want to hear this next one, but the patch appears to be missing test cases.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-11-08 14:43

Message:
Logged In: YES
user_id=80475

P.S.
The code continues to use the index list internally. This leaves the original pool unmolested and allows the use of xrange(n) as an argument. By not using the population elements as dictionary keys, no assumptions need to be made about the uniqueness of the population list. A weighted population is valid: sample('red red red blue blue'.split(), 3) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 14:23 Message: Logged In: YES user_id=80475 As requested, revised patch to accept a population sequence instead of an index range. Now that xrange() is fixed (a separate issue), this patch will also serve to choose from large integer sequences without building the whole sequence first: sample(xrange (10000000), 60). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 13:20 Message: Logged In: YES user_id=31435 Guido, you may recall that you used combgen in the Mankato project (to generate random, non-overlapping 5(?)- word "fingerprints" from email msgs). There are certainly valid uses for this stuff, and good algorithms aren't easy. combgen resolved the range(n) vs sequence "dilemma" by providing both, where the former was primarily for speed freaks, and the latter was implemented via has-a of the former. Both are useful, and the former is *essential* in some cases (e.g., picking 3 out of a billion -- as Raymond says, you can't well materialize an explicit list of a billion elements first). So as a basic building block, range(n) is more useful. OTOH, users often don't see how to build what they want out of basic blocks. About random vs sorted, Raymond provided a plausible use case. Nobody brought that up when I was doing combgen, but it's another thing different apps may want done differently. 
Purely from an efficiency view, it's quicker not to guarantee ascending order (combgen sorts under the covers), so in that way Raymond's range(n) gimmick is even more of a speed-freak basic building block than combgen's CombGenBasic class.

It's always a puzzle figuring out where things belong. combgen didn't start life doing random combinations -- it started because merely computing the number of k-combinations (of n things) *is* a frequent question (how many poker hands are there? bridge hands?), and an efficient algorithm for computing that isn't obvious either. Start from there, and it's soon apparent that there are many algorithms involving combinations, so much so that if you're working in this area, a class capturing the concept is very useful.

Ideally, Python would have a package for combinatorial objects, and modules therein would tackle combinations, permutations, partitions, and possibly basic graph algorithms. combgen was meant to be a start at that, but it ended there too.

So that's a mild dilemma: if we put one of these in, a small but probably growing user base will want "more of the same", and random.py isn't even arguably the right place to put any of the rest.

As to how straightforward even this is, I expect this is the only patch in Python history to have 10 versions attached.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-08 12:06

Message:
Logged In: YES
user_id=6380

Tim's code is at
http://mail.python.org/pipermail/python-dev/2002-August/028399.html

If you really need the selection in random order, wouldn't it make more sense to apply shuffle() to the resulting list? (Applying sort() to the list if you don't want it randomized seems backwards.)

I do find returning a list of indices less intuitive than a list of elements.
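
The indices-versus-elements question can be made concrete with a small sketch. It assumes Python 3's random.sample() and uses a hypothetical helper, sample_elements, that draws positions first and then maps them to elements; it does not reproduce any of the attached patches.

```python
import random


def sample_elements(population, k, rng=random):
    """Return k elements chosen without replacement, in random order."""
    # Choose k distinct positions, then map each position to its element.
    indices = rng.sample(range(len(population)), k)
    return [population[i] for i in indices]


# Duplicates are fine because selection is by position, not by value.
tokens = "red red red blue blue".split()
picked = sample_elements(tokens, 3)
```

Because positions rather than values are sampled, the population itself is left untouched and repeated values simply act as weights.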
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 09:01 Message: Logged In: YES user_id=80475 FWIW, I did try out the complement selection method for k>n/2 but found that it improved performance in some cases and worsened it in others. More importantly, it interfered with the goal of returning the selections in random order. Select 10 raffle winners, give a grand prize, 2 second prizes, 3 third prizes, and 4 fourth prizes -- the results must be in random order so that the grand prize is not biased by a non-random ordering. If everyone prefers sample(sequence, k) to sample(n,k), I will be happy to change it. If Tim wants to send me some code to study, that's cool. I always learn something from reading his code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-08 08:19 Message: Logged In: YES user_id=6380 Still, the question remains, why are all these functions so disconnected in their interface. Why does shuffle() take an optional random() function as argument? Why doesn't sample() take a list from which it returns a sample? Why isn't sample() a generator? Etc. These aren't necessarily good questions, but without trying to use these functions, I can't tell. The APIs look pretty random. Maybe the random() module is destined to be a random collection of useful statistical hacks? It already looks like that to me now. If that's the case, I'm not against adding some more, but I wish that Raymond would look at Tim's code and suggestions (e.g. complement selection for k > n/2). It does seem to me that a *random* sample falls in the same category as Tim's "generate all samples" code though, so arguably Raymond's sample() would belong in random.py even if CombGen.py were in the standard library. 
Also consider that many uses of random() are inspired by education -- for some reason, teachers like to teach programming using the random() function and its derivatives to write simple games (number guessing), visual effects (Brownian motion) and more. random.sample() might well fit in that category. Another potential use category could be simple applied statistics, like Raymond's transaction testing. It seems that such things fill some kind of need (otherwise there wouldn't be two cookbook recipes for it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 07:37 Message: Logged In: YES user_id=80475 I use the routine for transaction testing in audit work. The random order is useful so that subslices of the result are also valid random samples. I run a sample of 60 and test the first 25; if an error is found, the sample expands to 60, and if more errors are found, the transaction set does not pass the audit. The cookbook poster also needed the routine in his work and wanted it badly enough to make an excruciating translation from old Fortran code from a textbook. To save bungled re-inventions of the wheel, I crafted a cleaner solution than either my quick and dirty or his translated version. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 03:59 Message: Logged In: YES user_id=31435 Well, you're in murky waters because it's a "new feature" patch rather than a bugfix, and wasn't vetted on Python-Dev or c.l.py or via PEP first, nor is it a function in wide use already, nor one that people have asked for in the "small feature requests" PEP. It appeared out of the blue, and "unsolicited"/undiscussed new features are *usually* hard sells. The alternative is boundless bloat.
Python went for years without random.shuffle(), and that got added because (a) at any given moment, you were likely to find a c.l.py discussion about someone's incorrect Python code for shuffling; and, (b) how to shuffle was a very popular FAQ on the Tutor list. So the demand, and the difficulty of rolling your own, were compellingly clear at the time. In contrast, people asking how to get a random k-combination are almost conspicuous by absence, which makes the "very common use" claim hard to buy when viewed against the Python community as a whole. The handful (an exaggeration -- I only wish there were 5) who egged me into writing CombGen.py at the time wanted much more than *just* that, and CombGen tried to meet all expressed desires at the time. I have to agree with Martin that people who would use this also want a lot of related stuff (I'm one of them). Some of the design decisions here remain unclear. Where CombGen went out of its way to guarantee that combinations are always delivered in "ascending" order, you seem to want to guarantee that they appear in a random order. Why? Especially since you view these as index vectors, ascending order gives the best shot at locality of reference when the user does the indirect indexing bit. People who intend to use the result as a random starting point into the lexicographic or Gray code ordering of k-combinations also need ascending order. CombGen never went into the std library because I never made an attempt to put it there: CombGen never attracted a significant audience, and I'm not keen to push things into the library that, as far as I can tell, only a few people use. Since that's the std I hold myself to, it's also the std I'm inclined to hold others to. In the absence of being able to point to potential users from c.l.py threads, let me ask why *you* wrote it. Did you have an actual app that needed this function (and if so, what was it), or was it more of an interesting programming exercise?
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-08 01:52 Message: Logged In: YES user_id=21627 Well, I agree that the patch is correct in the sense of doing what it says it does. What I cannot judge is whether the feature is useful; it looks like bloat to me. I could be convinced if you find a user of this function (or the Cookbook recipe) who says "I use it for this and that, and I would prefer to see it in the library for that reason, instead of copying it from the Cookbook." I have the feeling that anybody who would use such a function would also use ten other "standard" functions which are not included in the library at the moment. So that person would not be helped with getting the single function; he would need an entirely new library of such things. So I would propose that you withdraw the patch. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-08 01:36 Message: Logged In: YES user_id=80475 I'm re-learning to hate the patch process. This was a straightforward, thoroughly tested, useful patch. Getting it accepted wasn't supposed to be hard. What is the next step -- take it as is, convert the n argument to a choice()-style population list, or withdraw the patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-07 19:55 Message: Logged In: YES user_id=6380 I'm not even looking at this, I'm delegating this to Tim. He knows infinitely more about random and permutations than I do, and he's actually used this stuff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-07 18:30 Message: Logged In: YES user_id=80475 Assigned to GvR for pronouncement on a) whether he agrees that a sampling function is useful.
b) whether to implement it as is or with sequence arguments c) whether to leave it in random or put in another module. The current form returns a list of integers that can be used directly or as indices into a sequence. The advantages are flexibility in use and the ability to pick a hundred elements out of ten million without building a long list first. The approach is essentially a uniquified list of calls to randrange(). Tim prefers an approach that parallels random.choice() where the call looks like: random.sample([a,b,c,d,e], 2) # picks 2 of the 5 objects I think the function belongs in the random module since it is a primary use of random numbers (just like shuffle() and choose()). Tim prefers to have a separate library module that has a whole grab bag of combinatorics. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 16:38 Message: Logged In: YES user_id=80475 Thanks for the quick follow-ups. The switchover ratio of six came from counting pointers and longs. Shuffling uses an n length list at one pointer for each element. The dictionary approach has k elements with a hash code, a key pointer, and a value pointer for a total of three multiplied by 1.5 and rounding up to five (because dict loading is kept under 2/3) and one pointer for the 'inorder' return list for a total of six. Also, I liked six to minimize resampling in the dictionary approach (keeping it under 20%). As requested, I'll add the random argument to the documentation. Originally, I was going to have sample() select from an arbitrary collection (like choose() does) but, in the end, preferred the current approach of choosing integers. This approach allows sample(1000000,60) without building a giant list first. Also, converting from indices to elements is trivial: [colorlist[i] for i in random.sample(len (colorlist),5)]. 
I avoided the n/2 complement selection technique because of use case rarity and to allow the sample itself to be in random order (oxymoron?). If you guys think it's necessary, I'll add a complement selection branch followed by a call to random.shuffle(). Still, as it stands, the code is robust, uses space no larger than a k sized dictionary, and runs with no more than 1.2*k calls to random(). I don't know why CombGen.py never made it to Tools/scripts. Even if it does, I think a random sampling function belongs in the random module where people can find it -- it is a very common use of random numbers. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-05 13:31 Message: Logged In: YES user_id=31435 I agree this is useful, but would rather see Python grow libraries for combinatorial objects. There are many things beyond this that are also useful, For example, the examples you gave here were of selections from collections that aren't range(n), and it would be more useful to more people to have a way to choose k elements from an arbitrary n-element collection directly (like a collection of transactions, or a set of cards, whatever). Note that I posted a module to Python-Dev not long ago that implements such stuff (CombGen.py), along with other useful functions on combinations. Note that when k > n/2, "the usual trick" isn't to shuffle a list, but to generate a complement selection. For example, if you want a random sample of 9999 out of 10000, it's a lot more efficient to pick the single element that's *not* in the result. See CombGen for code to do this. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 12:34 Message: Logged In: YES user_id=21627 Thanks for the explanation. On to the implementation: How did you arrive at the factor of 6 between a dictionary and a list? 
The documentation should mention the random optional argument. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 10:36 Message: Logged In: YES user_id=80475 Like shuffle() and choose(), random sampling without replacement is one of the core principal use cases for random numbers. Acceptance testing often requires a fixed number of non- overlapping samples i.e. Selecting 60 transactions out of a 1000 and finding zero errors yields a 95% confidence that the population has less than a 5% error rate. Some simulations also need groups of non-overlapping samples i.e. a lottery result of six unique numbers selected from a range of 1 to 57. An electronic raffle picks consecutive winners without allowing previous winners to be reselected. While sampling with replacement is trivial to implement with a list comprehension, sampling without replacement has a number of implementation nuances that makes it worthwhile to have a robust solution already implemented in the random library. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-05 03:27 Message: Logged In: YES user_id=21627 Can you explain why this needs to be in the standard library? I.e. what typical application would use it? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-05 01:33 Message: Logged In: YES user_id=80475 Martin, do you have time to give this patch a second review? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-31 02:29 Message: Logged In: YES user_id=80475 Added new version with local variable optimization and with the dictionary results returned in selection order. 
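The 95% figure quoted in the acceptance-testing comment above can be checked with a short calculation. This is a sketch of the arithmetic under an assumed worst case (exactly 5% of the 1000 transactions are in error, so 50 bad items), not part of the patch; prob_zero_errors is a hypothetical helper name.

```python
from fractions import Fraction

def prob_zero_errors(population, bad, sample):
    """Exact chance that a sample drawn without replacement has no bad items."""
    p = Fraction(1)
    for i in range(sample):
        # i good items are already drawn, so both counts shrink by i.
        p *= Fraction(population - bad - i, population - i)
    return float(p)

# Without replacement (the audit case): 60 drawn from 1000, 50 of them bad.
exact = prob_zero_errors(1000, 50, 60)

# With replacement, as a rough cross-check: each draw is clean with p = 0.95.
approx = 0.95 ** 60
```

Both values come out below 0.05, so a clean sample of 60 would turn up less than 5% of the time if the error rate really were 5%; that is the sense in which zero errors gives 95% confidence.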
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 23:42 Message: Logged In: YES user_id=80475 Renamed to random.sample(n,k) to show that it is used for sampling without replacement. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-10-28 07:54 Message: Logged In: YES user_id=80475 Added full patch with news item and docs. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=629637&group_id=5470 From noreply@sourceforge.net Tue Nov 12 21:44:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 13:44:50 -0800 Subject: [Patches] [ python-Patches-637176 ] list.sort crasher Message-ID: Patches item #637176, was opened at 2002-11-12 10:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Armin Rigo (arigo) >Assigned to: Tim Peters (tim_one) Summary: list.sort crasher Initial Comment: Solves the list.sort() crash of http://www.python.org/sf/453523. Removes the immutable list trick. Makes the list empty during sort. Raises ValueError if the (temporarily empty) list is detected to have been modified at the end of the sort. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-12 16:44 Message: Logged In: YES user_id=31435 Assigned to me.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 From noreply@sourceforge.net Tue Nov 12 22:08:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 14:08:21 -0800 Subject: [Patches] [ python-Patches-635656 ] os.tempnam behavior in Windows Message-ID: Patches item #635656, was opened at 2002-11-08 14:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Roberto Lublinerman (rluble) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: os.tempnam behavior in Windows Initial Comment: os.tempnam behaviour under Windows does not agree with the documentation. Under Windows, the temporary location takes precedence over the specified directory, so tempnam("mydir") returns a filename in the temporary location instead of in "mydir". Reason: tempnam is implemented under Windows as a call to _tempnam, which behaves as described above according to the MS documentation. Change: use GetTempFileName to get the desired behaviour. File Modified: Modules/posixmodule.c Error detected in: python v2.2 Corrected for Python v: 2.3 File revision: 2.271 ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-11-12 17:08 Message: Logged In: YES user_id=3066 Documentation was fixed in Doc/lib/libos.tex revisions 1.103 and 1.74.2.1.2.8. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:54 Message: Logged In: YES user_id=31435 Reassigned to Fred for pondering.
As far as I can tell, the Windows _tempnam is trying to emulate more-or-less standard Unix tempnam: the first six man pages I found for tempnam on the web say that the envar TMPDIR takes precedence over the dir argument, if TMPDIR is writable. That's what Windows does too, except the name of the envar is TMP on Windows. If that's so, the implementation of os.tempnam is entirely unsurprising, but the Python docs need more words, to clarify that the behavior depends on the platform C library. Roberto, in no case do I expect to apply the patch: changing *behavior* here is dangerous to working code, and all signs say the function is working as intended, although not as documented. Years of reality take precedence over missing docs. If you need to force a particular directory, see the docs for the tempfile module and its tempdir variable. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-08 15:32 Message: Logged In: YES user_id=31435 Assigned to me. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635656&group_id=5470 From noreply@sourceforge.net Tue Nov 12 22:15:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 14:15:46 -0800 Subject: [Patches] [ python-Patches-637176 ] list.sort crasher Message-ID: Patches item #637176, was opened at 2002-11-12 10:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Tim Peters (tim_one) Summary: list.sort crasher Initial Comment: Solves the list.sort() crash of http://www.python.org/sf/453523. Removes the immutable list trick. Makes the list empty during sort.
Raises ValueError if the (temporarily empty) list is detected to have been modified at the end of the sort. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-12 17:15 Message: Logged In: YES user_id=31435 Thanks, Armin! It's not ideal, but better than a crash for sure, and nobody has had a better idea. Doc/lib/libstdtypes.tex; new revision: 1.108 Lib/test/test_sort.py; new revision: 1.3 Lib/test/test_types.py; new revision: 1.39 Misc/NEWS; new revision: 1.520 Objects/listobject.c; new revision: 2.141 Note that I fiddled the patch to check ob_size > 0 at the end too -- because we use realloc to grow space for lists, it was possible for a comparison function to grow empty_ob_item in- place, and then the mutation wasn't caught. Ditto if a whole bunch of inserts and deletes managed to recycle memory in such a way that malloc() just happened to return the same address as empty_ob_item a second time. Those aren't hypothetical, cuz I saw them happening when writing a test case and wondering why it only caught the mutations *some* of the times. I'm still not sure it's bulletproof mutation detection, but the test case triggers every time now, so who cares . ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-12 16:44 Message: Logged In: YES user_id=31435 Assigned to me. 
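The behaviour this patch describes (the list looks empty while it is being sorted, and a detected mutation raises ValueError) can be observed from Python code. The sketch below is an illustration, not the test case checked in with the patch. It assumes a CPython that still behaves this way, and it uses the key= argument from later Python versions rather than a comparison function; both helper names are hypothetical.

```python
def sort_while_mutating():
    """Return the exception raised when a key function mutates the list."""
    data = [3, 1, 2]

    def key(x):
        data.append(x)  # mutate the list while the sort is running
        return x

    try:
        data.sort(key=key)
    except ValueError as exc:
        return exc
    return None

def lengths_seen_during_sort():
    """Record len(data) from inside the key function, without mutating."""
    data = [3, 1, 2]
    seen = []

    def key(x):
        seen.append(len(data))  # inspection only; the list is not changed
        return x

    data.sort(key=key)
    return seen, data
```

As the comment above says, the detection is best-effort, so code should not rely on the ValueError as a guarantee.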
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 From noreply@sourceforge.net Tue Nov 12 22:22:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 14:22:11 -0800 Subject: [Patches] [ python-Patches-613605 ] Bugfix: content-type header parsing Message-ID: Patches item #613605, was opened at 2002-09-24 02:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=613605&group_id=5470 Category: Demos and tools Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: Peter Funk (pefu) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Bugfix: content-type header parsing Initial Comment: webchecker.py stumbles on content-type: text/html; charset=iso8859-1 This patch should fix it. Discovery is courtesy of Maik Jablonski, who posted an initial fix on the german zope user group mailing list (zope at dzug.org). ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-11-12 17:22 Message: Logged In: YES user_id=3066 Checked in a modified version of the patch in Tools/webchecker/webchecker.py revisions 1.29 and 1.25.6.2. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-10-09 19:43 Message: Logged In: YES user_id=33168 This makes sense to me. Fred, assign back to me if you want me to check in. 
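For readers unfamiliar with the failure: a Content-Type value may carry parameters after the media type, so comparing the whole header against "text/html" fails. A minimal sketch of separating the two follows. It is not the code checked in to webchecker.py; split_content_type is a hypothetical name, and the sketch does not handle quoted parameter values that themselves contain semicolons.

```python
def split_content_type(header_value):
    """Split a Content-Type value into (media type, parameters dict)."""
    parts = header_value.split(";")
    media_type = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep:  # ignore fragments that are not name=value
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params

ctype, params = split_content_type("text/html; charset=iso8859-1")
```

With the header from the report, ctype is the bare media type and the charset ends up in params, so a check such as ctype == "text/html" succeeds.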
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=613605&group_id=5470 From noreply@sourceforge.net Tue Nov 12 22:38:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 14:38:57 -0800 Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError Message-ID: Patches item #504714, was opened at 2002-01-16 21:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 Category: Core (C code) Group: Python 2.1.2 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) >Assigned to: Tim Peters (tim_one) Summary: hasattr catches only AttributeError Initial Comment: Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr. hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError. Example: >>> class Foo: ... def __getattr__(self, attr): ... assert 0 ... >>> f = Foo() >>> hasattr(f, 'bar') 0 # should have gotten an AssertionError >>> This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-12 17:38 Message: Logged In: YES user_id=31435 Sorry, I'm rejecting this, at Guido's request. I agree: far too much code would break if this were to change now; if you want hasattr()-like functionality that passes on "unexpected" exceptions, you can get it now via doing getattr(obj, attr, None); and, most fundamentally, it's part of hasattr's contract (design) that it never raises an exception. 
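The getattr() workaround mentioned above can be wrapped up as follows. This is a sketch, not part of the rejected patch: has_attr_strict is a hypothetical name, and it uses a private sentinel instead of None as the default so that attributes whose value is None still count as present. It relies on three-argument getattr() swallowing only AttributeError, as stated in this thread.

```python
_MISSING = object()  # sentinel that no real attribute value can be

def has_attr_strict(obj, name):
    """Like hasattr(), but let non-AttributeError exceptions propagate."""
    return getattr(obj, name, _MISSING) is not _MISSING

class Foo:
    """The class from the report: every failed lookup blows up."""
    def __getattr__(self, attr):
        raise AssertionError("unexpected failure looking up %r" % attr)

class Plain:
    x = None
```

Calling has_attr_strict(Foo(), 'bar') lets the AssertionError escape instead of reporting the attribute as absent, while ordinary objects get a plain true or false answer.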
---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-11-12 08:27 Message: Logged In: YES user_id=4771 This looks like a possibly worthwhile semantic change. In order to help this patch progress I would recommend you to raise the question in python-dev. The library documentation should also be patched to reflect the change. ---------------------------------------------------------------------- Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-16 03:55 Message: Logged In: YES user_id=429749 That's true, but the current behavior can mask bugs unexpectedly. For example, if you ask someone if the brakes are engaged, and they discover that the brakes have crumbled to dust and fallen off, you probably want a different answer than "no". :) getattr() (now) only catches AttributeErrors, so there's a consistency thing too. Anyway, it's your call :) ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2002-03-16 03:24 Message: Logged In: YES user_id=92689 (The patch seems to be reversed.) The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Tue Nov 12 23:10:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 15:10:44 -0800 Subject: [Patches] [ python-Patches-627900 ] Bytecode copy bug in freeze Message-ID: Patches item #627900, was opened at 2002-10-24 05:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=627900&group_id=5470 Category: Demos and tools Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Troels Walsted Hansen (troels) >Assigned to: Neal Norwitz (nnorwitz) Summary: Bytecode copy bug in freeze Initial Comment: modulefinder.py in Tools/freeze fails to copy co.co_freevars and co.co_cellvars, causing mysterious crashes when the -r (replace path) option is used with freeze. Debugging credit goes to Alexander Wilkens . ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-12 18:10 Message: Logged In: YES user_id=33168 Thanks! 
Checked in as: * Tools/freeze/modulefinder.py 1.21 & 1.18.10.1 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=627900&group_id=5470 From noreply@sourceforge.net Wed Nov 13 00:16:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Nov 2002 16:16:49 -0800 Subject: [Patches] [ python-Patches-637176 ] list.sort crasher Message-ID: Patches item #637176, was opened at 2002-11-12 15:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 Category: Core (C code) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Armin Rigo (arigo) Assigned to: Tim Peters (tim_one) Summary: list.sort crasher Initial Comment: Solves the list.sort() crash of http://www.python.org/sf/453523. Removes the immutable list trick. Makes the list empty during sort. Raises ValueError if the (temporarily empty) list is detected to have been modified at the end of the sort. ---------------------------------------------------------------------- >Comment By: Armin Rigo (arigo) Date: 2002-11-13 00:16 Message: Logged In: YES user_id=4771 I overlooked the case you mention. I believe it is now bulletproof, because no other code in listobject.c will ever let an empty list have a non-NULL ob_item. But right, who cares :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-12 22:15 Message: Logged In: YES user_id=31435 Thanks, Armin! It's not ideal, but better than a crash for sure, and nobody has had a better idea. 
Doc/lib/libstdtypes.tex; new revision: 1.108 Lib/test/test_sort.py; new revision: 1.3 Lib/test/test_types.py; new revision: 1.39 Misc/NEWS; new revision: 1.520 Objects/listobject.c; new revision: 2.141 Note that I fiddled the patch to check ob_size > 0 at the end too -- because we use realloc to grow space for lists, it was possible for a comparison function to grow empty_ob_item in- place, and then the mutation wasn't caught. Ditto if a whole bunch of inserts and deletes managed to recycle memory in such a way that malloc() just happened to return the same address as empty_ob_item a second time. Those aren't hypothetical, cuz I saw them happening when writing a test case and wondering why it only caught the mutations *some* of the times. I'm still not sure it's bulletproof mutation detection, but the test case triggers every time now, so who cares . ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-12 21:44 Message: Logged In: YES user_id=31435 Assigned to me. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637176&group_id=5470 From noreply@sourceforge.net Wed Nov 13 16:16:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 08:16:37 -0800 Subject: [Patches] [ python-Patches-637835 ] Modulefinder doesn't handle PyXML Message-ID: Patches item #637835, was opened at 2002-11-13 17:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Modulefinder doesn't handle PyXML Initial Comment: The attached patch adds a mechanism to handle cases like PyXML, where a module injects itself into sys.modules under a different name. 
With a special version of py2exe I was able to freeze some of the PyXML test scripts, although I didn't succeed with freeze - the extension modules did not load. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 From noreply@sourceforge.net Wed Nov 13 16:23:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 08:23:06 -0800 Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation Message-ID: Patches item #578494, was opened at 2002-07-07 20:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Guido van Rossum (gvanrossum) Summary: PEP 282 Implementation Initial Comment: The attached file implements PEP282. The file logging-0.4.6.tar.gz is the entire distribution including setup/install, test/example scripts, and TeX documentation. The file logging.py (within the .tar.gz) is all that is needed to implement the PEP. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-13 11:23 Message: Logged In: YES user_id=6380 I've checked this into the Python CVS tree now. I expect to be tweaking it some before the 2.3a1 release. There's no documentation checked in yet; I hope someone will convert the docs at http://www.red-dove.com/python_logging.html to LaTeX. Vinay, if you have any bug fixes, let me know! ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-10-16 05:24 Message: Logged In: YES user_id=308438 The logging.tar.gz is an in-between-releases version which I have uploaded for GvR review. It contains the logging module refactored as a package.
__init__.py contains the core including FileHandler and StreamHandler; handlers.py contains all the other handlers; and config.py contains the file- based config stuff. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-08-02 17:25 Message: Logged In: YES user_id=6380 Um, Mark, looks like you accidentally closed this! I reopened it and assigned it to me for review. I'm gonna read the PEP and see if I like the design decisions enough to pronounce acceptance. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-09 22:22 Message: Logged In: YES user_id=14198 The code seems high quality and well documented. I have no concerns with logging.py as such. I have two main issues: * Design decisions: looking over python-dev, I can not see a consensus on the design decisions. I believe that *some* type of official acceptance of the design should be decreed by someone. * Source structure: while this seems quite suitable for an extension module, the format of the patch is probably not quite correct for a core module. For example, the test code should probably be integrated with the standard Python test suite (even if in a sub-directory), the Tex docs integrated with Python's docs etc So while I think the patch is high quality I believe these issues need to be addressed before I can do much more. Setting to "pending" - but good stuff tho! Please drive this through! ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-07-07 20:56 Message: Logged In: YES user_id=308438 Added just the logging.py file to make it easier to review. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 From noreply@sourceforge.net Wed Nov 13 18:24:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 10:24:21 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-14 03:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 From noreply@sourceforge.net Wed Nov 13 19:33:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 11:33:27 -0800 Subject: [Patches] [ python-Patches-590352 ] py2texi.el update Message-ID: Patches item #590352, was opened at 2002-08-02 16:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590352&group_id=5470 Category: Documentation Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Fred L. Drake, Jr. 
(fdrake) Summary: py2texi.el update Initial Comment: [python2.3 (and python2.2)] Attached is a patch from Milan Zamazal to update py2texi.el: - allow setting the info file name - correctly generate code for nodes like: \subsubsection{File Objects\obindex{file} \label{bltin-file-objects}} ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-11-13 14:33 Message: Logged In: YES user_id=3066 I've committed the revised version of this patch as: Doc/info/Makefile 1.8 Doc/tools/mkinfo 1.4 Doc/tools/py2texi.el 1.2 I'm sure there are things that need fixing, but it'll take specific reports and patches to get that done -- I just don't have the time available. Sorry for taking so long! ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-09-19 16:57 Message: Logged In: YES user_id=3066 I've attached my revised patch that adds support for some additional markup constructs beyond the previous version of the patch. Taken relative to the CVS HEAD. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2002-08-06 18:22 Message: Logged In: YES user_id=60903 An updated patch, which now matches Milan's version.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590352&group_id=5470 From noreply@sourceforge.net Wed Nov 13 20:07:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 12:07:50 -0800 Subject: [Patches] [ python-Patches-634866 ] general corrections to 2.2.2 refman, p.1 Message-ID: Patches item #634866, was opened at 2002-11-07 03:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 Category: Documentation Group: Python 2.2.x >Status: Pending Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: general corrections to 2.2.2 refman, p.1 Initial Comment: as per email exchanges with F. Drake, here's a first part of suggested corrections to the 2.2.2 reference manual, mostly to make it reflect a bit better the way Python currently works. ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-11-13 15:07 Message: Logged In: YES user_id=3066 Alex, a few nits: - In the first chunk, I suspect you meant "triple-quoted string literal", not "raw string literal". - "e.g." and "i.e." should be avoided with great prejudice. I've been removing them from the rest of the documentation as I've had time. - "\C" and "\C{}" should both be replaced with just "C" whenever found. - When you refer to the "C or Java implementation", realize that the Java implementation is more deterministic; Java ints are 32 bits, period, IIRC. - There is not __iterkeys__(), only iterkeys(). I have not tried applying the patch to test formatting. Ok, it sounds like a lot of things, but they're all rather small. Your patch really helps; thanks! If you can make these changes and post an updated patch, it shouldn't take long to get it committed. 
I've marked the patch "pending" since I'm waiting for changes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 From noreply@sourceforge.net Wed Nov 13 21:22:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 13:22:09 -0800 Subject: [Patches] [ python-Patches-636159 ] Typo in PEP249 Message-ID: Patches item #636159, was opened at 2002-11-10 06:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 Category: Documentation Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Denis S. Otkidach (ods) Assigned to: M.-A. Lemburg (lemburg) Summary: Typo in PEP249 Initial Comment: There is a typo in the exception inheritance layout: DatabaseError must not be a subclass of InterfaceError. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2002-11-13 16:22 Message: Logged In: YES user_id=11375 I'm not the author of the PEP, but the bug report looks correct; fixed. Thanks for reporting this! ---------------------------------------------------------------------- Comment By: Denis S.
Otkidach (ods) Date: 2002-11-10 06:19 Message: Logged In: YES user_id=63454 Assign to the editor of the PEP ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636159&group_id=5470 From noreply@sourceforge.net Wed Nov 13 23:44:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 15:44:22 -0800 Subject: [Patches] [ python-Patches-637835 ] Modulefinder doesn't handle PyXML Message-ID: Patches item #637835, was opened at 2002-11-13 17:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Modulefinder doesn't handle PyXML Initial Comment: The attached patch adds a mechanism to handle cases like PyXML, where a module injects itself into sys.modules under a different name. With a special version of py2exe I was able to freeze some of the PyXML test scripts, although I didn't succeed with freeze - the extension modules did not load. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 00:44 Message: Logged In: YES user_id=21627 How is this supposed to be used? Do you add to replaceModuleMap only if the replacement module is present?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 From noreply@sourceforge.net Wed Nov 13 23:46:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 15:46:49 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-13 19:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 00:46 Message: Logged In: YES user_id=21627 Can you please elaborate? In what way is that useful for a restricted environment?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 From noreply@sourceforge.net Thu Nov 14 01:24:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 17:24:09 -0800 Subject: [Patches] [ python-Patches-638095 ] SimpleSets Message-ID: Patches item #638095, was opened at 2002-11-13 20:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638095&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: SimpleSets Initial Comment: In the spirit of heapq for lists, this module provides set operations for dictionaries. Virtues: -- lightweight, fast, and easy to learn -- works well with any iterable input -- takes advantage of the existing implementation Vices: -- does not support sets of sets -- need to wrap sets in list() before displaying -- no operators This lightweight module (80 lines of code) is meant to be a convenient, fast solution for daily tasks which do not require the full firepower of the heavyweight Sets module. 
Examples -------- print 'IsSet', isset('factoid'), isset('misinformation') print 'Equals', equals('algorithm','logarithm'), equals ('sin','cos') print 'Subset', issubset('heart', 'thread'), issubset ('treat','tryst') a = 'abracadabra' b = 'alacazam' print 'Union', list(union(a,b)) print 'Intersection', list(intersection(a,b)) print 'Difference', list(difference(a,b)) print 'SymmetricDifference', list(symmetric_difference (a,b)) print 'Uniquification', list(set(a)) print 'Cardinality', len(set(a)) print 'Iteration', [letter.upper() for letter in set(a)] print 'Membership', 'b' in set(a), 'e' in set(a) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638095&group_id=5470 From noreply@sourceforge.net Thu Nov 14 07:57:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Nov 2002 23:57:56 -0800 Subject: [Patches] [ python-Patches-636769 ] Fix for major rexec bugs Message-ID: Patches item #636769, was opened at 2002-11-11 22:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636769&group_id=5470 Category: Library (Lib) Group: None Status: Open >Resolution: Accepted Priority: 8 Submitted By: Gustavo Niemeyer (niemeyer) >Assigned to: Gustavo Niemeyer (niemeyer) Summary: Fix for major rexec bugs Initial Comment: This patch fixes many flavours of the same major problem: class S(str): def __eq__(self, obj): return 1 >>> file("/tmp/foo", S("w")) >>> __import__(S("dl")) >>> import os >>> os.__name__ = S("dl") >>> reload(os) Additionally, it removes the self.f reference of "FileWrapper", includes 'xreadlines' and '__iter__' in FileBase.ok_file_methods, and includes 'xreadlines' and '_weakref' in RExec.ok_builtin_modules. ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-14 08:57 Message: Logged In: YES user_id=21627 The patch looks fine, please apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=636769&group_id=5470 From noreply@sourceforge.net Thu Nov 14 08:50:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 00:50:14 -0800 Subject: [Patches] [ python-Patches-637835 ] Modulefinder doesn't handle PyXML Message-ID: Patches item #637835, was opened at 2002-11-13 17:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Modulefinder doesn't handle PyXML Initial Comment: The attached patch adds a mechanism to handle cases like PyXML, where a module injects itself into sys.modules under a different name. With a special version of py2exe I was able to freeze some of the PyXML test scripts, although I didn't succeed with freeze - the extension modules did not load. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-11-14 09:50 Message: Logged In: YES user_id=11105 I call ReplaceModule("_xmlplus", "xml") and then run ModuleFinder on my script. Only when the module named "_xmlplus" is loaded by load_package, the replaceModule hook is triggered. With the above call, it is then inserted into ModuleFinder's modules instance var under the name "xml", and not "_xmlplus". This mirrors what is done by _xmlplus/__init__.py file. Now that I think over it, probably ReplaceModule should be renamed to ReplacePackage ... ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-11-14 00:44 Message: Logged In: YES user_id=21627 How is this supposed to be used? Do you add to replaceModuleMap only if the replacement module is present? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 From noreply@sourceforge.net Thu Nov 14 09:57:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 01:57:05 -0800 Subject: [Patches] [ python-Patches-638299 ] LaTeX documentation for logging package Message-ID: Patches item #638299, was opened at 2002-11-14 09:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) >Assigned to: Guido van Rossum (gvanrossum) Summary: LaTeX documentation for logging package Initial Comment: I've attached the LaTeX documentation for the logging package (PEP 282) and a minor patch (fileConfig() now takes an optional defaults dictionary parameter which is passed to ConfigParser. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 From noreply@sourceforge.net Thu Nov 14 12:53:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 04:53:47 -0800 Subject: [Patches] [ python-Patches-638299 ] LaTeX documentation for logging package Message-ID: Patches item #638299, was opened at 2002-11-14 04:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 >Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) >Assigned to: Skip Montanaro (montanaro) Summary: LaTeX documentation for logging package Initial Comment: I've attached the LaTeX documentation for the logging package (PEP 282) and a minor patch (fileConfig() now takes an optional defaults dictionary parameter which is passed to ConfigParser). ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 07:53 Message: Logged In: YES user_id=6380 Skip, can you merge this with your work? (Or override, as you see fit?) Sorry for the duplicate work! I already took care of the config.py patch. Vinay, are there unit tests for the logging package?
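
[Editorial illustration, not part of the original tracker message.] The defaults dictionary mentioned in the initial comment can be sketched as follows. This is a minimal example written against the logging.config.fileConfig() of current Python 3, not the 2002 patch itself; the configuration text and the key name "rootlevel" are made up for illustration, and a StringIO object stands in for a real config file.

```python
import io
import logging
import logging.config

# Hypothetical configuration text. %(rootlevel)s is filled in from the
# "defaults" dictionary that fileConfig() hands to ConfigParser.
CONFIG_TEXT = """\
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=%(rootlevel)s
handlers=console

[handler_console]
class=StreamHandler
level=NOTSET
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(levelname)s:%(message)s
"""

# fileConfig() accepts a filename or a file-like object.
logging.config.fileConfig(
    io.StringIO(CONFIG_TEXT),
    defaults={"rootlevel": "WARNING"},
)

root = logging.getLogger()
root_level_name = logging.getLevelName(root.level)
root.warning("configured root level: %s", root_level_name)
```

The default is deliberately not named "level": ConfigParser defaults are visible in every section, so a default called "level" would also look like an explicit level option on each handler section.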
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 From noreply@sourceforge.net Thu Nov 14 13:47:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 05:47:16 -0800 Subject: [Patches] [ python-Patches-637835 ] Modulefinder doesn't handle PyXML Message-ID: Patches item #637835, was opened at 2002-11-13 17:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 Category: Demos and tools Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Thomas Heller (theller) >Assigned to: Thomas Heller (theller) Summary: Modulefinder doesn't handle PyXML Initial Comment: The attached patch adds a mechanism to handle cases like PyXML, where a module injects itself into sys.modules under a different name. With a special version of py2exe I was able to freeze some of the PyXML test scripts, although I didn't succeed with freeze - the extension modules did not load. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 14:47 Message: Logged In: YES user_id=21627 The patch is fine (with the renaming), please apply it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-11-14 09:50 Message: Logged In: YES user_id=11105 I call ReplaceModule("_xmlplus", "xml") and then run ModuleFinder on my script. Only when the module named "_xmlplus" is loaded by load_package, the replaceModule hook is triggered. With the above call, it is then inserted into ModuleFinder's modules instance var under the name "xml", and not "_xmlplus". This mirrors what is done by _xmlplus/__init__.py file. Now that I think over it, probably ReplaceModule should be renamed to ReplacePackage ... 
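
[Editorial illustration, not part of the original tracker message.] The call described above can be sketched with the modulefinder module as it ships in current Python 3, where the hook is named ReplacePackage as the comment proposes. The module-level dictionary replacePackageMap is an implementation detail of current CPython and is read here only to show the effect; the patch as attached in 2002 may have differed.

```python
import modulefinder

# Tell ModuleFinder that the package found on disk as "_xmlplus"
# should be recorded under the name "xml", mirroring what
# _xmlplus/__init__.py does to sys.modules at import time.
modulefinder.ReplacePackage("_xmlplus", "xml")

# Copy the registry so the effect of the call can be inspected.
mapping = dict(modulefinder.replacePackageMap)
print(mapping["_xmlplus"])
```

After this registration, a ModuleFinder instance would be created and run on the script as usual; that step is omitted because it needs a script and an installed PyXML to be meaningful.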
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 00:44 Message: Logged In: YES user_id=21627 How is this supposed to be used? Do you add to replaceModuleMap only if the replacement module is present? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637835&group_id=5470 From noreply@sourceforge.net Thu Nov 14 16:56:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 08:56:43 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-13 13:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-14 11:56 Message: Logged In: YES user_id=31435 I'm also missing the connection to restricted environments. If you want to capture dis output (or output from anything else that uses print), the usual way to do it is to assign a StringIO instance (or other file-like object) to sys.stdout before invoking dis. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-13 18:46 Message: Logged In: YES user_id=21627 Can you please elaborate? In what way is that useful for a restricted environment?
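
[Editorial illustration, not part of the original tracker message.] Tim Peters's suggestion in this thread, assigning a StringIO instance to sys.stdout before invoking dis, can be sketched as below. It is written for current Python 3 rather than the Python 2.3 of the thread, and the names sample and capture_dis are hypothetical helpers.

```python
import dis
import io
import sys


def sample(x):
    """A tiny function to disassemble."""
    return x + 1


def capture_dis(func):
    """Return the text dis.dis() prints for func."""
    buffer = io.StringIO()
    saved_stdout = sys.stdout
    sys.stdout = buffer          # dis.dis() prints to sys.stdout by default
    try:
        dis.dis(func)
    finally:
        sys.stdout = saved_stdout  # always restore the real stdout
    return buffer.getvalue()


listing = capture_dis(sample)
print(listing)
```

In current Python 3, dis.dis() also accepts a file keyword argument, so the same text can be collected with dis.dis(sample, file=buffer) without touching sys.stdout.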
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 From noreply@sourceforge.net Thu Nov 14 17:11:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 09:11:22 -0800 Subject: [Patches] [ python-Patches-479615 ] Fast-path for interned string compares Message-ID: Patches item #479615, was opened at 2001-11-08 10:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: M.-A. Lemburg (lemburg) Assigned to: M.-A. Lemburg (lemburg) Summary: Fast-path for interned string compares Initial Comment: This patch adds a fast-path for comparing equality of interned strings. The patch boosts performance for comparing identical string objects by some 20% on my machine while not causing any noticeable slow-down for other operations (according to tests done with pybench). More info and benchmarks later... ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-14 12:11 Message: Logged In: YES user_id=31435 Marc, as Armin suggests, why is it that you can't use "is" in your code when you know you've got this special case? I'm inclined to reject this patch. While it gives a significant speedup in the specific CompareInternedStrings benchmark, by eyeball it adds Yet Another test and branch to all other non special-case compares, and I'm not inclined to believe that comparing interned strings should be favored at the expense of, e.g., slowing float compares, or compares of non-interned strings, etc. I have to note too that the measured 21% speedup on a benchmark that does nothing else doesn't support a claim of "massive speedups".
At best it looks like a small win for a small class of apps, at the expense of smaller losses for much larger classes of apps. ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-11-12 08:25 Message: Logged In: YES user_id=4771 It seems to me that the whole status of interned strings is not clear from the user's perspective. Maybe we should avoid putting more emphasis on it. Deprecating intern() in favor of sys.intern() would even look like a good thing to do. Besides, in the use case you describe, you can compare tokens with "is" instead of "==" as you know for sure that you are comparing two explicitely interned strings. That's a hack, but calling intern() in the first place already looks like a hack. I'd vote against it, but if the patch is accepted don't forget to change the constants EQ, LE,... into PyCmp_EQ, PyCmp_LE,... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-08-08 17:16 Message: Logged In: YES user_id=38388 I still consider the patch worth adding. The application space where it helps may be small, but also important: it can massively speed up parsers which use interned strings as tokens. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-08 17:07 Message: Logged In: YES user_id=21627 Is there any progress on this patch, or should it be considered withdrawn? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 18:35 Message: Logged In: YES user_id=35752 Attached is an updated version of this patch. I'm -0 on it since it doesn't seem to help much except for artificial benchmarks. ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-11-08 10:26 Message: Logged In: YES user_id=38388 Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

PYBENCH 1.0
Benchmark: eqpython.bench (rounds=10, warp=20)

Tests:                       per run   per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:      125.55 ms    0.98 us    -1.68%
BuiltinMethodLookup:       180.10 ms    0.34 us    +1.75%
CompareFloats:             107.30 ms    0.24 us    +2.04%
CompareFloatsIntegers:     185.15 ms    0.41 us    -0.05%
CompareIntegers:           163.50 ms    0.18 us    -1.77%
CompareInternedStrings:     79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^
CompareLongs:              110.25 ms    0.24 us    +0.09%
CompareStrings:            143.40 ms    0.29 us    +2.14%
CompareUnicode:            118.00 ms    0.31 us    +1.68%
ConcatStrings:             189.55 ms    1.26 us    -1.61%
ConcatUnicode:             226.55 ms    1.51 us    +1.34%
CreateInstances:           202.35 ms    4.82 us    -1.87%
CreateStringsWithConcat:   221.00 ms    1.11 us    +0.45%
CreateUnicodeWithConcat:   240.00 ms    1.20 us    +1.27%
DictCreation:              213.25 ms    1.42 us    +0.47%
DictWithFloatKeys:         263.50 ms    0.44 us    +1.15%
DictWithIntegerKeys:       158.50 ms    0.26 us    -1.86%
DictWithStringKeys:        147.60 ms    0.25 us    +0.75%
ForLoops:                  144.90 ms   14.49 us    -4.64%
IfThenElse:                174.15 ms    0.26 us    -0.00%
ListSlicing:                88.80 ms   25.37 us    -1.11%
NestedForLoops:            136.95 ms    0.39 us    +3.01%
NormalClassAttribute:      177.80 ms    0.30 us    -2.68%
NormalInstanceAttribute:   166.85 ms    0.28 us    -0.54%
PythonFunctionCalls:       152.20 ms    0.92 us    +1.40%
PythonMethodCalls:         133.70 ms    1.78 us    +1.60%
Recursion:                 119.45 ms    9.56 us    +0.04%
SecondImport:              124.65 ms    4.99 us    -6.03%
SecondPackageImport:       130.70 ms    5.23 us    -5.73%
SecondSubmoduleImport:     161.65 ms    6.47 us    -5.88%
SimpleComplexArithmetic:   245.50 ms    1.12 us    +2.08%
SimpleDictManipulation:    108.50 ms    0.36 us    +0.05%
SimpleFloatArithmetic:     125.80 ms    0.23 us    +0.84%
SimpleIntFloatArithmetic:  128.50 ms    0.19 us    -1.46%
SimpleIntegerArithmetic:   128.45 ms    0.19 us    -0.77%
SimpleListManipulation:    159.15 ms    0.59 us    -5.32%
SimpleLongArithmetic:      189.55 ms    1.15 us    +2.65%
SmallLists:                293.70 ms    1.15 us    -5.26%
SmallTuples:               230.00 ms    0.96 us    +0.44%
SpecialClassAttribute:     175.70 ms    0.29 us    -2.79%
SpecialInstanceAttribute:  199.70 ms    0.33 us    -1.55%
StringMappings:            196.85 ms    1.56 us    -2.48%
StringPredicates:          133.00 ms    0.48 us    -8.28%
StringSlicing:             165.45 ms    0.95 us    -3.47%
TryExcept:                 193.60 ms    0.13 us    +0.57%
TryRaiseExcept:            175.40 ms   11.69 us    +0.69%
TupleSlicing:              156.85 ms    1.49 us    -0.00%
UnicodeMappings:           175.90 ms    9.77 us    +1.76%
UnicodePredicates:         141.35 ms    0.63 us    +0.78%
UnicodeProperties:         184.35 ms    0.92 us    -2.10%
UnicodeSlicing:            179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
Average round time:       9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much and the ones that do indicate some additional benefit gained by the patch. All slow-downs are way below the noise limit of around 5-10% (depending on the platform/machine/compiler). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 From noreply@sourceforge.net Thu Nov 14 17:17:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 09:17:14 -0800 Subject: [Patches] [ python-Patches-479615 ] Fast-path for interned string compares Message-ID: Patches item #479615, was opened at 2001-11-08 16:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: M.-A. Lemburg (lemburg) Assigned to: M.-A. Lemburg (lemburg) Summary: Fast-path for interned string compares Initial Comment: This patch adds a fast-path for comparing equality of interned strings.
The patch boosts performance for comparing identical string objects by some 20% on my machine while not causing any noticable slow-down for other operations (according to tests done with pybench). More infos and benchmarks later... ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-11-14 18:17 Message: Logged In: YES user_id=38388 I could use "is" in my code (and in fact, I am currently), but consider this a hack. Anyway, the PEP 0275 has a much wider scope, so I'll close this patch as rejected in the hope that PEP 275 will make it into the core. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-14 18:11 Message: Logged In: YES user_id=31435 Marc, as Armin suggests, why is it that you can't use "is" in your code when you know you've got this special case? I'm inclined to reject this patch. While it gives a significant speedup in the specific CompareInternedStrings benchmark, by eyeball it adds Yet Antoher test and branch to all other non special-case compares, and I'm not inclined to believe that comparing interned strings should be favored at the expense of, e.g., slowing float compares, or compares of non- interned strings, or etc etc. I have to note too that the measured 21% speedup on a benchmark that does nothing else doesn't support a claim of "massive speedups". At best it looks like a small win for a small class of apps, at the expense of smaller losses for much larger classes of apps. ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2002-11-12 14:25 Message: Logged In: YES user_id=4771 It seems to me that the whole status of interned strings is not clear from the user's perspective. Maybe we should avoid putting more emphasis on it. Deprecating intern() in favor of sys.intern() would even look like a good thing to do. 
Besides, in the use case you describe, you can compare tokens with "is" instead of "==" as you know for sure that you are comparing two explicitely interned strings. That's a hack, but calling intern() in the first place already looks like a hack. I'd vote against it, but if the patch is accepted don't forget to change the constants EQ, LE,... into PyCmp_EQ, PyCmp_LE,... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-08-08 23:16 Message: Logged In: YES user_id=38388 I still consider the patch worth adding. The application space where it helps may be small, but also important: it can massively speed up parsers which use interned strings as tokens. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-08 23:07 Message: Logged In: YES user_id=21627 Is there any progress on this patch, or should it be considered withdrawn? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:35 Message: Logged In: YES user_id=35752 Attached is an updated version of this patch. I'm -0 on it since it doesn't seem to help much except for artificial benchmarks. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-11-08 16:26 Message: Logged In: YES user_id=38388 Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython): PYBENCH 1.0 Benchmark: eqpython.bench (rounds=10, warp=20) Tests: per run per oper. 
diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:        125.55 ms    0.98 us    -1.68%
BuiltinMethodLookup:         180.10 ms    0.34 us    +1.75%
CompareFloats:               107.30 ms    0.24 us    +2.04%
CompareFloatsIntegers:       185.15 ms    0.41 us    -0.05%
CompareIntegers:             163.50 ms    0.18 us    -1.77%
CompareInternedStrings:       79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^
CompareLongs:                110.25 ms    0.24 us    +0.09%
CompareStrings:              143.40 ms    0.29 us    +2.14%
CompareUnicode:              118.00 ms    0.31 us    +1.68%
ConcatStrings:               189.55 ms    1.26 us    -1.61%
ConcatUnicode:               226.55 ms    1.51 us    +1.34%
CreateInstances:             202.35 ms    4.82 us    -1.87%
CreateStringsWithConcat:     221.00 ms    1.11 us    +0.45%
CreateUnicodeWithConcat:     240.00 ms    1.20 us    +1.27%
DictCreation:                213.25 ms    1.42 us    +0.47%
DictWithFloatKeys:           263.50 ms    0.44 us    +1.15%
DictWithIntegerKeys:         158.50 ms    0.26 us    -1.86%
DictWithStringKeys:          147.60 ms    0.25 us    +0.75%
ForLoops:                    144.90 ms   14.49 us    -4.64%
IfThenElse:                  174.15 ms    0.26 us    -0.00%
ListSlicing:                  88.80 ms   25.37 us    -1.11%
NestedForLoops:              136.95 ms    0.39 us    +3.01%
NormalClassAttribute:        177.80 ms    0.30 us    -2.68%
NormalInstanceAttribute:     166.85 ms    0.28 us    -0.54%
PythonFunctionCalls:         152.20 ms    0.92 us    +1.40%
PythonMethodCalls:           133.70 ms    1.78 us    +1.60%
Recursion:                   119.45 ms    9.56 us    +0.04%
SecondImport:                124.65 ms    4.99 us    -6.03%
SecondPackageImport:         130.70 ms    5.23 us    -5.73%
SecondSubmoduleImport:       161.65 ms    6.47 us    -5.88%
SimpleComplexArithmetic:     245.50 ms    1.12 us    +2.08%
SimpleDictManipulation:      108.50 ms    0.36 us    +0.05%
SimpleFloatArithmetic:       125.80 ms    0.23 us    +0.84%
SimpleIntFloatArithmetic:    128.50 ms    0.19 us    -1.46%
SimpleIntegerArithmetic:     128.45 ms    0.19 us    -0.77%
SimpleListManipulation:      159.15 ms    0.59 us    -5.32%
SimpleLongArithmetic:        189.55 ms    1.15 us    +2.65%
SmallLists:                  293.70 ms    1.15 us    -5.26%
SmallTuples:                 230.00 ms    0.96 us    +0.44%
SpecialClassAttribute:       175.70 ms    0.29 us    -2.79%
SpecialInstanceAttribute:    199.70 ms    0.33 us    -1.55%
StringMappings:              196.85 ms    1.56 us    -2.48%
StringPredicates:            133.00 ms    0.48 us    -8.28%
StringSlicing:               165.45 ms    0.95 us    -3.47%
TryExcept:                   193.60 ms    0.13 us    +0.57%
TryRaiseExcept:              175.40 ms   11.69 us    +0.69%
TupleSlicing:                156.85 ms    1.49 us    -0.00%
UnicodeMappings:             175.90 ms    9.77 us    +1.76%
UnicodePredicates:           141.35 ms    0.63 us    +0.78%
UnicodeProperties:           184.35 ms    0.92 us    -2.10%
UnicodeSlicing:              179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
Average round time:         9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much, and the ones that do indicate some additional benefit gained by the patch. All slow-downs are way below the noise limit of around 5-10% (depending on the platform/machine/compiler).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470

From noreply@sourceforge.net Thu Nov 14 17:28:20 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 14 Nov 2002 09:28:20 -0800
Subject: [Patches] [ python-Patches-635933 ] make some type attrs writable
Message-ID:

Patches item #635933, was opened at 2002-11-09 09:59
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
>Assigned to: Michael Hudson (mwh)
Summary: make some type attrs writable

Initial Comment:
As per discussion on python-dev, this patch makes the following attributes of type objects writable from Python:

- __name__
- __bases__
- __mro__

It also relaxes the restriction on not returning __module__ when that's been set to a non-string. This (tiny) part is a 2.2.3 candidate IMHO.
It lets the following work:

    class C(object):
        pass

    class D(C):
        pass

    class E(object):
        def meth(self):
            print 1

    d = D()
    D.__bases__ = (C, E)
    d.meth()

but that's the extent of my testing so far. Needs a test and docs -- if the current behaviour is documented anywhere. Currently, if an assignment to __bases__ would change __base__, it complains (was easiest). Assigned to Guido so he sees it, but anyone else is encouraged to review it!

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-14 12:28

Message:
Logged In: YES
user_id=6380

Michael, can you check some of this in as separate pieces? The __module__ relaxation should go in first, and be marked as a backport candidate.

The __name__ fix is close, but I think it *should* be allowed to put dots in the name (this is actually a feature for old classes); instead of '.' I want a check that there are no \0 bytes in the string (see set_name() in classobject.c).

I think the restrictions on __bases__ are sufficiently thought out; with old-style classes, you can do much more class switching:

    >>> class C: pass
    >>> class D: pass
    >>> D.__bases__ = (C,)
    >>>

I'd like this to work for new-style classes too. It means that __base__ has to change though.

There's a bug in set_mro(): it checks PyInstance_Check() where it clearly means PyClass_Check().

Other than that I think it's good to go. (Though this is the ultimate weird feature! What's the use case again?)

Hoping for unit tests,

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-11-09 10:01

Message:
Logged In: YES
user_id=6656

Hmm, I misunderstood __base__. It's the base class that *leads* to the solid base, not the solid base. So an assignment to __bases__ may justifiably change it. Oops. Will try again later...
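
Editor's sketch (not part of the original thread): the __bases__ example from the initial comment, restated for a current Python 3 interpreter, where every class is new-style. Running it under Python 3 rather than the Python 2.3 patch under review is an assumption of this sketch, and meth() returns its value instead of printing so the result can be checked.

```python
# Sketch only: reassigning __bases__ after an instance already exists.
# Assumes Python 3; the class names mirror the example in the thread.
class C(object):
    pass

class D(C):
    pass

class E(object):
    def meth(self):
        return 1

d = D()
D.__bases__ = (C, E)   # add E as a second base of D
result = d.meth()      # resolved through the recomputed MRO of D
mro_names = [cls.__name__ for cls in D.__mro__]
```

Here the assignment keeps C as the first base, so __base__ does not have to change; the harder case raised in the comments is an assignment that would change __base__.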
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 From noreply@sourceforge.net Thu Nov 14 17:31:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 09:31:43 -0800 Subject: [Patches] [ python-Patches-597907 ] Oren Tirosh's fastnames patch Message-ID: Patches item #597907, was opened at 2002-08-20 15:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Oren Tirosh's fastnames patch Initial Comment: Oren Tirosh had a nice patch to *really* speed up global/builtin name lookup. I'm adding it here because I don't want to lose this idea. His code and some comments are here: http://www.tothink.com/python/fastnames/ I'm uploading a new version of this patch relative to current CVS. I'm still considering whether to do this; I measure at best a 1% speedup for pystone. For a modified version of Oren's benchmark (modified to use a function instead of a class for 'builtin' and 'global', so that these tests use LOAD_GLOBAL rather than LOAD_NAME, I get these test results (best of 3): builtin 1.38 global 1.54 local 1.28 fastlocal 0.90 Python 2.3 without his patch (but with my speedup hacks in LOAD_GLOBAL): builtin 1.80 global 1.52 local 1.77 fastlocal 0.91 Python 2.2 (from the 2.2 branch, which is newer than the 2.2.1 release but doesn't have any speedups) did this: builtin 2.28 global 1.86 local 1.80 fastlocal 1.10 I don't care about the speedup for the 'local' case, since this uses the LOAD_NAME opcode which is only used inside class definitions; the 'builtin' and 'global' cases are interesting. 
It looks like Oren's patch gives us a nice speedup for looking up a built-in name from a function. I have to think about why looking up a global from a function is slower though... ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 12:31 Message: Logged In: YES user_id=6380 Lowered priority until Oren uploads his long-awaited new version. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-23 12:35 Message: Logged In: YES user_id=6380 Oren, any chance that you'll submit a new version of this? ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-03 16:22 Message: Logged In: YES user_id=562624 > I'm still considering whether to do this; I measure at > best a 1% speedup for pystone. No surprising considering the fact that pystone is dominated by fastlocals (IIRC it was something like 99.7% according to my instrumented version). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-03 14:29 Message: Logged In: YES user_id=6380 OK. I'm holding my breath! :-) ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-02 15:59 Message: Logged In: YES user_id=562624 I'm working on an improved version. Stay tuned! ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-08-20 15:31 Message: Logged In: YES user_id=6380 Tim explained why the 'globals' case is faster than the 'builtins' case. I used 'x' as the global to look up rather than 'hex', and it so happens that the last three bits of hash('x') and hash('MANY') are the same -- MANY is an identifier I insert in the globals. I'll attach the test suite I used (with 'hex' instead of 'x'). 
Now I get these times: builtin 1.39 global 1.28 local 1.29 fastlocal 0.91 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 From noreply@sourceforge.net Thu Nov 14 17:32:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 09:32:24 -0800 Subject: [Patches] [ python-Patches-599331 ] PEP 269 Implementation Message-ID: Patches item #599331, was opened at 2002-08-23 13:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=599331&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None >Priority: 1 Submitted By: Jon Riehl (jriehl) Assigned to: Guido van Rossum (gvanrossum) Summary: PEP 269 Implementation Initial Comment: The following are files for the implementation of PEP 269. The primary changes to the core involve moving required pgen modules from the pgen only module list to the Python library module list in Makefile.pre.in. Some of the modules required better memory allocation and management and the corresponding deallocators are in the Python extension module (maybe these should be moved into the pgen modules in Parser). Initially included are two implementations. The first is a basic implementation that follows the PEP API more or less. The second (currently unfinished) implementation provides a more object oriented interface to the extension module. Please note there are some commented out modifications to setup.py as I was unable to build the extension module(s) automagically. For some reason the linker I was using (on a BSD box) complained that it couldn't find the _Py_pgen symbol, even though I verified its existence in the new Python library. Maybe it is checking against an older Python library on the system? 
Things to be done (as of initial submission) include resolving on a single interface, documenting the one true interface, and finishing any unimplemented routines (such as are found in the OO implementation). In the final integration, a pgenmodule.c file should be added to the Modules directory in the main distribution. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 12:32 Message: Logged In: YES user_id=6380 Lowering priority until Jon has his next version ready. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-13 00:35 Message: Logged In: YES user_id=6380 Cool. Maybe I'll get to it, maybe not. :-( ---------------------------------------------------------------------- Comment By: Jon Riehl (jriehl) Date: 2002-09-12 14:04 Message: Logged In: YES user_id=22448 Guido, as per my private message, I'll attempt to submit another patch by the end of the month, pending resumption of "work" on the 23rd. Commitment of the memory allocation patch is fine, and any future patches would be against the updated pgen code (I don't have commit permissions, so someone else will have to do this possibly.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-09 08:09 Message: Logged In: YES user_id=6380 I looked a bit, and it's very raw - the author admits that. Jon, are you still working on this? Do you have a more polished version? Maybe we can separate out the memory allocation patches and commit these already? They don't break anything. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-03 15:44 Message: Logged In: YES user_id=6380 I guess I'm going to hve to look at this to pronounce... 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=599331&group_id=5470 From noreply@sourceforge.net Thu Nov 14 17:37:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 09:37:32 -0800 Subject: [Patches] [ python-Patches-624325 ] attributes for urlsplit, urlparse result Message-ID: Patches item #624325, was opened at 2002-10-16 17:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=624325&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Fred L. Drake, Jr. (fdrake) >Assigned to: Fred L. Drake, Jr. (fdrake) Summary: attributes for urlsplit, urlparse result Initial Comment: This patch to Lib/urlparse.py makes the fields of the results accessible as named attributes from the result object. The result objects are still small since they derive from tuple and have no __dict__, though there's some additional cost in construction (a temporary tuple is created and passed to tuple.__new__). ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 12:37 Message: Logged In: YES user_id=6380 I'm fine with this, but we decided to use a different approach (for the same effect): make structseq usable from Python code. That's being discussed in bug 624827. Pending that, this one's on hold. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-10-17 16:03 Message: Logged In: YES user_id=3066 Based on comments from Guido, provide a geturl() method instead of the url property, since it actually does more work than just retrieving data. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. 
(fdrake) Date: 2002-10-17 13:09 Message: Logged In: YES user_id=3066 New version of the patch. This adds a "url" attribute to each type of result, providing the result of urlunsplit() / urlunparse() for the components of the result object. Tests and documentation have been updated. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=624325&group_id=5470 From noreply@sourceforge.net Thu Nov 14 18:07:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 10:07:47 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-14 03:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- >Comment By: Hye-Shik Chang (perky) Date: 2002-11-15 03:07 Message: Logged In: YES user_id=55188 Yes. To disallow BINARY_POWER and INPLACE_POWER, I'm hooking sys.stdout now. But, because it isn't thread-safe way, I needed to lock threads whenever I print something to stdout. To make a cheaper solution, I like this patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-15 01:56 Message: Logged In: YES user_id=31435 I'm also missing the connection to restricted environments. 
If you want to capture dis output (or output from anything else that uses print), the usual way to do it is to assign a StringIO instance (or other file-like object) to sys.stdout before invoking dis.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-14 08:46

Message:
Logged In: YES
user_id=21627

Can you please elaborate? In what way is that useful for a restricted environment?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470

From noreply@sourceforge.net Thu Nov 14 19:51:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 14 Nov 2002 11:51:33 -0800
Subject: [Patches] [ python-Patches-619475 ] C3 MRO algorithm implementation
Message-ID:

Patches item #619475, was opened at 2002-10-06 23:02
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=619475&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Samuele Pedroni (pedronis)
Assigned to: Guido van Rossum (gvanrossum)
Summary: C3 MRO algorithm implementation

Initial Comment:
At least it is a beginning. On Linux all tests and the modified test_descr.py pass. A few cases in test_descr.py are commented out; maybe they should be adjusted or reconstructed.

For order-disagreement situations: backup logic that picks an element from the first non-empty list and removes it from all lists where it appears, issuing just a warning instead of an exception, could be put where I set the exception now. In the long run, though, people should learn to use consistent hierarchies anyway.

PS: I was wondering how to get/reuse lst.remove(o) functionality from C, apart from going through PyObject_CallMethod...
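
Editor's sketch (not part of the original thread): a small pure-Python version of the C3 merge step, to make the "order disagreement" case concrete. The names c3_merge and c3_mro are hypothetical helpers for illustration, not the C code in the patch, and the sketch raises TypeError on disagreement rather than implementing the warning-based backup logic suggested above.

```python
# Sketch only: C3 linearization in pure Python (assumes Python 3 classes).
def c3_merge(sequences):
    """Merge linearizations with the C3 rule; raise TypeError on disagreement."""
    seqs = [list(s) for s in sequences if s]
    result = []
    while seqs:
        for seq in seqs:
            head = seq[0]
            # A head is acceptable only if it is in no other list's tail.
            if not any(head in other[1:] for other in seqs):
                break
        else:
            raise TypeError("order disagreement: no valid next class")
        result.append(head)
        # Remove the chosen class from every list where it appears.
        seqs = [[c for c in seq if c is not head] for seq in seqs]
        seqs = [seq for seq in seqs if seq]
    return result

def c3_mro(cls):
    """Linearize cls from its bases' linearizations plus the list of bases."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_mro(b) for b in bases] + [bases])

class A(object): pass
class B(A): pass
class C(A): pass
class D(B, C): pass

diamond = c3_mro(D)
```

For the diamond above the sketch agrees with the interpreter's own D.__mro__. Feeding c3_merge a base list that puts A before its subclass B raises the TypeError, which is the point where the patch sets its exception.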
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 14:51 Message: Logged In: YES user_id=6380 All checked in. Thanks, Samuele! ---------------------------------------------------------------------- Comment By: Samuele Pedroni (pedronis) Date: 2002-11-03 15:39 Message: Logged In: YES user_id=61408 FYI, I will be off-line from 4 to 16 Nov. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=619475&group_id=5470 From noreply@sourceforge.net Thu Nov 14 20:17:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 12:17:30 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-13 19:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 21:17 Message: Logged In: YES user_id=21627 So you want to find out whether BINARY_POWER is in the byte code? If so, I suggest that usage of dis.disassemble is inadequate. Instead, you should just copy the essential part of the disassemble loop, and look for an opcode for which dis.opname[opcode] is 'BINARY_POWER'. This will be much faster, and thread-safe. ---------------------------------------------------------------------- Comment By: Hye-Shik Chang (perky) Date: 2002-11-14 19:07 Message: Logged In: YES user_id=55188 Yes. 
To disallow BINARY_POWER and INPLACE_POWER, I'm hooking sys.stdout now. But, because it isn't thread-safe way, I needed to lock threads whenever I print something to stdout. To make a cheaper solution, I like this patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-14 17:56 Message: Logged In: YES user_id=31435 I'm also missing the connection to restricted environments. If you want to capture dis output (or output from anything else that uses print), the usual way to do it is to assign a StringIO instance (or other file-like object) to sys.stdout before invoking dis. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 00:46 Message: Logged In: YES user_id=21627 Can you please elaborate? In what way does is that useful for a restricted environment? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 From noreply@sourceforge.net Thu Nov 14 09:53:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 01:53:45 -0800 Subject: [Patches] [ python-Patches-638299 ] LaTeX documentation for logging package Message-ID: Patches item #638299, was opened at 2002-11-14 09:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Nobody/Anonymous (nobody) Summary: LaTeX documentation for logging package Initial Comment: I've attached the LaTeX documentation for the logging package (PEP 282) and a minor patch (fileConfig() now takes an optional defaults dictionary parameter which is passed to ConfigParser. 
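
Editor's sketch (not part of the original thread): how the defaults dictionary of fileConfig() can be used, as it behaves in the logging package of a current Python 3 (an assumption; the patch above targets Python 2.3). The option name logfile, the file names, and the demo() helper are made up for illustration. Values from defaults are handed to ConfigParser and become available for %(name)s interpolation inside the configuration file.

```python
# Sketch only: passing defaults through fileConfig() to ConfigParser.
import logging
import logging.config
import os
import tempfile

CONFIG = """
[loggers]
keys=root

[handlers]
keys=file

[formatters]
keys=plain

[logger_root]
level=INFO
handlers=file

[handler_file]
class=FileHandler
level=INFO
formatter=plain
args=(r'%(logfile)s', 'w')

[formatter_plain]
format=%(levelname)s:%(message)s
"""

def demo():
    """Write the config to a temp dir, log one record, return the log text."""
    with tempfile.TemporaryDirectory() as tmp:
        conf_path = os.path.join(tmp, "logging.conf")
        log_path = os.path.join(tmp, "app.log")
        with open(conf_path, "w") as f:
            f.write(CONFIG)
        # 'logfile' is substituted into the handler's args line.
        logging.config.fileConfig(conf_path, defaults={"logfile": log_path})
        logging.getLogger().info("hello")
        logging.shutdown()  # flush and close the file handler
        with open(log_path) as f:
            return f.read()

contents = demo()
```

Keeping the log file path out of the configuration file lets the same file be reused across deployments, with the caller supplying the location.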
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638299&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:21:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:21:06 -0800 Subject: [Patches] [ python-Patches-638669 ] proxyauth for imaplib Message-ID: Patches item #638669, was opened at 2002-11-14 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: proxyauth for imaplib Initial Comment: Added proxyauth command to allow the admin user to access to any user's mailbox. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:27:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:27:13 -0800 Subject: [Patches] [ python-Patches-638673 ] Added Proxyauth command to imaplib Message-ID: Patches item #638673, was opened at 2002-11-14 16:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638673&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: Added Proxyauth command to imaplib Initial Comment: Allows the Admin user to access any other user's mailbox. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638673&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:28:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:28:53 -0800 Subject: [Patches] [ python-Patches-638669 ] proxyauth for imaplib Message-ID: Patches item #638669, was opened at 2002-11-14 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Deleted Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: proxyauth for imaplib Initial Comment: Added proxyauth command to allow the admin user to access to any user's mailbox. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:29:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:29:47 -0800 Subject: [Patches] [ python-Patches-638669 ] proxyauth for imaplib Message-ID: Patches item #638669, was opened at 2002-11-14 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Deleted Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: proxyauth for imaplib Initial Comment: Added proxyauth command to allow the admin user to access to any user's mailbox. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:30:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:30:32 -0800 Subject: [Patches] [ python-Patches-638669 ] proxyauth for imaplib Message-ID: Patches item #638669, was opened at 2002-11-14 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Open Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: proxyauth for imaplib Initial Comment: Added proxyauth command to allow the admin user to access to any user's mailbox. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 From noreply@sourceforge.net Thu Nov 14 21:30:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 13:30:50 -0800 Subject: [Patches] [ python-Patches-638669 ] proxyauth for imaplib Message-ID: Patches item #638669, was opened at 2002-11-14 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed Resolution: None Priority: 5 Submitted By: Richard L. Holbert (rholbert) Assigned to: Nobody/Anonymous (nobody) Summary: proxyauth for imaplib Initial Comment: Added proxyauth command to allow the admin user to access to any user's mailbox. 
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638669&group_id=5470

From noreply@sourceforge.net Thu Nov 14 23:23:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 14 Nov 2002 15:23:36 -0800
Subject: [Patches] [ python-Patches-638673 ] Added Proxyauth command to imaplib
Message-ID:

Patches item #638673, was opened at 2002-11-14 16:27
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638673&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Richard L. Holbert (rholbert)
Assigned to: Nobody/Anonymous (nobody)
Summary: Added Proxyauth command to imaplib

Initial Comment:
Allows the Admin user to access any other user's mailbox.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-11-14 18:23

Message:
Logged In: YES
user_id=33168

Richard, thanks for the patch. I have a couple of problems with the patch.

* The patch is against 2.2, not HEAD. In this case it's easy to see where it should go, but that isn't always the case.
* This patch contains tabs, all python code in the standard library should contain only spaces.
* In the code that was added, name is not used. Either name should be removed, or it should be passed to _simple_command().
* There is also no documentation (Doc/lib/libimap.tex).

I don't know if this patch is worthwhile, since I know nothing about imap. Someone else will have to determine if this proxyauth is generally applicable and should be included. However, are there any security issues associated with using proxyauth?
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638673&group_id=5470

From noreply@sourceforge.net Thu Nov 14 23:27:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 14 Nov 2002 15:27:06 -0800
Subject: [Patches] [ python-Patches-633633 ] Cleanup of test_strptime.py
Message-ID:

Patches item #633633, was opened at 2002-11-04 23:59
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470

Category: Tests
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Cleanup of test_strptime.py

Initial Comment:
I finally got around to cleaning up test_strptime.py. Basically all I did was break all the lines that went over 80 characters (although there are a few that go over by a char or two). I also removed the __version__ variable. If whoever applies this patch wishes to, they can go ahead and also remove the __version__ variable from _strptime.py; it's a relic and not needed, let alone updated, since I never remember to.

And yes, the testing suite still runs and passes all the tests.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-14 18:27

Message:
Logged In: YES
user_id=6380

Please go ahead. The style violations in that file have bothered me for a long time. :-(

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-11-12 11:41

Message:
Logged In: YES
user_id=12800

I'll take this one.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 From noreply@sourceforge.net Fri Nov 15 05:57:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 21:57:25 -0800 Subject: [Patches] [ python-Patches-597907 ] Oren Tirosh's fastnames patch Message-ID: Patches item #597907, was opened at 2002-08-21 05:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Oren Tirosh's fastnames patch Initial Comment: Oren Tirosh had a nice patch to *really* speed up global/builtin name lookup. I'm adding it here because I don't want to lose this idea. His code and some comments are here: http://www.tothink.com/python/fastnames/ I'm uploading a new version of this patch relative to current CVS. I'm still considering whether to do this; I measure at best a 1% speedup for pystone. For a modified version of Oren's benchmark (modified to use a function instead of a class for 'builtin' and 'global', so that these tests use LOAD_GLOBAL rather than LOAD_NAME, I get these test results (best of 3): builtin 1.38 global 1.54 local 1.28 fastlocal 0.90 Python 2.3 without his patch (but with my speedup hacks in LOAD_GLOBAL): builtin 1.80 global 1.52 local 1.77 fastlocal 0.91 Python 2.2 (from the 2.2 branch, which is newer than the 2.2.1 release but doesn't have any speedups) did this: builtin 2.28 global 1.86 local 1.80 fastlocal 1.10 I don't care about the speedup for the 'local' case, since this uses the LOAD_NAME opcode which is only used inside class definitions; the 'builtin' and 'global' cases are interesting. 
It looks like Oren's patch gives us a nice speedup for looking up a built-in name from a function. I have to think about why looking up a global from a function is slower though... ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-11-15 16:57 Message: Logged In: YES user_id=250749 I notice Oren uploaded what appears to be an updated patch (fastnames5.patch) under patch #606098. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 04:31 Message: Logged In: YES user_id=6380 Lowered priority until Oren uploads his long-awaited new version. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-24 02:35 Message: Logged In: YES user_id=6380 Oren, any chance that you'll submit a new version of this? ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-04 06:22 Message: Logged In: YES user_id=562624 > I'm still considering whether to do this; I measure at > best a 1% speedup for pystone. No surprising considering the fact that pystone is dominated by fastlocals (IIRC it was something like 99.7% according to my instrumented version). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-04 04:29 Message: Logged In: YES user_id=6380 OK. I'm holding my breath! :-) ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-03 05:59 Message: Logged In: YES user_id=562624 I'm working on an improved version. Stay tuned! ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-08-21 05:31 Message: Logged In: YES user_id=6380 Tim explained why the 'globals' case is faster than the 'builtins' case. 
I used 'x' as the global to look up rather than 'hex', and it so happens that the last three bits of hash('x') and hash('MANY') are the same -- MANY is an identifier I insert in the globals. I'll attach the test suite I used (with 'hex' instead of 'x'). Now I get these times: builtin 1.39 global 1.28 local 1.29 fastlocal 0.91 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 From noreply@sourceforge.net Fri Nov 15 06:01:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Nov 2002 22:01:05 -0800 Subject: [Patches] [ python-Patches-597907 ] Oren Tirosh's fastnames patch Message-ID: Patches item #597907, was opened at 2002-08-21 05:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Oren Tirosh's fastnames patch Initial Comment: Oren Tirosh had a nice patch to *really* speed up global/builtin name lookup. I'm adding it here because I don't want to lose this idea. His code and some comments are here: http://www.tothink.com/python/fastnames/ I'm uploading a new version of this patch relative to current CVS. I'm still considering whether to do this; I measure at best a 1% speedup for pystone. 
For a modified version of Oren's benchmark (modified to use a function instead of a class for 'builtin' and 'global', so that these tests use LOAD_GLOBAL rather than LOAD_NAME, I get these test results (best of 3): builtin 1.38 global 1.54 local 1.28 fastlocal 0.90 Python 2.3 without his patch (but with my speedup hacks in LOAD_GLOBAL): builtin 1.80 global 1.52 local 1.77 fastlocal 0.91 Python 2.2 (from the 2.2 branch, which is newer than the 2.2.1 release but doesn't have any speedups) did this: builtin 2.28 global 1.86 local 1.80 fastlocal 1.10 I don't care about the speedup for the 'local' case, since this uses the LOAD_NAME opcode which is only used inside class definitions; the 'builtin' and 'global' cases are interesting. It looks like Oren's patch gives us a nice speedup for looking up a built-in name from a function. I have to think about why looking up a global from a function is slower though... ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-11-15 17:01 Message: Logged In: YES user_id=250749 I notice Oren uploaded what appears to be an updated patch (fastnames5.patch) under patch #606098. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-11-15 16:57 Message: Logged In: YES user_id=250749 I notice Oren uploaded what appears to be an updated patch (fastnames5.patch) under patch #606098. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 04:31 Message: Logged In: YES user_id=6380 Lowered priority until Oren uploads his long-awaited new version. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-24 02:35 Message: Logged In: YES user_id=6380 Oren, any chance that you'll submit a new version of this? 
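The LOAD_GLOBAL / LOAD_NAME distinction mentioned in the initial comment can be checked with the dis module. The following is only a minimal sketch in present-day Python 3 syntax, not part of Oren's benchmark; the names lookup_in_function, CLASS_SOURCE, opnames and class_body_code are made up for illustration.

```python
import dis

def lookup_in_function():
    # 'hex' is never assigned here, so the compiler emits a global/builtin
    # lookup for it inside the function body.
    return hex

# Hypothetical class whose body reads the same name at class-definition time.
CLASS_SOURCE = "class K:\n    y = hex\n"

def opnames(code):
    """Return the set of opcode names used by a function or code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

def class_body_code(source):
    """Compile module source and return the first nested code object."""
    module_code = compile(source, "<example>", "exec")
    for const in module_code.co_consts:
        if hasattr(const, "co_code"):
            return const
    raise ValueError("no nested code object found")

function_ops = opnames(lookup_in_function)
class_ops = opnames(class_body_code(CLASS_SOURCE))
print("function uses LOAD_GLOBAL:", "LOAD_GLOBAL" in function_ops)
print("class body uses LOAD_NAME:", "LOAD_NAME" in class_ops)
```

This is why the benchmark had to move the 'builtin' and 'global' tests from a class body into a function: only there does the lookup go through the opcode the patch is meant to speed up.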
---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-04 06:22 Message: Logged In: YES user_id=562624 > I'm still considering whether to do this; I measure at > best a 1% speedup for pystone. Not surprising, considering the fact that pystone is dominated by fastlocals (IIRC it was something like 99.7% according to my instrumented version). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-04 04:29 Message: Logged In: YES user_id=6380 OK. I'm holding my breath! :-) ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-03 05:59 Message: Logged In: YES user_id=562624 I'm working on an improved version. Stay tuned! ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-08-21 05:31 Message: Logged In: YES user_id=6380 Tim explained why the 'globals' case is faster than the 'builtins' case. I used 'x' as the global to look up rather than 'hex', and it so happens that the last three bits of hash('x') and hash('MANY') are the same -- MANY is an identifier I insert in the globals. I'll attach the test suite I used (with 'hex' instead of 'x').
Now I get these times: builtin 1.39 global 1.28 local 1.29 fastlocal 0.91 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 From noreply@sourceforge.net Fri Nov 15 08:40:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 00:40:24 -0800 Subject: [Patches] [ python-Patches-638825 ] Logging 0.4.7 Message-ID: Patches item #638825, was opened at 2002-11-15 08:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Nobody/Anonymous (nobody) Summary: Logging 0.4.7 Initial Comment: Recently released version 0.4.7 of the logging package (core of this recently accepted into CVS for 2.3). Includes unit tests and examples. Main regression test harness is log_test.py. There are some known issues - reported by Neal Norwitz. I will upload a patch to handle these. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 From noreply@sourceforge.net Fri Nov 15 09:10:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 01:10:02 -0800 Subject: [Patches] [ python-Patches-638825 ] Logging 0.4.7 & patches thereto Message-ID: Patches item #638825, was opened at 2002-11-15 08:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) >Assigned to: Guido van Rossum (gvanrossum) >Summary: Logging 0.4.7 & patches thereto Initial Comment: Recently released version 0.4.7 of the logging package (core of this recently accepted into CVS for 2.3). Includes unit tests and examples. Main regression test harness is log_test.py. There are some known issues - reported by Neal Norwitz. I will upload a patch to handle these. 
---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2002-11-15 09:10 Message: Logged In: YES user_id=308438 I've uploaded patches.zip, containing logging.patch (patches the logging module) and test.patch (patch to log_test17.py) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 From noreply@sourceforge.net Fri Nov 15 10:01:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 02:01:06 -0800 Subject: [Patches] [ python-Patches-635933 ] make some type attrs writable Message-ID: Patches item #635933, was opened at 2002-11-09 14:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Michael Hudson (mwh) Summary: make some type attrs writable Initial Comment: As per discussion on python-dev, this patch makes the following attributes of type objects writable from Python:

- __name__
- __bases__
- __mro__

It also relaxes the restriction on not returning __module__ when that's been set to a non-string. This (tiny) part is a 2.2.3 candidate IMHO. It lets the following work:

    class C(object):
        pass

    class D(C):
        pass

    class E(object):
        def meth(self):
            print 1

    d = D()
    D.__bases__ = (C, E)
    d.meth()

but that's the extent of my testing so far. Needs a test and docs -- if the current behaviour is documented anywhere. Currently, if an assignment to __bases__ would change __base__, it complains (was easiest). Assigned to Guido so he sees it, but anyone else is encouraged to review it!
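For readers trying this today, the example from the initial comment can be written roughly as follows. This is a sketch in current Python 3 syntax (where every class is new-style and __bases__ assignment is supported), not the 2002 patch itself; meth returns a value instead of printing so the result can be checked.

```python
class C:
    pass

class D(C):
    pass

class E:
    def meth(self):
        return 1

d = D()                 # instance created before the bases are changed
D.__bases__ = (C, E)    # add E as a second base class of D
result = d.meth()       # found through the recomputed MRO of D
print(result, [cls.__name__ for cls in D.__mro__])
```

Note that the existing instance d picks up the new method, because attribute lookup consults the class's MRO at call time.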
---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-11-15 10:01 Message: Logged In: YES user_id=6656 All the code for this is on my laptop, which is at home, so nothing is getting checked in until Monday at the earliest. > Michael, can you checking some of this in as separate pieces? Rearranging that so it makes sense , yes that's probably a good idea. > The __module__ relaxation should go in first, and marked as > backport candidate. OK. Easy. > The __name__ fix is close, but I think it *should* be > allowed to put dots in the name (this is actually a feature > for old classes); instead of '.' I want a check that there > are no \0 bytes in the string (see set_name() in classobject.c). OK. > I think the restrictions on __bases__ are sufficiently > thought out; Do you mean INsufficiently thought out? If so, I agree. It also occurred to me that there's probably stuff to be done so __subclasses__() continues to work. > There's a bug in set_mro(): it checks PyInstance_Check() > where it clearly means PyClass_Check(). Doh. > Other than that I think it's good to go. > (Though this is the ultimate weird feature! What's the use > case again?) Well, with assignment to __bases__ I don't need it anymore. > Hoping for unit tests, There's NO WAY I'm checking this in without them, don't worry. When do you want this done by? It might take a week or two. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 17:28 Message: Logged In: YES user_id=6380 Michael, can you checking some of this in as separate pieces? The __module__ relaxation should go in first, and marked as backport candidate. The __name__ fix is close, but I think it *should* be allowed to put dots in the name (this is actually a feature for old classes); instead of '.' I want a check that there are no \0 bytes in the string (see set_name() in classobject.c). 
I think the restrictions on __bases__ are sufficiently thought out; with old-style classes, you can do much more class switching:

    >>> class C: pass
    >>> class D: pass
    >>> D.__bases__ = (C,)
    >>>

I'd like this to work for new-style classes too. It means that __base__ has to change though. There's a bug in set_mro(): it checks PyInstance_Check() where it clearly means PyClass_Check(). Other than that I think it's good to go. (Though this is the ultimate weird feature! What's the use case again?) Hoping for unit tests, ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-09 15:01 Message: Logged In: YES user_id=6656 Hmm, I misunderstood __base__. It's the base class that *leads* to the solid base, not the solid base. So an assignment to __bases__ may justifiably change it. Oops. Will try again later... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 From noreply@sourceforge.net Fri Nov 15 16:50:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 08:50:17 -0800 Subject: [Patches] [ python-Patches-635933 ] make some type attrs writable Message-ID: Patches item #635933, was opened at 2002-11-09 09:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Michael Hudson (mwh) Summary: make some type attrs writable Initial Comment: As per discussion on python-dev, this patch makes the following attributes of type objects writable from Python: - __name__ - __bases__ - __mro__ It also relaxes the restriction on not returning __module__ when that's been set to a non-string. This (tiny) part is a 2.2.3 candidate IMHO.
It lets the following work: class C(object): pass class D(C): pass class E(object): def meth(self): print 1 d = D() D.__bases__ = (C, E) d.meth() but that's the extent of my testing so far. Needs a test and docs -- if the current behaviour is documented anywhere. Currently, if an assignment to __bases__ would change __base__, it complains (was easiest). Assigned to Guido so he sees it, but anyone else is encouraged to review it! ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:50 Message: Logged In: YES user_id=6380 Monday is fine! Sorry for the typos. If you don't need __mro__ assignment, let's not do that part then. I'd like to see it done ASAP, but at least before we release Python 2.3a1 -- which is due before Xmas. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-15 05:01 Message: Logged In: YES user_id=6656 All the code for this is on my laptop, which is at home, so nothing is getting checked in until Monday at the earliest. > Michael, can you checking some of this in as separate pieces? Rearranging that so it makes sense , yes that's probably a good idea. > The __module__ relaxation should go in first, and marked as > backport candidate. OK. Easy. > The __name__ fix is close, but I think it *should* be > allowed to put dots in the name (this is actually a feature > for old classes); instead of '.' I want a check that there > are no \0 bytes in the string (see set_name() in classobject.c). OK. > I think the restrictions on __bases__ are sufficiently > thought out; Do you mean INsufficiently thought out? If so, I agree. It also occurred to me that there's probably stuff to be done so __subclasses__() continues to work. > There's a bug in set_mro(): it checks PyInstance_Check() > where it clearly means PyClass_Check(). Doh. > Other than that I think it's good to go. > (Though this is the ultimate weird feature! 
What's the use > case again?) Well, with assignment to __bases__ I don't need it anymore. > Hoping for unit tests, There's NO WAY I'm checking this in without them, don't worry. When do you want this done by? It might take a week or two. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 12:28 Message: Logged In: YES user_id=6380 Michael, can you checking some of this in as separate pieces? The __module__ relaxation should go in first, and marked as backport candidate. The __name__ fix is close, but I think it *should* be allowed to put dots in the name (this is actually a feature for old classes); instead of '.' I want a check that there are no \0 bytes in the string (see set_name() in classobject.c). I think the restrictions on __bases__ are sufficiently thought out; with old-style classes, you can do much more class switching:

    >>> class C: pass
    >>> class D: pass
    >>> D.__bases__ = (C,)
    >>>

I'd like this to work for new-style classes too. It means that __base__ has to change though. There's a bug in set_mro(): it checks PyInstance_Check() where it clearly means PyClass_Check(). Other than that I think it's good to go. (Though this is the ultimate weird feature! What's the use case again?) Hoping for unit tests, ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-11-09 10:01 Message: Logged In: YES user_id=6656 Hmm, I misunderstood __base__. It's the base class that *leads* to the solid base, not the solid base. So an assignment to __bases__ may justifiably change it. Oops. Will try again later...
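The rule Guido asks for on __name__ (dots allowed, embedded \0 bytes rejected) can be observed in current Python 3. The snippet below is a small illustration of that behaviour as it exists today, not a rendering of the patch under review; the class name and the strings assigned are arbitrary.

```python
class C:
    pass

# A dotted name is accepted.
C.__name__ = "pkg.C"
dotted_name = C.__name__

# A name containing a NUL character is refused; the original name is kept.
try:
    C.__name__ = "bad\0name"
    rejected = False
except (ValueError, TypeError):
    rejected = True

print(dotted_name, rejected, C.__name__)
```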
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=635933&group_id=5470 From noreply@sourceforge.net Fri Nov 15 16:51:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 08:51:23 -0800 Subject: [Patches] [ python-Patches-638825 ] Logging 0.4.7 & patches thereto Message-ID: Patches item #638825, was opened at 2002-11-15 03:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Guido van Rossum (gvanrossum) Summary: Logging 0.4.7 & patches thereto Initial Comment: Recently released version 0.4.7 of the logging package (core of this recently accepted into CVS for 2.3). Includes unit tests and examples. Main regression test harness is log_test.py. There are some known issues - reported by Neal Norwitz. I will upload a patch to handle these. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:51 Message: Logged In: YES user_id=6380 Vinay, I'm unclear on what to do with this. Can you clarify?
---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-11-15 04:10 Message: Logged In: YES user_id=308438 I've uploaded patches.zip, containing logging.patch (patches the logging module) and test.patch (patch to log_test17.py) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 From noreply@sourceforge.net Fri Nov 15 16:54:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 08:54:26 -0800 Subject: [Patches] [ python-Patches-597907 ] Oren Tirosh's fastnames patch Message-ID: Patches item #597907, was opened at 2002-08-20 15:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Oren Tirosh's fastnames patch Initial Comment: Oren Tirosh had a nice patch to *really* speed up global/builtin name lookup. I'm adding it here because I don't want to lose this idea. His code and some comments are here: http://www.tothink.com/python/fastnames/ I'm uploading a new version of this patch relative to current CVS. I'm still considering whether to do this; I measure at best a 1% speedup for pystone. 
For a modified version of Oren's benchmark (modified to use a function instead of a class for 'builtin' and 'global', so that these tests use LOAD_GLOBAL rather than LOAD_NAME, I get these test results (best of 3): builtin 1.38 global 1.54 local 1.28 fastlocal 0.90 Python 2.3 without his patch (but with my speedup hacks in LOAD_GLOBAL): builtin 1.80 global 1.52 local 1.77 fastlocal 0.91 Python 2.2 (from the 2.2 branch, which is newer than the 2.2.1 release but doesn't have any speedups) did this: builtin 2.28 global 1.86 local 1.80 fastlocal 1.10 I don't care about the speedup for the 'local' case, since this uses the LOAD_NAME opcode which is only used inside class definitions; the 'builtin' and 'global' cases are interesting. It looks like Oren's patch gives us a nice speedup for looking up a built-in name from a function. I have to think about why looking up a global from a function is slower though... ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:54 Message: Logged In: YES user_id=6380 Hm, that patch doesn't have all the trickery here. Maybe Oren can explain what his intentions were? I don't have time to sort through all this -- if someone else wants to, that's fine (I've got a feeling Oren has other priorities these days). ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-11-15 01:01 Message: Logged In: YES user_id=250749 I notice Oren uploaded what appears to be an updated patch (fastnames5.patch) under patch #606098. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-11-15 00:57 Message: Logged In: YES user_id=250749 I notice Oren uploaded what appears to be an updated patch (fastnames5.patch) under patch #606098. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 12:31 Message: Logged In: YES user_id=6380 Lowered priority until Oren uploads his long-awaited new version. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-23 12:35 Message: Logged In: YES user_id=6380 Oren, any chance that you'll submit a new version of this? ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-03 16:22 Message: Logged In: YES user_id=562624 > I'm still considering whether to do this; I measure at > best a 1% speedup for pystone. Not surprising, considering the fact that pystone is dominated by fastlocals (IIRC it was something like 99.7% according to my instrumented version). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-09-03 14:29 Message: Logged In: YES user_id=6380 OK. I'm holding my breath! :-) ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-09-02 15:59 Message: Logged In: YES user_id=562624 I'm working on an improved version. Stay tuned! ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-08-20 15:31 Message: Logged In: YES user_id=6380 Tim explained why the 'globals' case is faster than the 'builtins' case. I used 'x' as the global to look up rather than 'hex', and it so happens that the last three bits of hash('x') and hash('MANY') are the same -- MANY is an identifier I insert in the globals. I'll attach the test suite I used (with 'hex' instead of 'x').
Now I get these times: builtin 1.39 global 1.28 local 1.29 fastlocal 0.91 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=597907&group_id=5470 From noreply@sourceforge.net Fri Nov 15 16:55:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 08:55:28 -0800 Subject: [Patches] [ python-Patches-606098 ] fast dictionary lookup by name Message-ID: Patches item #606098, was opened at 2002-09-07 14:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=606098&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) >Assigned to: Guido van Rossum (gvanrossum) Summary: fast dictionary lookup by name Initial Comment: This patch speeds up dictionary lookup when the key is an interned string. Test results (Guido's tar from patch #597907) Before: builtin 2.01 global 1.57 local 1.87 fastlocal 1.02 After: builtin 1.78 global 1.63 local 1.51 fastlocal 1.06 Not as impressive as the last patch because this version doesn't use the inline macro yet. A dummy/negative entry is now defined as me_key != NULL and me_value == NULL. A dummy entry also has me_hash set to -1 to shave a few more cycles in the search. Management of negative entries and the interaction with table resizing still needs more work. If there is not enough room for a new negative entry it is simply ignored. The bottleneck appears to be the first negative lookup. It starts with a fast search that fails and then falls back to a slow search and inserts a negative entry. This path is significantly slower than without the patch. Subsequent lookups will be much faster but many objects are created where attributes or methods are looked up just once. The solution I am considering is to change lookdict_string to lookdict_interned.
A dictionary in this state has only interned string keys, so if the lookup key is also an interned string, the fast search is guaranteed to produce the correct result with no fallback required. This should also make it easier to speed up PyDict_SetItem. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:55 Message: Logged In: YES user_id=6380 Assigned to me because it may relate to patch 597907. But I have no time to sort through this, and it seems Oren's priorities lie elsewhere. If someone can help out, please do! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=606098&group_id=5470 From noreply@sourceforge.net Fri Nov 15 19:45:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 11:45:21 -0800 Subject: [Patches] [ python-Patches-639063 ] More docs for bdist_wininst Message-ID: Patches item #639063, was opened at 2002-11-15 20:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 Category: None Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: More docs for bdist_wininst Initial Comment: These are updated docs for bdist_wininst, only describing the features which were already there in 2.2.2. Fred, I'm not a LaTeX expert as you know, so there is probably something you want to change. It builds, however, when I run 'make pdf' without problems as far as I can see (except for an undefined reference on line 84). Shall I check this in and you make your adjustments afterwards?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 From noreply@sourceforge.net Fri Nov 15 20:01:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 12:01:37 -0800 Subject: [Patches] [ python-Patches-639063 ] More docs for bdist_wininst Message-ID: Patches item #639063, was opened at 2002-11-15 20:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 Category: None Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: More docs for bdist_wininst Initial Comment: These are updated docs for bdist_wininst, only describing the features which were already there in 2.2.2. Fred, I'm not a LaTeX expert as you know, so there is probably something you want to change. It builds, however, when I run 'make pdf' without problems as far as I can see (except for an undefined reference on line 84). Shall I check this in and you make your adjustments afterwards? ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-11-15 21:01 Message: Logged In: YES user_id=11105 Fred answered 'yes' by private email, so I check it in.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 From noreply@sourceforge.net Fri Nov 15 20:16:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 12:16:40 -0800 Subject: [Patches] [ python-Patches-639063 ] More docs for bdist_wininst Message-ID: Patches item #639063, was opened at 2002-11-15 20:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 >Category: Documentation Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Thomas Heller (theller) >Assigned to: Thomas Heller (theller) Summary: More docs for bdist_wininst Initial Comment: These are updated docs for bdist_wininst, only describing the features which were already there in 2.2.2. Fred, I'm not a LaTeX expert as you know, so there is probably something you want to change. It builds, however, when I run 'make pdf' without problems as far as I can see (except for an undefined reference on line 84). Shall I check this in and you make your adjustments afterwards? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-11-15 21:01 Message: Logged In: YES user_id=11105 Fred answered 'yes' by private email, so I check it in.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639063&group_id=5470 From noreply@sourceforge.net Fri Nov 15 21:20:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 13:20:57 -0800 Subject: [Patches] [ python-Patches-639112 ] _strptime fixes for None locale and tz Message-ID: Patches item #639112, was opened at 2002-11-15 13:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639112&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: _strptime fixes for None locale and tz Initial Comment: Andrew MacIntyre found two problems with _strptime.py when Python was compiled on FreeBSD 4.4 (email can be found at http://mail.python.org/pipermail/python-dev/2002-November/029873.html). One bug was that, when the name of the locale was set to None, an error was thrown while comparing the language setting if strptime() was passed a generated re object. The other problem was that the wrong timezone value was being set because the locale had the same timezone name both with and without daylight savings. I fixed that by checking whether this case occurred; if it did, I left the timezone value as -1, since there is no way to know what the correct value is. Now I don't know if either of these fixes is considered too platform-specific. But I would think that FreeBSD is a big enough platform that it might be warranted. Plus locale settings are inconsistent enough as it is that dealing with one more funky possibility won't hurt anything... hopefully. If patch #633633 (Cleanup for test_strptime.py) is still open when this patch is closed, just go ahead and close it without applying the patch. The patch here includes everything from that patch.
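The timezone fix described above can be pictured with a small sketch. This is not the code from the patch or from _strptime.py; resolve_dst_flag and its arguments are hypothetical names, and the pair of zone names is assumed to have the same shape as time.tzname (standard name first, daylight name second).

```python
def resolve_dst_flag(found_zone, tznames):
    """Return 0 (standard time), 1 (daylight time) or -1 (unknown).

    found_zone is the zone name parsed from the input string; tznames is a
    (standard, daylight) pair of names, assumed to look like time.tzname.
    """
    standard, daylight = (name.lower() for name in tznames)
    zone = found_zone.lower()
    if standard == daylight and zone == standard:
        # Both names are identical, so the name alone cannot say whether
        # daylight savings is in effect: leave the value as -1.
        return -1
    if zone == standard:
        return 0
    if zone == daylight:
        return 1
    return -1  # unrecognised zone name


print(resolve_dst_flag("EST", ("EST", "EDT")))
print(resolve_dst_flag("EDT", ("EST", "EDT")))
print(resolve_dst_flag("UTC", ("UTC", "UTC")))
```

The -1 result matches the convention of the tm_isdst field, where a negative value means the information is not available.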
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639112&group_id=5470 From noreply@sourceforge.net Fri Nov 15 21:23:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 13:23:12 -0800 Subject: [Patches] [ python-Patches-633633 ] Cleanup of test_strptime.py Message-ID: Patches item #633633, was opened at 2002-11-04 20:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Barry A. Warsaw (bwarsaw) Summary: Cleanup of test_strptime.py Initial Comment: I finally got around to cleaning up test_strptime.py. Basically all I did was break all the lines that went over 80 characters (although there are a few that go over by a char or two). I also removed the __version__ variable. Whoever applies this patch can, if they wish, go ahead and also remove the __version__ variable from _strptime.py; it's a relic and not needed, let alone updated, since I never remember to. And yes, the testing suite still runs and passes all the tests. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-11-15 13:23 Message: Logged In: YES user_id=357491 I just started patch #639112 which includes this patch within it since this patch had not been applied yet. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-14 15:27 Message: Logged In: YES user_id=6380 Please go ahead. The style violations in that file have bothered me for a long time. :-( ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-11-12 08:41 Message: Logged In: YES user_id=12800 I'll take this one.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=633633&group_id=5470 From noreply@sourceforge.net Fri Nov 15 22:22:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 14:22:20 -0800 Subject: [Patches] [ python-Patches-639138 ] Ref. calendar module in time docs Message-ID: Patches item #639138, was opened at 2002-11-15 22:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639138&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Ref. calendar module in time docs Initial Comment: Add reference to calendar module in time module docs. calendar.timegm is particularly well-hidden. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639138&group_id=5470 From noreply@sourceforge.net Fri Nov 15 22:25:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 14:25:39 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 22:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Fri Nov 15 23:01:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 15:01:45 -0800 Subject: [Patches] [ python-Patches-639138 ] Ref. calendar module in time docs Message-ID: Patches item #639138, was opened at 2002-11-15 17:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639138&group_id=5470 Category: Documentation Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: John J Lee (jjlee) >Assigned to: Neal Norwitz (nnorwitz) Summary: Ref. calendar module in time docs Initial Comment: Add reference to calendar module in time module docs. calendar.timegm is particularly well-hidden. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:01 Message: Logged In: YES user_id=33168 I modified the wording a bit. Checked in as Doc/lib/libtime.tex 1.52 and 1.48.6.3 Thanks! 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639138&group_id=5470 From noreply@sourceforge.net Fri Nov 15 23:14:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 15:14:14 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 17:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 >Category: Library (Lib) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:14 Message: Logged In: YES user_id=33168 John, could you explain why you need it and what is the benefit? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Fri Nov 15 23:34:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Nov 2002 15:34:39 -0800 Subject: [Patches] [ python-Patches-638825 ] Logging 0.4.7 & patches thereto Message-ID: Patches item #638825, was opened at 2002-11-15 03:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) >Assigned to: Neal Norwitz (nnorwitz) Summary: Logging 0.4.7 & patches thereto Initial Comment: Recently released version 0.4.7 of the logging package (core of this recently accepted into CVS for 2.3). 
Includes unit tests and examples. Main regression test harness is log_test.py. There are some known issues - reported by Neal Norwitz. I will upload a patch to handle these. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:34 Message: Logged In: YES user_id=33168 Vinay, logging.patch is a reverse diff. It would be easier if you used CVS to generate the diff. That would also allow changes to be made in the python version and you could integrate them easily. patches.zip contains 2 files: test.patch & logging.patch. I have applied logging.patch as logging/__init__.py 1.3 and logging/config.py 1.4. test.patch should be applied to the tests in logging-0.4.7.tar.gz, is that correct? Vinay, were you going to convert the tests in the .tar.gz so they could be integrated into the python test suite? If so, it seems the .tar.gz is not necessary. We could close this patch and when you have finished packaging up your tests, you could submit a new patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:51 Message: Logged In: YES user_id=6380 Vinay, I'm unclear on what to do with this. Can you clarify?
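For readers unfamiliar with the "reverse diff" remark above: a diff is reversed when the new file is given as the "from" side and the old file as the "to" side, so additions show up as removals. A minimal sketch using the standard `difflib` module follows; the file contents and the helper name `make_diff` are made up for illustration and have nothing to do with the actual logging.patch.

```python
import difflib

# Hypothetical file contents, purely for illustration.
old = ["import sys\n", "level = 10\n"]
new = ["import sys\n", "level = 20\n"]


def make_diff(a, b, a_label, b_label):
    """Return unified-diff lines describing how to turn a into b."""
    return list(difflib.unified_diff(a, b, fromfile=a_label, tofile=b_label))


# Forward diff: old -> new, which is what a maintainer expects to apply.
forward = make_diff(old, new, "old", "new")

# Reverse diff: the arguments are swapped, so the change appears backwards.
reverse = make_diff(new, old, "new", "old")
```

In `forward` the line `level = 20` is marked with `+`; in `reverse` the same line is marked with `-`, which is why such a patch has to be applied in reverse.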
---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-11-15 04:10 Message: Logged In: YES user_id=308438 I've uploaded patches.zip, containing logging.patch (patches the logging module) and test.patch (patch to log_test17.py) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 From noreply@sourceforge.net Sat Nov 16 12:20:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Nov 2002 04:20:47 -0800 Subject: [Patches] [ python-Patches-639307 ] new string method -- format Message-ID: Patches item #639307, was opened at 2002-11-16 07:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jp Calderone (kuran) Assigned to: Nobody/Anonymous (nobody) Summary: new string method -- format Initial Comment: Attached patch adds a method, 'format', to str and unicode types. The method performs the same operation as the string interpolation operator. The patch also includes modifications to test_format.py as well as libstdtypes.tex (tex code untested - I can't figure out latex; hopefully it is correct though, much is copy/pasted from elsewhere). Aside from having wanted this method forever, one of my use cases is building a list of objects to be displayed in a somewhat generic fashion. Currently an explicit function is required for the simple operation of string interpolation, either by def'ing one or using a lambda, while other, more complex operations. Example attached. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 From noreply@sourceforge.net Sat Nov 16 12:28:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Nov 2002 04:28:09 -0800 Subject: [Patches] [ python-Patches-639307 ] new string method -- format Message-ID: Patches item #639307, was opened at 2002-11-16 07:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 3 Submitted By: Jp Calderone (kuran) Assigned to: Nobody/Anonymous (nobody) Summary: new string method -- format Initial Comment: Attached patch adds a method, 'format', to str and unicode types. The method performs the same operation as the string interpolation operator. The patch also includes modifications to test_format.py as well as libstdtypes.tex (tex code untested - I can't figure out latex; hopefully it is correct though, much is copy/pasted from elsewhere). Aside from having wanted this method forever, one of my use cases is building a list of objects to be displayed in a somewhat generic fashion. Currently an explicit function is required for the simple operation of string interpolation, either by def'ing one or using a lambda, while other, more complex operations. Example attached. ---------------------------------------------------------------------- >Comment By: Jp Calderone (kuran) Date: 2002-11-16 07:28 Message: Logged In: YES user_id=366566 Promised example. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 From noreply@sourceforge.net Sat Nov 16 16:42:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Nov 2002 08:42:23 -0800 Subject: [Patches] [ python-Patches-639307 ] new string method -- format Message-ID: Patches item #639307, was opened at 2002-11-16 13:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 3 Submitted By: Jp Calderone (kuran) Assigned to: Nobody/Anonymous (nobody) Summary: new string method -- format Initial Comment: Attached patch adds a method, 'format', to str and unicode types. The method performs the same operation as the string interpolation operator. The patch also includes modifications to test_format.py as well as libstdtypes.tex (tex code untested - I can't figure out latex; hopefully it is correct though, much is copy/pasted from elsewhere). Aside from having wanted this method forever, one of my use cases is building a list of objects to be displayed in a somewhat generic fashion. Currently an explicit function is required for the simple operation of string interpolation, either by def'ing one or using a lambda, while other, more complex operations. Example attached. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-16 17:42 Message: Logged In: YES user_id=21627 There should be one-- and preferably only one --obvious way to do it. The advantage of adding this method is not clear; you can easily achieve the same effect with class BoundMod: def __init__(self, obj): self.obj = obj def __call__(self, otherarg): return self.obj % otherarg ... 
def getDisplayList(self): return [ BoundMod('Link'), self.complexOutput ] Alternatively, lambda x: 'Link' % x has the same effect. If you still want that feature, I suggest that you write a PEP. There are a number of alternatives to consider, for example calling the method __mod__. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-11-16 13:28 Message: Logged In: YES user_id=366566 Promised example. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 From noreply@sourceforge.net Sat Nov 16 16:48:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Nov 2002 08:48:27 -0800 Subject: [Patches] [ python-Patches-639371 ] Removal of FreeBSD 5.0 specific test Message-ID: Patches item #639371, was opened at 2002-11-16 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of FreeBSD 5.0 specific test Initial Comment: After the latest additions to the FreeBSD 5.0-current headers, the special case isn't needed any longer. The two last problematic functions are ctermid_r and setgroups which aren't defined in the POSIX/XOPEN case. This patch works around the problem by setting CFLAGS with -Wall -Werror for gcc before checking for these two functions.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 From noreply@sourceforge.net Sat Nov 16 23:16:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Nov 2002 15:16:44 -0800 Subject: [Patches] [ python-Patches-634866 ] general corrections to 2.2.2 refman, p.1 Message-ID: Patches item #634866, was opened at 2002-11-07 09:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 Category: Documentation Group: Python 2.2.x >Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: general corrections to 2.2.2 refman, p.1 Initial Comment: as per email exchanges with F. Drake, here's a first part of suggested corrections to the 2.2.2 reference manual, mostly to make it reflect a bit better the way Python currently works. ---------------------------------------------------------------------- >Comment By: Alex Martelli (aleax) Date: 2002-11-17 00:16 Message: Logged In: YES user_id=60314 OK, corrected my mistakes and nuked all \C, e.g. and i.e. from chapters 2, 3, 6 (the chapters affected by the patch). ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-11-13 21:07 Message: Logged In: YES user_id=3066 Alex, a few nits: - In the first chunk, I suspect you meant "triple-quoted string literal", not "raw string literal". - "e.g." and "i.e." should be avoided with great prejudice. I've been removing them from the rest of the documentation as I've had time. - "\C" and "\C{}" should both be replaced with just "C" whenever found. - When you refer to the "C or Java implementation", realize that the Java implementation is more deterministic; Java ints are 32 bits, period, IIRC. - There is no __iterkeys__(), only iterkeys().
I have not tried applying the patch to test formatting. Ok, it sounds like a lot of things, but they're all rather small. Your patch really helps; thanks! If you can make these changes and post an updated patch, it shouldn't take long to get it committed. I've marked the patch "pending" since I'm waiting for changes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=634866&group_id=5470 From noreply@sourceforge.net Sun Nov 17 12:14:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 04:14:48 -0800 Subject: [Patches] [ python-Patches-639635 ] New icon for .py files Message-ID: Patches item #639635, was opened at 2002-11-17 12:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639635&group_id=5470 Category: Windows Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tim Allen (thristian) Assigned to: Nobody/Anonymous (nobody) Summary: New icon for .py files Initial Comment: With my shiny new Python-on-Win32-related job, I download and install a shiny new copy of Python 2.2.2 on my shiny new copy of Windows XP. Lo and behold, the default icon for .py files is exactly the same file that came with the version of Python that I installed years ago on Windows 3.1 - what's more, it only has a 32x32, 4-bit image, which means that it looks ugly at, say, 16x16 - a very common size in Win95 and above. Attached is my attempt at a new, more Windows-y icon for .py files. It contains images at 16x16, 32x32 and 48x48 (Windows' "Use large icons" option), at 1-bit, 4-bit and 8-bit depths, so it should look reasonable on a decent number of displays. If this icon meets with approval, I could be persuaded to draw other icons for the core Python executable, and .pyc and .pyo files.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639635&group_id=5470 From noreply@sourceforge.net Sun Nov 17 14:46:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 06:46:10 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 22:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request. ---------------------------------------------------------------------- >Comment By: John J Lee (jjlee) Date: 2002-11-17 14:46 Message: Logged In: YES user_id=261020 It's widely regarded as a bug if Python code checks for type with isinstance (or type(foo) == type(bar)) without some good reason. It's plausible that you may want to make an object that implements the Request interface without deriving from Request (say, I don't know, to implement the frobozz URI scheme, which requires ordered headers, and never has any data associated with it). If so, you don't want to have to follow 'bug fixes' in the Python std. library that may break your code simply because you had to derive from Request to satisfy the assertion. I might have done this when I wrote a couple of modules that build on urllib2, actually. I'm not sure whether that would have been the best way, because I didn't think about it since I didn't have any choice in the matter, thanks to this assertion! OTOH, it's true that removing type-checks can break backwards compatibility. 
However, this is an assertion, not a real runtime type-check, so it won't break backwards compatibility: if people are relying on catching AssertionError to do type-checking in their own code, that's their problem! The docs say: urlopen(url[, data]) Open the URL url, which can be either a string or a Request object (currently the code checks that it really is a Request instance, or an instance of a subclass of Request). Note the 'currently' (and the source code comment indicating that what we really want to check is the interface), and the fact that the code *doesn't* actually check it, but only asserts. Request interface is already documented, so there's no problem there. John ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 23:14 Message: Logged In: YES user_id=33168 John, could you explain why you need it and what is the benefit? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Sun Nov 17 17:55:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 09:55:49 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 17:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request.
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-17 12:55 Message: Logged In: YES user_id=80475 I see no problem with weakening the assertion, but hasattr should check for a required part of the interface instead of a new, undocumented, dummy attribute. ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2002-11-17 09:46 Message: Logged In: YES user_id=261020 It's widely regarded as a bug if Python code checks for type with isinstance (or type(foo) == type(bar)) without some good reason. It's plausible that you may want to make an object that implements the Request interface without deriving from Request (say, I don't know, to implement the frobozz URI scheme, which requires ordered headers, and never has any data associated with it). If so, you don't want to have to follow 'bug fixes' in the Python std. library that may break your code simply because you had to derive from Request to satisfy the assertion. I might have done this when I wrote a couple of modules that build on urllib2, actually. I'm not sure whether that would have been the best way, because I didn't think about it since I didn't have any choice in the matter, thanks to this assertion! OTOH, it's true that removing type-checks can break backwards compatibility. However, this is an assertion, not a real runtime type-check, so it won't break backwards compatibility: if people are relying on catching AssertionError to do type-checking in their own code, that's their problem! The docs say: urlopen(url[, data]) Open the URL url, which can be either a string or a Request object (currently the code checks that it really is a Request instance, or an instance of a subclass of Request). Note the 'currently' (and the source code comment indicating that what we really want to check is the interface), and the fact that the code *doesn't* actually check it, but only asserts.
Request interface is already documented, so there's no problem there. John ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:14 Message: Logged In: YES user_id=33168 John, could you explain why you need it and what is the benefit? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Sun Nov 17 17:57:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 09:57:54 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 17:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-17 12:57 Message: Logged In: YES user_id=80475 I see no problem with weakening the assertion, but hasattr should check for a required part of the interface instead of a new, undocumented, dummy attribute. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-17 12:55 Message: Logged In: YES user_id=80475 I see no problem with weakening the assertion, but hasattr should check for a required part of the interface instead of a new, undocumented, dummy attribute. 
---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2002-11-17 09:46 Message: Logged In: YES user_id=261020 It's widely regarded as a bug if Python code checks for type with isinstance (or type(foo) == type(bar)) without some good reason. It's plausible that you may want to make an object that implements the Request interface without deriving from Request (say, I don't know, to implement the frobozz URI scheme, which requires ordered headers, and never has any data associated with it). If so, you don't want to have to follow 'bug fixes' in the Python std. library that may break your code simply because you had to derive from Request to satisfy the assertion. I might have done this when I wrote a couple of modules that build on urllib2, actually. I'm not sure whether that would have been the best way, because I didn't think about it since I didn't have any choice in the matter, thanks to this assertion! OTOH, it's true that removing type-checks can break backwards compatibility. However, this is an assertion, not a real runtime type-check, so it won't break backwards compatibility: if people are relying on catching AssertionError to do type-checking in their own code, that's their problem! The docs say: urlopen(url[, data]) Open the URL url, which can be either a string or a Request object (currently the code checks that it really is a Request instance, or an instance of a subclass of Request). Note the 'currently' (and the source code comment indicating that what we really want to check is the interface), and the fact that the code *doesn't* actually check it, but only asserts. Request interface is already documented, so there's no problem there. John ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:14 Message: Logged In: YES user_id=33168 John, could you explain why you need it and what is the benefit?
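The disagreement in this thread (isinstance check versus checking the interface) can be illustrated with a small sketch. It assumes, following the suggestion to test "a required part of the interface", that `get_full_url` is such a required method; the helper `looks_like_request`, the tuple `REQUIRED_METHODS` and the class `FrobozzRequest` are hypothetical and are not taken from the submitted patch or from urllib2 itself.

```python
# Assumption: get_full_url is treated as a required part of the Request
# interface. The real patch may check something else entirely.
REQUIRED_METHODS = ("get_full_url",)


def looks_like_request(obj):
    """Duck-typed check: does obj provide the required Request methods?"""
    return all(callable(getattr(obj, name, None)) for name in REQUIRED_METHODS)


class FrobozzRequest:
    """Hypothetical request object that does NOT derive from Request."""

    def __init__(self, url):
        self._url = url

    def get_full_url(self):
        return self._url


def describe(target):
    """Mimic the 'string or Request-like object' dispatch discussed above."""
    if isinstance(target, str):
        return "plain URL string"
    # Interface check instead of an isinstance(target, Request) assertion.
    assert looks_like_request(target), "object does not look like a Request"
    return "request-like object for " + target.get_full_url()
```

A check like this accepts any object with the right methods, which is the flexibility being asked for, while still rejecting objects that clearly cannot be used as a request.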
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Sun Nov 17 18:09:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 10:09:46 -0800 Subject: [Patches] [ python-Patches-639307 ] new string method -- format Message-ID: Patches item #639307, was opened at 2002-11-16 07:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 3 Submitted By: Jp Calderone (kuran) Assigned to: Nobody/Anonymous (nobody) Summary: new string method -- format Initial Comment: Attached patch adds a method, 'format', to str and unicode types. The method performs the same operation as the string interpolation operator. The patch also includes modifications to test_format.py as well as libstdtypes.tex (tex code untested - I can't figure out latex; hopefully it is correct though, much is copy/pasted from elsewhere). Aside from having wanted this method forever, one of my use cases is building a list of objects to be displayed in a somewhat generic fashion. Currently an explicit function is required for the simple operation of string interpolation, either by def'ing one or using a lambda, while other, more complex operations. Example attached. ---------------------------------------------------------------------- Comment By: Terry J. Reedy (tjreedy) Date: 2002-11-17 13:09 Message: Logged In: YES user_id=593130 From reading the one sentence description, it was not clear to me whether .format() was to be a method of a format string or a string to be formatted. Adding formstring.format(*args) as a synonym for formstring % args doesn't make much obvious sense.
Besides the redundancy, I dislike this because this special-purpose method would only be valid for the small subclass of strings that are validly used as format strings. On reading the patch, the above seems to be the proposal. From the example, the point seems to be to make it easier to curry the % operator (i.e., bind it to its first argument). Better to me is to leave the language alone (there are hundreds of such shortcuts that *could* be added) and either write a factory function returning a bound function (via a nested definition) or a string subclass. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-16 11:42 Message: Logged In: YES user_id=21627 There should be one-- and preferably only one --obvious way to do it. The advantage of adding this method is not clear; you can easily achieve the same effect with class BoundMod: def __init__(self, obj): self.obj = obj def __call__(self, otherarg): return self.obj % otherarg ... def getDisplayList(self): return [ BoundMod('Link'), self.complexOutput ] Alternatively, lambda x: 'Link' % x has the same effect. If you still want that feature, I suggest that you write a PEP. There are a number of alternatives to consider, for example calling the method __mod__. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-11-16 07:28 Message: Logged In: YES user_id=366566 Promised example.
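The two alternatives raised in the comments (a callable wrapper class and a factory function with a nested definition) can be sketched as below. Both bind the % operator to its first argument. The name `bound_format` and the format strings are made up for illustration, since the format strings in the archived messages appear to have lost their markup; `BoundMod` follows the shape quoted in the thread.

```python
class BoundMod:
    """Callable wrapper binding a format string to the % operator."""

    def __init__(self, obj):
        self.obj = obj

    def __call__(self, otherarg):
        return self.obj % otherarg


def bound_format(template):
    """Factory function returning a bound function via a nested definition."""

    def apply(arg):
        return template % arg

    return apply


# Hypothetical display list mixing both styles; every entry is a callable
# taking one argument and returning a string.
display_list = [BoundMod("%d items"), bound_format("<%s>")]
rendered = [display_list[0](3), display_list[1]("x")]
```

Either form yields a plain callable that can sit in a list next to more complex output functions, which was the stated use case for the proposed method.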
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 From noreply@sourceforge.net Sun Nov 17 20:37:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 12:37:27 -0800 Subject: [Patches] [ python-Patches-639307 ] new string method -- format Message-ID: Patches item #639307, was opened at 2002-11-16 13:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 3 Submitted By: Jp Calderone (kuran) Assigned to: Nobody/Anonymous (nobody) Summary: new string method -- format Initial Comment: Attached patch adds a method, 'format', to str and unicode types. The method performs the same operation as the string interpolation operator. The patch also includes modifications to test_format.py as well as libstdtypes.tex (tex code untested - I can't figure out latex; hopefully it is correct though, much is copy/pasted from elsewhere). Aside from having wanted this method forever, one of my use cases is building a list of objects to be displayed in a somewhat generic fashion. Currently an explicit function is required for the simple operation of string interpolation, either by def'ing one or using a lambda, while other, more complex operations. Example attached. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-17 21:37 Message: Logged In: YES user_id=21627 Terry, thanks for your comments. It becomes clear that such a change would meet considerable resistance in the community. So I reject this patch now; kuran, if you want to see that feature in Python, you will have to write a PEP. ---------------------------------------------------------------------- Comment By: Terry J. 
Reedy (tjreedy) Date: 2002-11-17 19:09 Message: Logged In: YES user_id=593130 From reading the one sentence description, it was not clear to me whether .format() was to be a method of a format string or a string to be formatted. Adding formstring.format(*args) as a synonym for formstring % args doesn't make much obvious sense. Besides the redundancy, I dislike this because this special-purpose method would only be valid for the small subclass of strings that are validly used as format strings. On reading the patch, the above seems to be the proposal. From the example, the point seems to be to make it easier to curry the % operator (i.e., bind it to its first argument). Better to me is to leave the language alone (there are hundreds of such shortcuts that *could* be added) and either write a factory function returning a bound function (via a nested definition) or a string subclass. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-16 17:42 Message: Logged In: YES user_id=21627 There should be one-- and preferably only one --obvious way to do it. The advantage of adding this method is not clear; you can easily achieve the same effect with class BoundMod: def __init__(self, obj): self.obj = obj def __call__(self, otherarg): return self.obj % otherarg ... def getDisplayList(self): return [ BoundMod('Link'), self.complexOutput ] Alternatively, lambda x: 'Link' % x has the same effect. If you still want that feature, I suggest that you write a PEP. There are a number of alternatives to consider, for example calling the method __mod__. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-11-16 13:28 Message: Logged In: YES user_id=366566 Promised example.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639307&group_id=5470 From noreply@sourceforge.net Sun Nov 17 20:49:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 12:49:29 -0800 Subject: [Patches] [ python-Patches-639371 ] Removal of FreeBSD 5.0 specific test Message-ID: Patches item #639371, was opened at 2002-11-16 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of FreeBSD 5.0 specific test Initial Comment: After the latest additions to the FreeBSD 5.0-current headers, the special case isn't needed any longer. The last two problematic functions are ctermid_r and setgroups, which aren't defined in the POSIX/XOPEN case. This patch works around the problem by setting CFLAGS with -Wall -Werror for gcc before checking for these two functions. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-11-17 21:49 Message: Logged In: YES user_id=21627 Can you please explain the purpose of setting CFLAGS first to MY_CPPFLAGS etc? Also, why are you setting OLDFLAGS between the if and the then? Also, setting -Werror might have unintended side effects. IMO, testing for a declaration is better done by checking whether the address of a function can be taken. Apart from that, the patch looks good.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 From noreply@sourceforge.net Sun Nov 17 21:21:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 13:21:10 -0800 Subject: [Patches] [ python-Patches-639139 ] Remove type-check from urllib2 Message-ID: Patches item #639139, was opened at 2002-11-15 22:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Remove type-check from urllib2 Initial Comment: Remove undesirable type-checking assertion from urllib2.Request. ---------------------------------------------------------------------- >Comment By: John J Lee (jjlee) Date: 2002-11-17 21:21 Message: Logged In: YES user_id=261020 Why not a new attribute? What would it break? Checking for the interface by checking all the methods (there are maybe ten of them) is not really practical, and really it's the intent that's the important bit. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-17 17:57 Message: Logged In: YES user_id=80475 I see no problem with weakening the assertion, but hasattr should check for a required part of the interface instead of a new, undocumented, dummy attribute. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-11-17 17:55 Message: Logged In: YES user_id=80475 I see no problem with weakening the assertion, but hasattr should check for a required part of the interface instead of a new, undocumented, dummy attribute. 
---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2002-11-17 14:46 Message: Logged In: YES user_id=261020 It's widely regarded as a bug if Python code checks for type with isinstance (or type(foo) == type(bar)) without some good reason. It's plausible that you may want to make an object that implements the Request interface without deriving from Request (say, I don't know, to implement the frobozz URI scheme, which requires ordered headers, and never has any data associated with it). If so, you don't want to have to follow 'bug fixes' in the Python std. library that may break your code simply because you had to derive from Request to satisfy the assertion. I might have done this when I wrote a couple of modules that build on urllib2, actually. I'm not sure whether that would have been the best way, because I didn't think about it since I didn't have any choice in the matter, thanks to this assertion! OTOH, it's true that removing type-checks can break backwards compatibility. However, this is an assertion, not a real runtime type-check, so it won't break backwards compatibility: if people are relying on catching AssertionError to do type-checking in their own code, that's their problem! The docs say: urlopen(url[, data]) Open the URL url, which can be either a string or a Request object (currently the code checks that it really is a Request instance, or an instance of a subclass of Request). Note the 'currently' (and the source code comment indicating that what we really want to check is the interface), and the fact that the code *doesn't* actually check it, but only asserts. The Request interface is already documented, so there's no problem there. John ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 23:14 Message: Logged In: YES user_id=33168 John, could you explain why you need it and what is the benefit?
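Raymond Hettinger's suggestion in this thread, using hasattr to check for a required part of the interface rather than for the class, could be sketched roughly as follows. The helper name looks_like_request, the class FrobozzRequest, and the choice of get_full_url as the probed method are assumptions made for illustration; this is not the code from the patch.

```python
def looks_like_request(obj):
    """Duck-typed check: does obj offer a callable get_full_url?

    Hypothetical helper; get_full_url is assumed here to stand in for
    "a required part of the interface".
    """
    return callable(getattr(obj, 'get_full_url', None))


class FrobozzRequest(object):
    """Hypothetical request-like class that does not derive from Request."""

    def __init__(self, url):
        self._url = url

    def get_full_url(self):
        return self._url
```

With this kind of check, a FrobozzRequest instance passes while a plain URL string does not, without either one needing to inherit from Request.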
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639139&group_id=5470 From noreply@sourceforge.net Sun Nov 17 23:19:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 15:19:00 -0800 Subject: [Patches] [ python-Patches-639371 ] Removal of FreeBSD 5.0 specific test Message-ID: Patches item #639371, was opened at 2002-11-16 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of FreeBSD 5.0 specific test Initial Comment: After the latest additions to the FreeBSD 5.0-current headers, the special case isn't needed any longer. The last two problematic functions are ctermid_r and setgroups, which aren't defined in the POSIX/XOPEN case. This patch works around the problem by setting CFLAGS with -Wall -Werror for gcc before checking for these two functions. ---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-18 00:19 Message: Logged In: YES user_id=205 The setting of CFLAGS is for the test below. For FreeBSD XOPEN* and POSIX* would be sufficient, but IMO it's better to set all CFLAGS found up to this point. The setting of OLDCFLAGS should have been one line above the if. It's used to save the CFLAGS before the tests and restore them later. That allows setting the (problematic?) -Werror for the two checks. But, silly me, you're right. By checking the address of the function it works without -Wall -Werror. Though the CFLAGS are still needed. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-11-17 21:49 Message: Logged In: YES user_id=21627 Can you please explain the purpose of setting CFLAGS first to MY_CPPFLAGS etc? Also, why are you setting OLDFLAGS between the if and the then? Also, setting -Werror might have unintended side effects. IMO, testing for a declaration is better done by checking whether the address of a function can be taken. Apart from that, the patch looks good. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 From noreply@sourceforge.net Mon Nov 18 05:02:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Nov 2002 21:02:15 -0800 Subject: [Patches] [ python-Patches-637906 ] Allow any file-like object on dis module Message-ID: Patches item #637906, was opened at 2002-11-14 03:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Allow any file-like object on dis module Initial Comment: This was useful for me to make a restricted environment by disallowing specific opcodes. ---------------------------------------------------------------------- >Comment By: Hye-Shik Chang (perky) Date: 2002-11-18 14:02 Message: Logged In: YES user_id=55188 Ah. Thank you for your comments. It looks much better than parsing dis's output. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-15 05:17 Message: Logged In: YES user_id=21627 So you want to find out whether BINARY_POWER is in the byte code? If so, I suggest that usage of dis.disassemble is inadequate.
Instead, you should just copy the essential part of the disassemble loop, and look for an opcode for which dis.opname[opcode] is 'BINARY_POWER'. This will be much faster, and thread-safe. ---------------------------------------------------------------------- Comment By: Hye-Shik Chang (perky) Date: 2002-11-15 03:07 Message: Logged In: YES user_id=55188 Yes. To disallow BINARY_POWER and INPLACE_POWER, I'm hooking sys.stdout now. But, because that isn't a thread-safe way, I needed to lock threads whenever I print something to stdout. To make a cheaper solution, I like this patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-11-15 01:56 Message: Logged In: YES user_id=31435 I'm also missing the connection to restricted environments. If you want to capture dis output (or output from anything else that uses print), the usual way to do it is to assign a StringIO instance (or other file-like object) to sys.stdout before invoking dis. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-14 08:46 Message: Logged In: YES user_id=21627 Can you please elaborate? In what way is that useful for a restricted environment?
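Martin v. Löwis's suggestion earlier in this thread, scanning the byte code directly instead of parsing dis output, might be sketched as below. This is only an illustration under stated assumptions: it uses the modern dis.get_instructions API (which did not exist in Python 2.3) rather than a copy of the disassemble loop, and the function name uses_power is hypothetical. CPython 3.11 and later fold BINARY_POWER and INPLACE_POWER into a generic BINARY_OP, so the sketch accepts both spellings. It inspects only the given code object, not nested ones.

```python
import dis


def uses_power(code):
    """Return True if the code object contains a power operation.

    Hypothetical helper. Older interpreters emit BINARY_POWER or
    INPLACE_POWER; newer ones emit BINARY_OP with '**' or '**='
    as the argument representation.
    """
    for instr in dis.get_instructions(code):
        if instr.opname in ('BINARY_POWER', 'INPLACE_POWER'):
            return True
        if instr.opname == 'BINARY_OP' and instr.argrepr in ('**', '**='):
            return True
    return False
```

Because nothing is printed, no hook on sys.stdout is needed, and so no locking around print either.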
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=637906&group_id=5470 From noreply@sourceforge.net Mon Nov 18 15:42:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Nov 2002 07:42:22 -0800 Subject: [Patches] [ python-Patches-639371 ] Removal of FreeBSD 5.0 specific test Message-ID: Patches item #639371, was opened at 2002-11-16 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of FreeBSD 5.0 specific test Initial Comment: After the latest additions to the FreeBSD 5.0-current headers, the special case isn't needed any longer. The last two problematic functions are ctermid_r and setgroups, which aren't defined in the POSIX/XOPEN case. This patch works around the problem by setting CFLAGS with -Wall -Werror for gcc before checking for these two functions. ---------------------------------------------------------------------- >Comment By: Marc Recht (marc) Date: 2002-11-18 16:42 Message: Logged In: YES user_id=205 Sometimes I should just think twice. So, this version of the patch is cleaner. It checks whether the address of the function in question can be taken, and instead of setting (gcc-specific) compiler flags, confdefs.h is included. So, it should work with all C compilers. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-18 00:19 Message: Logged In: YES user_id=205 The setting of CFLAGS is for the test below. For FreeBSD XOPEN* and POSIX* would be sufficient, but IMO it's better to set all CFLAGS found up to this point. The setting of OLDCFLAGS should have been one line above the if.
It's used to save the CFLAGS before the tests and restore them later. That allows setting the (problematic?) -Werror for the two checks. But, silly me, you're right. By checking the address of the function it works without -Wall -Werror. Though the CFLAGS are still needed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-17 21:49 Message: Logged In: YES user_id=21627 Can you please explain the purpose of setting CFLAGS first to MY_CPPFLAGS etc? Also, why are you setting OLDFLAGS between the if and the then? Also, setting -Werror might have unintended side effects. IMO, testing for a declaration is better done by checking whether the address of a function can be taken. Apart from that, the patch looks good. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 From noreply@sourceforge.net Mon Nov 18 18:04:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Nov 2002 10:04:53 -0800 Subject: [Patches] [ python-Patches-639635 ] New icon for .py files Message-ID: Patches item #639635, was opened at 2002-11-17 07:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639635&group_id=5470 Category: Windows Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Tim Allen (thristian) >Assigned to: Tim Peters (tim_one) Summary: New icon for .py files Initial Comment: With my shiny new Python-on-Win32-related job, I download and install a shiny new copy of Python 2.2.2 on my shiny new copy of Windows XP. Lo and behold, the default icon for .py files is exactly the same file that came with the version of Python that I installed years ago on Windows 3.1 - what's more, it only has a 32x32, 4-bit image, which means that it looks ugly at, say, 16x16 - a very common size in Win95 and above.
Attached is my attempt at a new, more Windows-y icon for .py files. It contains images at 16x16, 32x32 and 48x48 (Windows' "Use large icons" option), at 1-bit, 4-bit and 8-bit depths, so it should look reasonable on a decent number of displays. If this icon meets with approval, I could be persuaded to draw other icons for the core Python executable, and .pyc and .pyo files. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-11-18 13:04 Message: Logged In: YES user_id=31435 Hmm. On my Win98SE box, under Explorer this looks like a sheet of paper with what's maybe a gold key resting on it. I thought it was some kind of error icon indicating that the file was corrupt. Switching to "Large Icons" revealed your intent, but in the "Small Icons", "List", and "Detail" (my default) viewing modes, the tail and eyes of the snake just aren't there, leaving the "gold key" impression. In contrast, the current icons look fine in all viewing modes. I don't think it's a good idea to change icons anyway -- I *like* the smiling green snake, and would miss it. So, sorry, but I'm rejecting this particular icon and don't encourage more. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639635&group_id=5470 From noreply@sourceforge.net Mon Nov 18 23:14:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Nov 2002 15:14:51 -0800 Subject: [Patches] [ python-Patches-639371 ] Removal of FreeBSD 5.0 specific test Message-ID: Patches item #639371, was opened at 2002-11-16 17:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Marc Recht (marc) >Assigned to: Martin v.
Löwis (loewis) Summary: Removal of FreeBSD 5.0 specific test Initial Comment: After the latest additions to the FreeBSD 5.0-current headers, the special case isn't needed any longer. The last two problematic functions are ctermid_r and setgroups, which aren't defined in the POSIX/XOPEN case. This patch works around the problem by setting CFLAGS with -Wall -Werror for gcc before checking for these two functions. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-18 16:42 Message: Logged In: YES user_id=205 Sometimes I should just think twice. So, this version of the patch is cleaner. It checks whether the address of the function in question can be taken, and instead of setting (gcc-specific) compiler flags, confdefs.h is included. So, it should work with all C compilers. ---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-11-18 00:19 Message: Logged In: YES user_id=205 The setting of CFLAGS is for the test below. For FreeBSD XOPEN* and POSIX* would be sufficient, but IMO it's better to set all CFLAGS found up to this point. The setting of OLDCFLAGS should have been one line above the if. It's used to save the CFLAGS before the tests and restore them later. That allows setting the (problematic?) -Werror for the two checks. But, silly me, you're right. By checking the address of the function it works without -Wall -Werror. Though the CFLAGS are still needed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-17 21:49 Message: Logged In: YES user_id=21627 Can you please explain the purpose of setting CFLAGS first to MY_CPPFLAGS etc? Also, why are you setting OLDFLAGS between the if and the then? Also, setting -Werror might have unintended side effects. IMO, testing for a declaration is better done by checking whether the address of a function can be taken.
Apart from that, the patch looks good. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=639371&group_id=5470 From noreply@sourceforge.net Tue Nov 19 13:26:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Nov 2002 05:26:01 -0800 Subject: [Patches] [ python-Patches-638825 ] Logging 0.4.7 & patches thereto Message-ID: Patches item #638825, was opened at 2002-11-15 03:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Neal Norwitz (nnorwitz) Summary: Logging 0.4.7 & patches thereto Initial Comment: Recently released version 0.4.7 of the logging package (core of this recently accepted into CVS for 2.3). Includes unit tests and examples. Main regression test harness is log_test.py. There are some known issues - reported by Neal Norwitz. I will upload a patch to handle these. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-19 08:26 Message: Logged In: YES user_id=33168 As per email, closing this patch. A new patch will contain the tests. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-11-15 18:34 Message: Logged In: YES user_id=33168 Vinay, logging.patch is a reverse diff. It would be easier if you used CVS to generate the diff. That would also allow changes to be made in the python version and you could integrate them easily. patches.zip contains 2 files: test.patch & logging.patch. I have applied logging.patch as logging/__init__.py 1.3 and logging/config.py 1.4. test.patch should be applied to the tests in logging-0.4.7.tar.gz, is that correct?
Vinay, were you going to convert the tests in the .tar.gz so they could be integrated into the python test suite? If so, it seems the .tar.gz is not necessary. We could close this patch and when you have finished packaging up your tests, you could submit a new patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-11-15 11:51 Message: Logged In: YES user_id=6380 Vinay, I'm unclear on what to do with this. Can you clarify? ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-11-15 04:10 Message: Logged In: YES user_id=308438 I've uploaded patches.zip, containing logging.patch (patches the logging module) and test.patch (patch to log_test17.py) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=638825&group_id=5470 From noreply@sourceforge.net Tue Nov 19 16:02:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Nov 2002 08:02:55 -0800 Subject: [Patches] [ python-Patches-545480 ] Examples for urllib2 Message-ID: Patches item #545480, was opened at 2002-04-18 04:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=545480&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Jeremy Hylton (jhylton) Summary: Examples for urllib2 Initial Comment: An associate who's learning Python recently complained about a lack of examples for urllib2. As a starting point, I'd like to submit the following: This example gets the python.org main page and displays the first 100 bytes of it:

>>> import urllib2
>>> url = urllib2.urlopen('http://www.python.org/')
>>> print url.read()[:100]