From noreply@sourceforge.net Sat Mar 1 01:30:10 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 28 Feb 2003 17:30:10 -0800
Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop
Message-ID:

Patches item #693753, was opened at 2003-02-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Stone (mbrierst)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: fix for bug 639806: default for dict.pop

Initial Comment:
This patch adds an optional default value to dict.pop, so that it parallels dict.get; see the discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-02-28 20:30

Message:
Logged In: YES
user_id=80475

The patch looks fine. Assigning to Guido for pronouncement.

Guido, the patch adds optional get()-like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]).

On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k).

If it is accepted, the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it.
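
For illustration, a minimal sketch of the behavior proposed in the initial comment, written against a present-day Python in which dict.pop accepts a default. The dictionary contents and variable names are made up for the example and are not taken from the patch.

```python
# Hypothetical data, used only to show the three cases.
settings = {"debug": True, "level": 3}

# Key present: the value is returned and the key is removed,
# whether or not a default is supplied.
debug = settings.pop("debug", False)

# Key absent, default given: the default is returned, no exception.
verbose = settings.pop("verbose", False)

# Key absent, no default: the old behavior is kept and KeyError is raised.
try:
    settings.pop("verbose")
except KeyError:
    missing_raises = True
else:
    missing_raises = False
```

This mirrors dict.get(key, default), with the difference that a successful pop also removes the key from the mapping.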
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

From noreply@sourceforge.net Sat Mar 1 02:00:34 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 28 Feb 2003 18:00:34 -0800
Subject: [Patches] [ python-Patches-693195 ] Add sys.exc_clear() to clear current exception
Message-ID:

Patches item #693195, was opened at 2003-02-25 16:26
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693195&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Kevin Jacobs (jacobs99)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: Add sys.exc_clear() to clear current exception

Initial Comment:
There is no way to clear the "current" exception, which is available via the sys.exc_info() function. There are a few (obscure) reasons why one would want to be able to do so, mainly due to the implementation details of how exception information is stored.

Specifically, sys.exc_info() will return information on the last exception even outside of an 'except:' block that caught the exception. So an exception, all of the frame objects on the stack, and all local variables stored in those frames are kept alive in the last exception traceback until either 1) another exception is thrown, or 2) the stack returns to a frame that is handling another exception (thrown before the "current" exception).

Thus, it is sometimes useful to be able to clear the "current" exception, e.g.:

1) Some error handling and logging handlers will report on the current or last exception (as a hint about what may have gone wrong). Once that information is handled, additional error handling or logging calls should not report it again.

2) Sometimes resources are not released when an exception is raised until the next exception is raised.
This causes problems for programs that rely on object finalization to release resources (like memory, locks, file descriptors, etc.). Such code is suboptimal, but it exists and there are few easy alternatives other than creating many 'finally:' clauses (which can violate encapsulation and abstraction layer boundaries and is syntactically hairy at times). Anyhow, such programs may want to clear the current exception and trigger garbage collection at certain synchronization points, in order to flush pending object finalization. Clearly, this is a somewhat hit-or-miss strategy; it works fairly well in practice, though no sane developer should ever rely on it.

Anyhow, I've implemented a trivial patch to sysmodule.c to add an 'exc_clear()' function that clears the current or last exception. I've also added a test case and updated the documentation.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-28 21:00

Message:
Logged In: YES
user_id=6380

Grabbing this for review.

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2003-02-26 07:39

Message:
Logged In: YES
user_id=459565

I've updated my patch based on Neal's feedback:

1) sys_exc_clear and sys_exc_info now use the recommended prototype and the cast to PyCFunction was removed.

2) \versionadded was added to the exc_clear docs.

3) The exc_info docs were slightly modified to better match the updated doc string.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-25 22:32

Message:
Logged In: YES
user_id=33168

I have a couple of minor things. The prototype should be: sys_exc_clear(PyObject *self, PyObject *noargs). This will remove the (PyCFunction) cast. (I realize there are other places in the file you copied, but they are wrong too. :-) The doc for exc_clear should have a \versionadded{2.3} before the \end.
Should the doc for exc_info() also be updated, since the docstring was updated?

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2003-02-25 16:39

Message:
Logged In: YES
user_id=459565

Before someone else says it -- yes, technically there is a way to "clear" the current exception -- by raising another exception. However, that leaves a bogus exception in the thread state, which still stores at least one Python stack frame.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693195&group_id=5470

From noreply@sourceforge.net Sat Mar 1 02:21:18 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 28 Feb 2003 18:21:18 -0800
Subject: [Patches] [ python-Patches-695090 ] Make build_py allow modules and packages at the same time
Message-ID:

Patches item #695090, was opened at 2003-02-28 09:42
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695090&group_id=5470

Category: Distutils and setup.py
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Bernhard Herzog (bernhard)
Assigned to: A.M. Kuchling (akuchling)
Summary: Make build_py allow modules and packages at the same time

Initial Comment:
The build command of the distutils currently doesn't support both Python modules and Python packages in the same setup.py call. See the distutils-sig for a discussion: http://mail.python.org/pipermail/distutils-sig/2003-February/003192.html

This patch modifies the build_py command to allow this.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2003-02-28 21:21

Message:
Logged In: YES
user_id=11375

Checked in; thanks!
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695090&group_id=5470

From noreply@sourceforge.net Sat Mar 1 02:59:37 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 28 Feb 2003 18:59:37 -0800
Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop
Message-ID:

Patches item #693753, was opened at 2003-02-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Stone (mbrierst)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: fix for bug 639806: default for dict.pop

Initial Comment:
This patch adds an optional default value to dict.pop, so that it parallels dict.get; see the discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-28 21:59

Message:
Logged In: YES
user_id=6380

Alex Martelli's argument convinced me; I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-02-28 20:30

Message:
Logged In: YES
user_id=80475

The patch looks fine. Assigning to Guido for pronouncement.

Guido, the patch adds optional get()-like functionality for dict.pop().
The nearest parallel is the default argument for getattr(obj, attr, [default]).

On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k).

If it is accepted, the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

From noreply@sourceforge.net Sat Mar 1 03:31:35 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 28 Feb 2003 19:31:35 -0800
Subject: [Patches] [ python-Patches-693195 ] Add sys.exc_clear() to clear current exception
Message-ID:

Patches item #693195, was opened at 2003-02-25 16:26
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693195&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Kevin Jacobs (jacobs99)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Add sys.exc_clear() to clear current exception

Initial Comment:
There is no way to clear the "current" exception, which is available via the sys.exc_info() function. There are a few (obscure) reasons why one would want to be able to do so, mainly due to the implementation details of how exception information is stored.

Specifically, sys.exc_info() will return information on the last exception even outside of an 'except:' block that caught the exception.
So an exception, all of the frame objects on the stack, and all local variables stored in those frames are kept alive in the last exception traceback until either 1) another exception is thrown, or 2) the stack returns to a frame that is handling another exception (thrown before the "current" exception).

Thus, it is sometimes useful to be able to clear the "current" exception, e.g.:

1) Some error handling and logging handlers will report on the current or last exception (as a hint about what may have gone wrong). Once that information is handled, additional error handling or logging calls should not report it again.

2) Sometimes resources are not released when an exception is raised until the next exception is raised. This causes problems for programs that rely on object finalization to release resources (like memory, locks, file descriptors, etc.). Such code is suboptimal, but it exists and there are few easy alternatives other than creating many 'finally:' clauses (which can violate encapsulation and abstraction layer boundaries and is syntactically hairy at times). Anyhow, such programs may want to clear the current exception and trigger garbage collection at certain synchronization points, in order to flush pending object finalization. Clearly, this is a somewhat hit-or-miss strategy; it works fairly well in practice, though no sane developer should ever rely on it.

Anyhow, I've implemented a trivial patch to sysmodule.c to add an 'exc_clear()' function that clears the current or last exception. I've also added a test case and updated the documentation.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-28 22:31

Message:
Logged In: YES
user_id=6380

All checked in, thanks.

I changed the docstring for sys.exc_info() again, to: "Return information about the most recent exception caught by an except clause in the current stack frame or in an older stack frame."
I think this is accurate and concise.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-28 21:00

Message:
Logged In: YES
user_id=6380

Grabbing this for review.

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2003-02-26 07:39

Message:
Logged In: YES
user_id=459565

I've updated my patch based on Neal's feedback:

1) sys_exc_clear and sys_exc_info now use the recommended prototype and the cast to PyCFunction was removed.

2) \versionadded was added to the exc_clear docs.

3) The exc_info docs were slightly modified to better match the updated doc string.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-25 22:32

Message:
Logged In: YES
user_id=33168

I have a couple of minor things. The prototype should be: sys_exc_clear(PyObject *self, PyObject *noargs). This will remove the (PyCFunction) cast. (I realize there are other places in the file you copied, but they are wrong too. :-) The doc for exc_clear should have a \versionadded{2.3} before the \end. Should the doc for exc_info() also be updated, since the docstring was updated?

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2003-02-25 16:39

Message:
Logged In: YES
user_id=459565

Before someone else says it -- yes, technically there is a way to "clear" the current exception -- by raising another exception. However, that leaves a bogus exception in the thread state, which still stores at least one Python stack frame.
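
The retention problem described in the initial comment can be sketched as follows. Note the assumptions: sys.exc_clear() exists only in Python 2 and was removed in Python 3, where the interpreter clears the handled exception when the 'except' block ends. This sketch therefore uses Python 3 and keeps the sys.exc_info() result in a variable to reproduce the same effect; the names Resource, fail and demo are made up for the example, and the prompt release relies on CPython's reference counting.

```python
import gc
import sys
import weakref


class Resource:
    """Stand-in for an object whose finalization releases something."""


def demo():
    refs = []

    def fail():
        res = Resource()               # local variable in the failing frame
        refs.append(weakref.ref(res))  # lets us observe whether it is alive
        raise ValueError("boom")

    try:
        fail()
    except ValueError:
        # The traceback references fail()'s frame, and that frame
        # references its local variable `res`.
        saved = sys.exc_info()

    held = refs[0]() is not None       # still alive: the traceback pins it

    # Dropping the saved exception information releases the frames.
    del saved
    gc.collect()
    released = refs[0]() is None
    return held, released


held, released = demo()
```

Dropping the last reference to the exception information is what lets the frames, and the locals they hold, be finalized.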
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693195&group_id=5470

From noreply@sourceforge.net Sat Mar 1 14:04:49 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sat, 01 Mar 2003 06:04:49 -0800
Subject: [Patches] [ python-Patches-695581 ] "returnself" -> "return self" in pydoc.py
Message-ID:

Patches item #695581, was opened at 2003-03-01 14:04
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: "returnself" -> "return self" in pydoc.py

Initial Comment:
The error has probably been introduced in the process of converting the code from using "apply" to "*args".

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

From noreply@sourceforge.net Sat Mar 1 14:06:55 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sat, 01 Mar 2003 06:06:55 -0800
Subject: [Patches] [ python-Patches-695581 ] "returnself" -> "return self" in pydoc.py
Message-ID:

Patches item #695581, was opened at 2003-03-01 14:04
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
>Summary: "returnself" -> "return self" in pydoc.py

Initial Comment:
The error has probably been introduced in the process of converting the code from using "apply" to "*args".
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

From noreply@sourceforge.net Sat Mar 1 15:32:55 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sat, 01 Mar 2003 07:32:55 -0800
Subject: [Patches] [ python-Patches-695581 ] "returnself" -> "return self" in pydoc.py
Message-ID:

Patches item #695581, was opened at 2003-03-01 09:04
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Oren Tirosh (orenti)
>Assigned to: Neal Norwitz (nnorwitz)
>Summary: "returnself" -> "return self" in pydoc.py

Initial Comment:
The error has probably been introduced in the process of converting the code from using "apply" to "*args".

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2003-03-01 10:32

Message:
Logged In: YES
user_id=33168

Thanks! Checked in as:

Lib/pydoc.py 1.79

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695581&group_id=5470

From noreply@sourceforge.net Sat Mar 1 19:49:27 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sat, 01 Mar 2003 11:49:27 -0800
Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator
Message-ID:

Patches item #695710, was opened at 2003-03-01 19:49
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470

Category: Modules
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Stone (mbrierst)
Assigned to: Nobody/Anonymous (nobody)
Summary: fix bug 678519: cStringIO self iterator

Initial Comment:
StringIO.StringIO already appears to be a self-iterator.
This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470

From noreply@sourceforge.net Sun Mar 2 02:40:46 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sat, 01 Mar 2003 18:40:46 -0800
Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop
Message-ID:

Patches item #693753, was opened at 2003-02-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Stone (mbrierst)
Assigned to: Raymond Hettinger (rhettinger)
Summary: fix for bug 639806: default for dict.pop

Initial Comment:
This patch adds an optional default value to dict.pop, so that it parallels dict.get; see the discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-03-01 21:40

Message:
Logged In: YES
user_id=31435

dicts have a .pop() method? Heh. I must have slept through that one.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-28 21:59

Message:
Logged In: YES
user_id=6380

Alex Martelli's argument convinced me; I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing.
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-02-28 20:30

Message:
Logged In: YES
user_id=80475

The patch looks fine. Assigning to Guido for pronouncement.

Guido, the patch adds optional get()-like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]).

On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k).

If it is accepted, the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470

From noreply@sourceforge.net Sun Mar 2 20:52:31 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sun, 02 Mar 2003 12:52:31 -0800
Subject: [Patches] [ python-Patches-696184 ] Enable __slots__ for meta-types
Message-ID:

Patches item #696184, was opened at 2003-03-02 21:52
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696184&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christian Tismer (tismer)
Assigned to: Nobody/Anonymous (nobody)
Summary: Enable __slots__ for meta-types

Initial Comment:
The new type system allows non-empty __slots__ only for fixed-size objects.

Meta-types are types whose instances are also types. Types are variable-sized, because they take the slot definitions for their instances, so they cannot have extra members from their meta-type.
The proposed solution allows for two things:
a) meta-types can have slots
b) extensions get access to the whole type object and can create extended types with private fields.

The changes providing this are quite simple:
- replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h
- instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize.

To keep things easy, I added a macro which does this calculation, and member access now reads like so:

before:
    type->tp_members = et->members;
after:
    type->tp_members = PyHeapType_GET_MEMBERS(et);

This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or complexity.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696184&group_id=5470

From noreply@sourceforge.net Sun Mar 2 21:02:44 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sun, 02 Mar 2003 13:02:44 -0800
Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types
Message-ID:

Patches item #696193, was opened at 2003-03-02 22:02
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christian Tismer (tismer)
Assigned to: Nobody/Anonymous (nobody)
Summary: Enable __slots__ for meta-types

Initial Comment:
The new type system allows non-empty __slots__ only for fixed-size objects.

Meta-types are types whose instances are also types. Types are variable-sized, because they take the slot definitions for their instances, so they cannot have extra members from their meta-type.
The proposed solution allows for two things:
a) meta-types can have slots
b) extensions get access to the whole type object and can create extended types with private fields.

The changes providing this are quite simple:
- replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h
- instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize.

To keep things easy, I added a macro which does this calculation, and member access now reads like so:

before:
    type->tp_members = et->members;
after:
    type->tp_members = PyHeapType_GET_MEMBERS(et);

This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simplicity.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470

From noreply@sourceforge.net Mon Mar 3 07:38:30 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Sun, 02 Mar 2003 23:38:30 -0800
Subject: [Patches] [ python-Patches-696392 ] allow proxy server authentication with pimp
Message-ID:

Patches item #696392, was opened at 2003-03-03 07:38
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696392&group_id=5470

Category: Macintosh
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew Straw (astraw)
Assigned to: Jack Jansen (jackjansen)
Summary: allow proxy server authentication with pimp

Initial Comment:
The urllib module does not support http proxy authentication with passwords. The urllib2 module does, so I changed pimp.py to use urllib2. I have tested the patch below after setting my http_proxy environment variable to the form "http://user:pass@proxy.com:1234".

It may be possible to remove the dependency on urllib entirely by substituting a urllib2 work-alike for a call to urllib.url2pathname().
This may affect the exception(s) raised when unable to connect. For example, PackageManager.py catches an IOError, but I believe urllib2 raises a socket.gaierror when unable to resolve the name of the URL. I have not resolved this issue.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696392&group_id=5470

From noreply@sourceforge.net Mon Mar 3 09:45:24 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Mon, 03 Mar 2003 01:45:24 -0800
Subject: [Patches] [ python-Patches-671666 ] Make the default encoding provided on Windows
Message-ID:

Patches item #671666, was opened at 2003-01-21 09:36
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671666&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: SUZUKI Hisao (suzuki_hisao)
Assigned to: Martin v. Löwis (loewis)
Summary: Make the default encoding provided on Windows

Initial Comment:
On Windows, some default encodings are not provided by Python (e.g. "cp932" in Japanese locale), while they are always available as "mbcs" in each locale.

This patch makes them usable in a very efficient way by aliasing them to "mbcs" in such a case.

Note that IDLE does not start up on Windows unless the default encoding is provided. The patch makes IDLE operable all over the (Windows) world ;-).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-03 10:45

Message:
Logged In: YES
user_id=21627

I missed the point of this patch, indeed. Applied as site.py 1.48.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-21 03:10

Message:
Logged In: YES
user_id=31435

Assigning to Martin, in the hopes they can work out their differences.
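
The fallback idea in the initial comment (use the locale's code page if Python provides a codec for it, otherwise fall back to "mbcs") can be sketched roughly as below. This is an illustration only, not the code that went into site.py: the helper name resolve_encoding is made up, and the sketch returns a fallback name instead of registering an alias the way the patch is described as doing.

```python
import codecs


def resolve_encoding(name, fallback="mbcs"):
    """Return `name` if Python has a codec for it, else `fallback`.

    Hypothetical helper for illustration; "mbcs" is only usable on Windows.
    """
    try:
        codecs.lookup(name)   # raises LookupError for unknown encodings
    except LookupError:
        return fallback
    return name


known = resolve_encoding("utf-8")
unknown = resolve_encoding("no-such-codec-xyz")
```

On Windows the name passed in would come from the locale, for example locale.getdefaultlocale()[1] as mentioned later in this thread.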
----------------------------------------------------------------------

Comment By: SUZUKI Hisao (suzuki_hisao)
Date: 2003-01-28 06:49

Message:
Logged In: YES
user_id=495142

I can reproduce the IDLE problem on my Windows 2000 in Japanese locale. I hope you will confirm it by asking your friends in Japan or other countries.

I am afraid you missed the point. The patch does NOT change the default encoding of Python itself. It is still ASCII. It only makes the encoding of locale.getdefaultlocale()[1] be PROVIDED. Please read that short patch.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-21 23:04

Message:
Logged In: YES
user_id=21627

I'm rejecting this patch. The factory system default encoding of Python is ASCII, on all platforms (at least, it should be this way; MacOS currently deviates).

I cannot reproduce the IDLE problem; IDLE starts without that patch just fine.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671666&group_id=5470

From noreply@sourceforge.net Mon Mar 3 10:15:37 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Mon, 03 Mar 2003 02:15:37 -0800
Subject: [Patches] [ python-Patches-658327 ] Add inet_pton and inet_ntop to socket
Message-ID:

Patches item #658327, was opened at 2002-12-24 22:00
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Jp Calderone (kuran)
Assigned to: Martin v. Löwis (loewis)
Summary: Add inet_pton and inet_ntop to socket

Initial Comment:
Patch is against current CVS and adds two socket module functions, inet_pton and inet_ntop. Both of these should be available on all platforms (because of other dependencies in the code) so I don't think portability is a problem.
inet_ntop converts a packed IP address to a human-readable '.' or ':' separated string representation of the IP. inet_pton performs the reverse operation.

(Potential) problems: inet_pton sets errno to ENOSPC, which may lead to a confusing error message.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-03 11:15

Message:
Logged In: YES
user_id=21627

The has_ipv6 test is only there for the tests? In that case, drop it, and just perform AF_INET6 conversions unconditionally.

OTOH, I think we should not expose the emulated inet_pton: it doesn't set errno correctly, and offers no advantage over inet_addr. So wrap the entire code with HAVE_INET_PTON, and only perform the tests if the function is supported.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-05 03:40

Message:
Logged In: YES
user_id=33168

I was just about to check this in, but then I ran into a problem. IPv6 may not be enabled, even if the constant AF_INET6 exists. The cleanest way I saw to address this in the test was to add a has_ipv6 boolean constant to the socket module. Martin, do you think this is acceptable? Attached is a complete patch which should be safe (based on the discussion below), and includes tests and doc changes.

----------------------------------------------------------------------

Comment By: Jp Calderone (kuran)
Date: 2003-01-11 18:04

Message:
Logged In: YES
user_id=366566

Yeah, testing for the proper input length is definitely something that should be done.

The patch looks good, but for one thing. If the specified address family is neither AF_INET nor AF_INET6, the length won't be tested and the underlying inet_ntop will be called. This isn't a problem now (afaik) because only those two address families are supported, but in a future libc version with more supported address families, it might open a similar hole to the one you've fixed.
Perhaps the

+ } else {
+     PyErr_SetString(socket_error, "unknown address family");
+     return NULL;
+ }

should be moved up from the second if-grouping to follow the first if-grouping. Everything else looks good to me. Thanks for taking the time to look at this :) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-11 04:49 Message: Logged In: YES user_id=33168 JP, do you agree with my comment on 2002-12-30 about the checks? I have attached an updated patch. Please review and verify this is correct. Thank you for the additional tests. Feel free to submit patches with additional tests for any and all modules! ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-31 17:52 Message: Logged In: YES user_id=366566 Doc, NEWS, and test_socket patch attached. I didn't notice any inet_aton/inet_ntoa tests in the module so I added a couple for those as well (I excluded a test for inet_ntoa('255.255.255.255') ;) Also included are a couple IPv6 tests. I'm not sure if these are appropriate, since many systems may still lack the required support for them to pass. I'll leave it up to you to decide whether they should be commented out or removed or whatever. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-31 14:17 Message: Logged In: YES user_id=21627 I agree that such a change should be added. Neal, you have given this patch more attention than I did - please check it in when you consider it complete. I'd just like to point out that it is missing documentation changes (libsocket.tex), a NEWS entry, and a test case. kuran, please provide those as a single patch file.
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-31 01:11 Message: Logged In: YES user_id=33168 ISTM that in socket_inet_ntop() you need to verify the size of the packed value passed in. If the user passes an empty string, inet_ntop() could read beyond the buffer passed in, potentially causing a core dump. The checks could be something like this:

if (af == AF_INET && len != sizeof(struct in_addr))
else if (af == AF_INET6 && len != sizeof(struct in6_addr))

Does this make sense? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-27 16:39 Message: Logged In: YES user_id=366566 The use case I have for it at the moment is a DNS server (Twisted.names). inet_pton allows me to handle IPv6 addresses, so it allows me to support AAAA and A6 records. I believe an IPv6 capable socks proxy would find this useful as well. Basically, low level network stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-27 11:23 Message: Logged In: YES user_id=21627 What is the rationale for providing this functionality? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-26 19:32 Message: Logged In: YES user_id=366566 Ooops, I made two, and uploaded the wrong one >:O Sorry. Dunno if it's still helpful, but here's the unified diff. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 19:10 Message: Logged In: YES user_id=33168 Next time, please use context or unified diff. -c or -u option to cvs diff: cvs diff -c ... ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-24 22:05 Message: Logged In: YES user_id=366566 Sourceforge decided not to attach the file the first time... Here it is.
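
The validation discussed in this thread (check the packed length for each family, and reject any family other than AF_INET and AF_INET6 before calling inet_ntop) can be sketched at the Python level as follows. This is only an illustration in modern Python 3, not the C code from the patch; the wrapper name checked_ntop and the PACKED_SIZES table are hypothetical.

```python
import socket

# Expected packed sizes, mirroring sizeof(struct in_addr) and
# sizeof(struct in6_addr) from the C-level checks quoted in the thread.
PACKED_SIZES = {socket.AF_INET: 4, socket.AF_INET6: 16}


def checked_ntop(family, packed):
    """Hypothetical wrapper: validate family and length, then convert."""
    expected = PACKED_SIZES.get(family)
    if expected is None:
        # Reject unknown families first, so the length is never left unchecked.
        raise ValueError("unknown address family")
    if len(packed) != expected:
        raise ValueError("packed address has wrong length")
    return socket.inet_ntop(family, packed)


loopback = checked_ntop(socket.AF_INET, b"\x7f\x00\x00\x01")
```

Rejecting the unknown family before the length test is the ordering kuran asks for: a family added to libc later cannot reach inet_ntop with an unvalidated buffer.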
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 From noreply@sourceforge.net Mon Mar 3 10:59:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 02:59:25 -0800 Subject: [Patches] [ python-Patches-671384 ] test_pty hanging on hpux11 Message-ID: Patches item #671384, was opened at 2003-01-20 22:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Martin v. Löwis (loewis) Summary: test_pty hanging on hpux11 Initial Comment: The attached hack fixes a problem which occurs since switching the pty code. isatty() hangs if the slave_fd is closed and reopened as in the deprecated APIs pty.master_open() and pty.slave_open(). This patch reverts to the old behaviour where _open_terminal() is called in master_open() to avoid the hang later. Here's a very simple test for the problem:

import pty, os
master_fd, slave_name = pty.master_open()
slave_fd = pty.slave_open(slave_name)
print os.isatty(slave_fd)

In slave_open() the first ioctl raises an IOError, Invalid Argument 22. I don't know if this problem affects hpux10. Hopefully someone will have a better idea how to really fix this problem. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:59 Message: Logged In: YES user_id=21627 I can't reproduce a test failure for Solaris 8 (on the SF compile farm) for Python 2.3a2. Can you please try that specific release and report what test fails for you, in which way? I'm concerned that the patch isn't that good, e.g.
on Linux, it would cause usage of the old-style interface to pseudo-terminals, even though an all-singing all-dancing Unix98 pty support is available in the C library. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-28 01:21 Message: Logged In: YES user_id=33168 I have attached an updated patch. It seems Solaris 8 (on the snake farm) also had a test failure. I have basically restored the old functionality in this patch. _open_terminal is called if /dev/ptmx exists, so os.openpty() is not called. This fixes the test failures/hangs on both solaris and hpux and should be equivalent to the 2.2 behaviour. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 From noreply@sourceforge.net Mon Mar 3 11:23:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 03:23:01 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII.
---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
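
The slicing point in the comment above can be illustrated with a minimal sketch. It is written for modern Python 3 (str and bytes) rather than the Python 2.3 unicode and 8-bit string types under discussion, and the file name is made up for the example.

```python
# Hypothetical file name containing one non-ASCII character.
name = "caf\u00e9.txt"
raw = name.encode("utf-8")   # the 'e with acute' becomes two bytes: 0xC3 0xA9

# Slicing the decoded text always cuts between whole characters.
text_prefix = name[:4]

# Slicing the encoded bytes at the same index cuts the two-byte sequence in half.
byte_prefix = raw[:4]

try:
    byte_prefix.decode("utf-8")
    broken = False
except UnicodeDecodeError:
    # The truncated byte string is no longer valid UTF-8.
    broken = True
```

The same index yields a valid prefix for the decoded name and an undecodable one for the UTF-8 bytes, which is the breakage the comment warns about.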
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with:

#ifdef Py_USING_UNICODE
/* code */
#endif

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
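
The round-trip test described in this thread (decode with the file system encoding, then try to encode back to ASCII) can be sketched as below, next to the naive byte-wise test it is meant to replace. This is an illustration in modern Python 3, not the C code of the patch; the function names looks_ascii_bytewise and name_for_listing are hypothetical.

```python
def looks_ascii_bytewise(raw):
    """Naive test: every byte is below 128."""
    return all(b < 128 for b in raw)


def name_for_listing(raw, fs_encoding):
    """Decode with the file system encoding, then return an 8-bit name only
    if the decoded text really is pure ASCII; otherwise return the text."""
    text = raw.decode(fs_encoding)
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text


# U+0141 encodes to the bytes 0x41 0x01 in UTF-16-LE: both are below 128,
# so the byte-wise test is fooled while the round trip is not.
tricky = "\u0141".encode("utf-16-le")
fooled = looks_ascii_bytewise(tricky)
result = name_for_listing(tricky, "utf-16-le")
```

This is the UTF-16 failure mode raised in the comments: an encoding that is not a superset of ASCII can produce bytes that all look like ASCII without the name being ASCII.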
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 11:36:06 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 03:36:06 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name.
I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. 
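
The Unicode-in-Unicode-out rule from PEP 277 that Martin describes in this message can be observed in modern Python 3, where os.listdir() returns str names for a str path and bytes names for a bytes path. The sketch below is only an analogue of the Python 2.3 discussion (unicode versus 8-bit strings); the temporary directory and the file name a.txt are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create one empty file so the directory has something to list.
    open(os.path.join(d, "a.txt"), "w").close()

    text_names = os.listdir(d)                # str path in, str names out
    byte_names = os.listdir(os.fsencode(d))   # bytes path in, bytes names out
```

os.fsencode() applies the file system encoding to the directory name, which corresponds to the last point of Martin's first paragraph.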
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g.
Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess.
Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with:

#ifdef Py_USING_UNICODE
/* code */
#endif

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 11:37:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 03:37:39 -0800 Subject: [Patches] [ python-Patches-679505 ] Deprecate rotor module Message-ID: Patches item #679505, was opened at 2003-02-03 15:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=679505&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 4 Submitted By: A.M.
Kuchling (akuchling) Assigned to: Nobody/Anonymous (nobody) Summary: Deprecate rotor module Initial Comment: Here's a trivial patch that marks the rotor module as deprecated. To be used if Paul Rubin's AES module goes into 2.3 (maybe even if it doesn't). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:37 Message: Logged In: YES user_id=21627 What is the rationale for deprecating the rotor module? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2003-02-03 15:04 Message: Logged In: YES user_id=11375 Attach patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=679505&group_id=5470 From noreply@sourceforge.net Mon Mar 3 11:39:07 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 03:39:07 -0800 Subject: [Patches] [ python-Patches-679505 ] Deprecate rotor module Message-ID: Patches item #679505, was opened at 2003-02-03 15:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=679505&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 4 Submitted By: A.M. Kuchling (akuchling) Assigned to: Nobody/Anonymous (nobody) Summary: Deprecate rotor module Initial Comment: Here's a trivial patch that marks the rotor module as deprecated. To be used if Paul Rubin's AES module goes into 2.3 (maybe even if it doesn't). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:39 Message: Logged In: YES user_id=21627 I retract my comment, the rationale is fine. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-03 12:37 Message: Logged In: YES user_id=21627 What is the rationale for deprecating the rotor module? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2003-02-03 15:04 Message: Logged In: YES user_id=11375 Attach patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=679505&group_id=5470 From noreply@sourceforge.net Mon Mar 3 12:22:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 04:22:32 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it).
Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. 

An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir.

My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-25 22:52

Message:
Logged In: YES
user_id=92689

Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-25 18:22

Message:
Logged In: YES
user_id=6380

OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-25 16:55

Message:
Logged In: YES
user_id=92689

Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections?

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 19:17

Message:
Logged In: YES
user_id=92689

I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?)

Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things...

I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well.
I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 18:46

Message:
Logged In: YES
user_id=38388

Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them).

That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode.

What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c.

I really don't know what to do here...

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 17:24

Message:
Logged In: YES
user_id=92689

Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding...

----------------------------------------------------------------------

Comment By: M.-A.
Lemburg (lemburg)
Date: 2003-02-10 12:24

Message:
Logged In: YES
user_id=38388

Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him.

This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 12:08

Message:
Logged In: YES
user_id=92689

On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 11:55

Message:
Logged In: YES
user_id=38388

Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict").

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 11:49

Message:
Logged In: YES
user_id=92689

Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 11:17

Message:
Logged In: YES
user_id=38388

The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes one-byte-per-character encodings, which may not always be true.
With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones.

There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign.

If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 10:51

Message:
Logged In: YES
user_id=92689

I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 10:24

Message:
Logged In: YES
user_id=38388

Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible.

Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2003-02-10 10:12

Message:
Logged In: YES
user_id=92689

Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII.
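
The round-trip test suggested in this thread (decode with the file system encoding, then try to convert back to ASCII) can be sketched in present-day Python roughly as follows. This is only an illustration, not the code from the patch: the names `listdir_name`, `naive_is_ascii` and the `fs_encoding` parameter are hypothetical, and Python 3 `bytes`/`str` stand in for the 8-bit and Unicode strings of Python 2.3.

```python
def listdir_name(raw, fs_encoding="utf-8"):
    """Return ASCII bytes for a pure-ASCII name, else the decoded text.

    `fs_encoding` is an assumed stand-in for Py_FileSystemDefaultEncoding.
    """
    # Step 1: always decode using the (assumed) file system encoding.
    text = raw.decode(fs_encoding)
    try:
        # Step 2: try to convert back to ASCII.
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Not ASCII compatible, so hand back the Unicode string instead.
        return text


def naive_is_ascii(raw):
    """Byte-level check that silently assumes one byte per character."""
    return all(byte < 128 for byte in raw)


plain = listdir_name(b"README.txt")
accented = listdir_name("caf\u00e9".encode("utf-8"))

# With UTF-16 every byte of an ASCII-only name is below 128, so the naive
# check passes even though the raw bytes are not ASCII text.
utf16_raw = "ab".encode("utf-16-le")
utf16_name = listdir_name(utf16_raw, "utf-16-le")
```

The byte-level check accepts `utf16_raw` as it stands, while the round trip recovers the two-character name, which is the failure mode raised for encodings that are not a superset of ASCII.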

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-02-10 04:07

Message:
Logged In: YES
user_id=33168

The code which uses unicode APIs should probably be wrapped with:

#ifdef Py_USING_UNICODE
/* code */
#endif

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-10 02:16

Message:
Logged In: YES
user_id=6380

At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470

From noreply@sourceforge.net Mon Mar 3 12:57:58 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Mon, 03 Mar 2003 04:57:58 -0800
Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types
Message-ID:

Patches item #696193, was opened at 2003-03-02 21:02
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christian Tismer (tismer)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: Enable __slots__ for meta-types

Initial Comment:
The new type system allows non-empty __slots__ only for fixed-size objects.

Meta-types are types whose instances are also types. Types are variable-sized, because they take the slot definitions for their instances, so they cannot have extra members from their meta-type.

The proposed solution allows for two things:
a) meta-types can have slots
b) extensions get access to the whole type object and can create extended types with private fields.
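
The restriction described above can still be observed from Python code. The sketch below is an illustration under the assumption of a current CPython interpreter (other implementations may behave differently); the helper name `try_meta_with_slots` and both class names are hypothetical.

```python
def try_meta_with_slots():
    """Try to define a meta-type with a non-empty __slots__.

    Returns the TypeError raised by the class statement, or None if the
    definition was accepted.
    """
    try:
        class MetaWithSlots(type):
            __slots__ = ("extra",)
    except TypeError as exc:
        return exc
    return None


class EmptySlotsMeta(type):
    # An empty __slots__ adds no extra members, so it is accepted.
    __slots__ = ()


error = try_meta_with_slots()
```

On CPython, `error` holds a TypeError because `type` is a variable-sized object, which is the limitation the patch sets out to lift.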

The changes providing this are quite simple:
- replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h
- instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize.

To keep things easy, I added a macro which does this calculation, and member access now reads like so:

before: type->tp_members = et->members;
after:  type->tp_members = PyHeapType_GET_MEMBERS(et);

This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simplicity.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470

From noreply@sourceforge.net Mon Mar 3 13:02:30 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Mon, 03 Mar 2003 05:02:30 -0800
Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir()
Message-ID:

Patches item #683592, was opened at 2003-02-09 22:43
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Just van Rossum (jvr)
Assigned to: Nobody/Anonymous (nobody)
Summary: unicode support for os.listdir()

Initial Comment:
The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII.

----------------------------------------------------------------------

>Comment By: Just van Rossum (jvr)
Date: 2003-03-03 14:02

Message:
Logged In: YES
user_id=92689

I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 13:11:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 05:11:04 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on plaforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several example of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away . ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", prefectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one byte per characters encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see hot UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. My looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
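The two ASCII tests discussed in the comments above can be compared in a short sketch. It is written for modern Python 3, where `bytes` stands in for the 8-bit string of Python 2.3 and `str` for the unicode object; the function names are hypothetical and this is not the code from the patch.

```python
def looks_ascii_bytewise(raw: bytes) -> bool:
    """Naive test: every byte is below 128 (assumes one byte per character)."""
    return all(b < 128 for b in raw)


def is_ascii_text(raw: bytes, fs_encoding: str) -> bool:
    """Round-trip test: decode with the file system encoding, then try ASCII."""
    try:
        raw.decode(fs_encoding).encode("ascii")
    except UnicodeError:
        return False
    return True


def listdir_name(raw: bytes, fs_encoding: str):
    """Return the raw bytes for ASCII-only names, decoded text otherwise."""
    if is_ascii_text(raw, fs_encoding):
        return raw
    return raw.decode(fs_encoding)


# U+0100 encodes in UTF-16-LE as the bytes 00 01: both are below 128,
# so the bytewise test is fooled although the character is not ASCII.
tricky = "\u0100".encode("utf-16-le")
bytewise_says_ascii = looks_ascii_bytewise(tricky)
roundtrip_says_ascii = is_ascii_text(tricky, "utf-16-le")
```

The round-trip version costs one decode and one encode per name, which matches the remark above that the overhead is small next to the directory I/O itself.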
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 14:32:02 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 06:32:02 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes.
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost.
This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? 
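Martin's concern above, that a single undecodable file name raises and loses every other name, can be made concrete. The sketch below shows one possible policy, keeping the raw bytes for names that do not decode; it is an assumption for illustration, not what the checked-in revision does, and `decode_names` is a hypothetical helper written for Python 3.

```python
def decode_names(raw_names, fs_encoding):
    """Decode each name separately so one bad entry cannot lose the rest."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(fs_encoding))
        except UnicodeDecodeError:
            # Assumed fallback policy: hand back the original bytes unchanged.
            decoded.append(raw)
    return decoded


# 0xFF can never start a valid UTF-8 sequence, so the second name stays bytes.
names = decode_names([b"notes.txt", b"\xff\xfe broken"], "utf-8")
```

The price of this policy is a mixed-type result list, which is exactly the kind of surprise for existing applications that the thread is worried about.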
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
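The slicing point made above is easy to reproduce. The following sketch uses Python 3, with `str` as the Unicode type and `bytes` as the UTF-8 encoded 8-bit string; the file name and the helper `prefix_is_valid_utf8` are made up for the example.

```python
def prefix_is_valid_utf8(data: bytes, n: int) -> bool:
    """Report whether the first n bytes still form complete UTF-8 text."""
    try:
        data[:n].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "caf\u00e9.txt"            # the e-acute is one character in Unicode
encoded = name.encode("utf-8")    # but two bytes (C3 A9) in UTF-8

text_prefix = name[:4]            # slicing text keeps whole characters
cut_mid_character = prefix_is_valid_utf8(encoded, 4)   # cuts between C3 and A9
cut_on_boundary = prefix_is_valid_utf8(encoded, 5)
```

Slicing the Unicode name can never split a character, while the same index applied to the encoded bytes leaves a dangling lead byte.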
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 14:57:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 06:57:11 -0800 Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types Message-ID: Patches item #696193, was opened at 2003-03-02 16:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christian Tismer (tismer) Assigned to: Guido van Rossum (gvanrossum) Summary: Enable __slots__ for meta-types Initial Comment: The new type system allows non-empty __slots__ only for fixed-size objects. Meta-types are types whose instances are also types. Types are variable-sized, because they take the slot definitions for their instances, so they cannot have extra members from their meta-type. The proposed solution allows for two things: a) meta-types can have slots b) extensions get access to the whole type object and can create extended types with private fields. The changes providing this are quite simple: - replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h - instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize. To keep things easy, I added a macro which does this calculation, and member access now reads like so: before: type->tp_members = et->members; after: type->tp_members = PyHeapType_GET_MEMBERS(et); This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simplicity.
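The restriction described in the initial comment can be probed from Python itself. The sketch below is written for Python 3 and only reports what the running interpreter does; CPython, as far as I know, still rejects non-empty __slots__ on subclasses of type, but other implementations may differ. The helper name `metatype_accepts_slots` is hypothetical.

```python
def metatype_accepts_slots(slots) -> bool:
    """Try to create a meta-type (a subclass of type) with the given __slots__."""
    try:
        type("Meta", (type,), {"__slots__": slots})
    except TypeError:
        # Raised when the interpreter refuses slots on a variable-sized base.
        return False
    return True


class Point:
    """An ordinary fixed-size class, where non-empty __slots__ is allowed."""
    __slots__ = ("x", "y")


empty_ok = metatype_accepts_slots(())
nonempty_ok = metatype_accepts_slots(("extra",))
```

An empty __slots__ adds no members, so it does not run into the variable-size layout problem; a non-empty one would need storage after the per-instance slot definitions, which is what the proposed tp_basicsize-based calculation addresses.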
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-03 09:57 Message: Logged In: YES user_id=6380 I'll look at this on Friday. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:03:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:03:28 -0800 Subject: [Patches] [ python-Patches-691928 ] Use datetime in _strptime Message-ID: Patches item #691928, was opened at 2003-02-23 18:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Use datetime in _strptime Initial Comment: To prevent code duplication, I patched _strptime to use datetime's date object to do Julian day, Gregorian, and day of the week calculations (Tim's code has to be more reliable than mine =). Patch also includes new regression tests to check the results and that the calculation gets triggered. Very minor comment changes and my contact email are also changed. ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2003-03-03 09:03 Message: Logged In: YES user_id=44345 Meta comment - I think that when uploading successive patches it's useful to either name them differently or delete the prior one to avoid confusion. In this case it's not a big deal, especially since the submission dates are different, but after a few revisions it can sometimes be a challenge to figure out which patch should be downloaded.
Comment comment - Unless there's some evidence the elided functions have been used, I suspect it best to just let people use the relevant datetime functions. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-02-25 15:51 Message: Logged In: YES user_id=357491 Only in the module (which was removed). None of the helper functions have ever been publicly advertised (although I think the locale date info might be helpful in locale; MvL wasn't interested, though). I uploaded a new diff that removes one more line that I forgot to remove when I eliminated the ability to pass in a regex object. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-23 18:56 Message: Logged In: YES user_id=33168 Brett, is there any doc for the functions that were removed? firstjulian, gregorian, julianday, dayofweek Otherwise, the patch seemed fine (but I didn't look that closely). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:19:37 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:19:37 -0800 Subject: [Patches] [ python-Patches-696613 ] test options don't work on FreeBSD Message-ID: Patches item #696613, was opened at 2003-03-03 15:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ben Laurie (benl) Assigned to: Nobody/Anonymous (nobody) Summary: test options don't work on FreeBSD Initial Comment: test -L is used during make install - I'm guessing it is supposed to test for a softlink. Sadly, this is -h under FreeBSD, so the install fails. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:27:24 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:27:24 -0800 Subject: [Patches] [ python-Patches-667730 ] More DictMixin Message-ID: Patches item #667730, was opened at 2003-01-14 14:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: More DictMixin Initial Comment: This patch is intended to provide a more consistent implementation for the various dictionary-like objects of the standard library. test_userdict has been rewritten; it now uses unittest and defines a test case which allows checking for conformity with the dictionary protocol. test_shelve and test_weakref have been rewritten to use the test_userdict test case. test_os has been extended: a new test case checks the environ object for conformity to the dictionary protocol. The patch modifies the UserDict module: * The doc says that __contains__ should be one of the methods to redefine for better efficiency but the implementation makes __contains__ dependent on the has_key definition. The patch reverses the method dependencies. * Change iterkeys = __iter__ to def iterkeys(self): return self.__iter__() to make iterkeys able to use overridden __iter__ methods. * I have also added __init__, copy and __repr__ methods to DictMixin. * The UserDict.UserDict class is a subclass of DictMixin; this allows simplifying the UserDict implementation. The patch is rather conservative since a lot of method definitions could still be removed from UserDict. In the weakref module, the patch makes WeakValueDictionary and WeakKeyDictionary subclasses of UserDict.DictMixin.
It also uses nested scopes and the new generator syntax for iterator methods, and rewrites WeakKeyDictionary.__delitem__. All of this allows decreasing the module size by 50%. In the shelve module, the patch adds a copy() method which returns a dictionary with the keys and values of the database. ---------------------------------------------------------------------- >Comment By: Sebastien Keim (s_keim) Date: 2003-03-03 16:27 Message: Logged In: YES user_id=498191 I have uploaded a new version of the patch updated to Python 2.3a2. I hope to have removed all the stuff which could break backward compatibility, since the new proposed patch now contains only the testing stuff (well, almost, since I have also added a pop method to the weak dictionary classes to make them compatible with the test case). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-16 03:50 Message: Logged In: YES user_id=80475 Also, +1 on consolidating the test cases though it should be done after any other changes to the files so we can make sure that nothing got broken. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-16 03:35 Message: Logged In: YES user_id=80475 * UserDict.UserDict should not change. As Martin pointed out, inheriting from object changes the semantics in a non-backward compatible way. Also, the class is efficiently implemented in terms of an internal dictionary and would be slowed down by the nest of calls in Mixin. Also, I think the code is incorrect in defining __iter__, there was a reason it was pulled out into a separate subclass -- that was done in Py2.2 and is not an easily reversible decision. * -0 on the changes to has_key() and __contains__(). has_key() was put at a lower level than __contains__ because the older dict-style interfaces all define has_key.
* +1 for changing iterkeys() to a full definition (and +1 for doing the same for __iter__()). Sebastien is correct in pointing out the advantages for propagating an overridden method. * -1 for altering repr() implementation. The current approach is shorter, cleaner, and faster. * -1 for adding __nonzero__(). Even dictionaries don't implement this method; they let len() do the talking. * -1 for adding __init__() and copy(). Both need to make assumptions about the order and number of parameters in the constructor of the class using the mixin. I think they are rarely helpful and are sometimes harmful in introducing surprising, hard-to-find errors. People who need an init() or copy() can code them more cleanly and directly in the extending class. Also, I don't think the code is correct: since DictMixin will be a base class, the use of super() is not what is wanted here -- *if* you were going to do this, try something like self.__class__(). Further, adding these methods violates my original intent for this class which was to extrapolate four basic mapping methods into a full mapping interface. It was not intended as a stand-alone class. Also, copy() cannot guarantee that it is copying all the relevant data for the sub-class and that violates the definition of what copy() is supposed to do. If something like this were attempted, it should be its own mixin (automatically adding copy support to any class) and it should be rather sophisticated about how to perfectly replicate itself (not easily done if the underlying data is in a file, database, or in a distributed app). * +0 on changing weakdicts provided it is done minimally and carefully with attention to leaving semantics unchanged and not slowing performance. The advantage goes beyond consistency, it removes code duplication, keeps well thought-out logic in one place, and provides an automatic interface update from DictMixin if the dictionary interface ever sprouts another method.
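The iterkeys() point in the review above comes down to when the name __iter__ is resolved. A minimal sketch in Python 3, with made-up class names standing in for DictMixin and a class that extends it:

```python
class AliasedMixin:
    def __iter__(self):
        return iter(["from mixin"])

    # The alias captures this exact function object when the class is built.
    iterkeys = __iter__


class DelegatingMixin:
    def __iter__(self):
        return iter(["from mixin"])

    def iterkeys(self):
        # __iter__ is looked up on the instance each time iterkeys() is called.
        return self.__iter__()


class AliasedChild(AliasedMixin):
    def __iter__(self):
        return iter(["from child"])


class DelegatingChild(DelegatingMixin):
    def __iter__(self):
        return iter(["from child"])


aliased_keys = list(AliasedChild().iterkeys())
delegating_keys = list(DelegatingChild().iterkeys())
```

With the alias, the subclass override of __iter__ is silently ignored by iterkeys(); with the full definition it propagates, which is the advantage the review credits to the patch.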
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-14 22:43 Message: Logged In: YES user_id=21627 This patch breaks backwards compatibility. UserDict is an oldstyle class on purpose, since changing it to a newstyle class will certainly break the compatibility in subtle ways (e.g. by changing what type(userdictinstance) is). Unless you can bring forward a better rationale than consistency, this patch will be rejected. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:48:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:48:09 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on plaforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODESET) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed.
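The rule Martin argues for above (the type of the argument decides the type of the results, with a fallback when no file system encoding is set) can be illustrated with today's os.listdir(). This is a sketch for Python 3, where `str` corresponds to the unicode type of Python 2.3 and `bytes` to the 8-bit string; it shows current behaviour, not the 2003 patch, and `effective_fs_encoding` is a hypothetical helper.

```python
import os
import sys
import tempfile


def effective_fs_encoding() -> str:
    """File system encoding, falling back to the default encoding if unset."""
    return sys.getfilesystemencoding() or sys.getdefaultencoding()


with tempfile.TemporaryDirectory() as directory:
    # Create one file so that the listing is not empty.
    open(os.path.join(directory, "example.txt"), "w").close()

    text_names = os.listdir(directory)                # text in, text out
    byte_names = os.listdir(os.fsencode(directory))   # bytes in, bytes out

encoding_used = effective_fs_encoding()
```

Because each call returns names of the same type as its argument, joining a result back onto the directory it came from never mixes the two string types.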
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
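The slicing hazard mentioned in the comment above is easy to reproduce. The snippet below is a small illustration in Python 3, not code from the patch; the file name and the helper `decodes_cleanly` are made up for the example.

```python
# 'résumé.txt' as text; the character é takes two bytes in UTF-8.
name = "r\u00e9sum\u00e9.txt"
encoded = name.encode("utf-8")

# Slicing the text keeps whole characters.
text_slice = name[:2]

# Slicing the encoded form cuts through the middle of the two-byte é.
byte_slice = encoded[:2]


def decodes_cleanly(data, encoding="utf-8"):
    """Report whether data is a complete, valid sequence in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(repr(text_slice), repr(byte_slice), decodes_cleanly(byte_slice))
```

The sliced text is still a valid name prefix, while the sliced byte string ends in half a character and no longer decodes.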
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
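The "Unicode only when the name is not plain ASCII" idea debated in these comments amounts to a decode followed by a trial encode. Below is a rough Python 3 sketch of that round trip, with `bytes` standing in for the old 8-bit string type; the function name `name_for_caller` and its parameters are assumptions for illustration, and the real patch does this in C inside posixmodule.c.

```python
def name_for_caller(raw, fs_encoding="utf-8", narrow_encoding="ascii"):
    """Decode raw with fs_encoding, then try to narrow it again.

    Returns a byte string when the decoded text fits narrow_encoding,
    otherwise returns the decoded text itself.
    """
    text = raw.decode(fs_encoding)
    try:
        return text.encode(narrow_encoding)
    except UnicodeEncodeError:
        return text


plain = name_for_caller(b"notes.txt")
accented = name_for_caller("caf\u00e9".encode("utf-8"))

print(repr(plain), repr(accented))
```

Passing a different `narrow_encoding` corresponds to the alternative of narrowing with the default encoding instead of ASCII, which is the choice being weighed in the thread.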
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:49:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:49:49 -0800 Subject: [Patches] [ python-Patches-696645 ] VMS patches, cleaning part Message-ID: Patches item #696645, was opened at 2003-03-03 16:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Nobody/Anonymous (nobody) Summary: VMS patches, cleaning part Initial Comment: This is the cleaning patches.
I will provide other patches in a separate item. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 From noreply@sourceforge.net Mon Mar 3 15:51:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 07:51:32 -0800 Subject: [Patches] [ python-Patches-696645 ] VMS patches, cleaning part Message-ID: Patches item #696645, was opened at 2003-03-03 16:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) >Assigned to: Martin v. Löwis (loewis) Summary: VMS patches, cleaning part Initial Comment: This is the cleaning patches. I will provide other patches in a separate item. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 From noreply@sourceforge.net Mon Mar 3 16:08:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 08:08:53 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL.
I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODESET) use that, all others use the system default encoding).
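The fallback Martin describes (use the file system encoding when one is set, otherwise the system default encoding) could look roughly like this at the Python level. It is a sketch under assumptions: `effective_encoding` and `decode_names` are hypothetical names, `None` plays the role of a NULL Py_FileSystemDefaultEncoding, and the actual change is made in C.

```python
import sys


def effective_encoding(fs_encoding):
    """Choose the codec: fs_encoding if one is set, else the interpreter default."""
    if fs_encoding is not None:
        return fs_encoding
    return sys.getdefaultencoding()


def decode_names(raw_names, fs_encoding):
    """Decode every raw file name with the effective encoding."""
    codec = effective_encoding(fs_encoding)
    return [raw.decode(codec) for raw in raw_names]


# A platform that declares its file system encoding.
latin_names = decode_names([b"caf\xe9"], "latin-1")

# A platform that declares none: the default encoding is used instead.
default_names = decode_names([b"a.txt"], None)

print(latin_names, default_names)
```

As noted in the thread, decoding errors become more likely on the fallback path, because the default encoding may have nothing to do with how the names were written to disk.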
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 16:39:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 08:39:34 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code that the latest patch removes did), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me.
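The two options in the comment above (re-raise and lose the listing, or keep the undecodable name as a byte string) can be compared side by side. This is a minimal Python 3 sketch, not the patch itself; both function names and the sample names are invented for the illustration.

```python
def decode_or_raise(raw_names, encoding):
    """Option 1: the first undecodable name aborts the whole listing."""
    return [raw.decode(encoding) for raw in raw_names]


def decode_or_keep_bytes(raw_names, encoding):
    """Option 2: undecodable names are returned unchanged as byte strings."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw name so the caller still sees the entry.
            result.append(raw)
    return result


# One well-formed name and one containing a byte that is invalid in UTF-8.
names = [b"ok.txt", b"bad\xff.txt"]

mixed = decode_or_keep_bytes(names, "utf-8")
print(mixed)
```

With option 1 the caller gets an exception and has to call again with a byte-string path; with option 2 the caller gets every entry but must be ready for a list that mixes text and byte strings.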
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. 
Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several example of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away . ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. 
Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess.
Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It just happens that your test assumes one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
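A minimal sketch of the round-trip test M.-A. Lemburg describes in this thread (convert to Unicode first, then try to convert back to ASCII), next to the simpler byte-scanning test it is meant to replace. It is written for present-day Python 3 rather than the 2.3-era C code in posixmodule.c, and the names naive_is_ascii, ascii_or_text and fs_encoding are assumptions made up for illustration, not part of the patch.

```python
def naive_is_ascii(raw_name):
    """Byte-scanning test: only valid for encodings that are ASCII supersets."""
    return all(byte < 128 for byte in raw_name)


def ascii_or_text(raw_name, fs_encoding="utf-8"):
    """Round-trip test (hypothetical helper, not the code from the patch).

    Decode with the file system encoding, then try to encode as ASCII.
    Returns ASCII bytes when the name is pure ASCII, otherwise the decoded text.
    """
    text = raw_name.decode(fs_encoding)  # may raise UnicodeDecodeError
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text


# U+0141 encodes to the bytes 0x41 0x01 in UTF-16-LE: every byte is below 128,
# so the byte scan wrongly reports "ASCII" while the round trip does not.
tricky = "\u0141".encode("utf-16-le")
print(naive_is_ascii(tricky))
print(repr(ascii_or_text(tricky, "utf-16-le")))
```

The round trip costs one extra decode and encode per name, which is the performance hit the comment above judges to be negligible next to the directory I/O.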
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 16:59:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 08:59:41 -0800 Subject: [Patches] [ python-Patches-667730 ] More DictMixin Message-ID: Patches item #667730, was opened at 2003-01-14 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) >Assigned to: Raymond Hettinger (rhettinger) Summary: More DictMixin Initial Comment: This patch is intended to provide a more consistent implementation for the various dictionary-like objects of the standard library. test_userdict has been rewritten; it now uses unittest and defines a test case which allows checking for conformity with the dictionary protocol. test_shelve and test_weakref have been rewritten to use the test_userdict test case. test_os has been extended: a new test case checks the environ object for conformity to the dictionary protocol. The patch modifies the UserDict module: * The doc says that __contains__ should be one of the methods to redefine for better efficiency, but the implementation makes __contains__ dependent on the has_key definition. The patch reverses the method dependencies. * Change iterkeys = __iter__ to def iterkeys(self): return self.__iter__() to make iterkeys able to use overridden __iter__ methods. * I have also added __init__, copy and __repr__ methods to DictMixin. * The UserDict.UserDict class is a subclass of DictMixin; this allows simplifying the UserDict implementation. The patch is rather conservative since a lot of method definitions could still be removed from UserDict. In the weakref module, the patch makes WeakValueDictionary and WeakKeyDictionary subclasses of UserDict.DictMixin.
It also uses nested scopes and the new generator syntax for iterator methods, and rewrites WeakKeyDictionary.__delitem__. All of this allows decreasing the module size by 50%. In the shelve module, the patch adds a copy() method which returns a dictionary with the keys and values of the database. ---------------------------------------------------------------------- Comment By: Sebastien Keim (s_keim) Date: 2003-03-03 10:27 Message: Logged In: YES user_id=498191 I have uploaded a new version of the patch, updated to Python 2.3a2. I hope to have removed all the stuff which could break backward compatibility, since the newly proposed patch now contains only the testing stuff (well, almost, since I have also added a pop method to the weak dictionary classes to make them compatible with the test case). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:50 Message: Logged In: YES user_id=80475 Also, +1 on consolidating the test cases though it should be done after any other changes to the files so we can make sure that nothing got broken. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:35 Message: Logged In: YES user_id=80475 * UserDict.UserDict should not change. As Martin pointed out, inheriting from object changes the semantics in a non-backward-compatible way. Also, the class is efficiently implemented in terms of an internal dictionary and would be slowed down by the nest of calls in Mixin. Also, I think the code is incorrect in defining __iter__; there was a reason it was pulled out into a separate subclass -- that was done in Py2.2 and is not an easily reversible decision. * -0 on the changes to has_key() and __contains__(). has_key() was put at a lower level than __contains__ because the older dict-style interfaces all define has_key.
* +1 for changing iterkeys() to a full definition (and +1 for doing the same for __iter__()). Sebastien is correct in pointing out the advantages for propagating an overridden method. * -1 for altering repr() implementation. The current approach is shorter, cleaner, and faster. * -1 for adding __nonzero__(). Even dictionaries don't implement this method; they let len() do the talking. * -1 for adding __init__() and copy(). Both need to make assumptions about the order and number of parameters in the constructor of the class using the mixin. I think they are rarely helpful and are sometimes harmful in introducing surprising, hard-to-find errors. People who need an init() or copy() can code them more cleanly and directly in the extending class. Also, I don't think the code is correct: since DictMixin will be a base class, the use of super() is not what is wanted here -- *if* you were going to do this, try something like self.__class__(). Further, adding these methods violates my original intent for this class, which was to extrapolate four basic mapping methods into a full mapping interface. It was not intended as a stand-alone class. Also, copy() cannot guarantee that it is copying all the relevant data for the sub-class and that violates the definition of what copy() is supposed to do. If something like this were attempted, it should be its own mixin (automatically adding copy support to any class) and it should be rather sophisticated about how to perfectly replicate itself (not easily done if the underlying data is in a file, database, or in a distributed app). * +0 on changing weakdicts provided it is done minimally and carefully with attention to leaving semantics unchanged and not slowing performance. The advantage goes beyond consistency: it removes code duplication, keeps well thought-out logic in one place, and provides an automatic interface update from DictMixin if the dictionary interface ever sprouts another method.
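The point about iterkeys() propagating an overridden __iter__ can be illustrated with a small sketch in present-day Python 3. The class names below (AliasMixin, DelegatingMixin and the two Sorted subclasses) are hypothetical and are not taken from UserDict.py or from the patch; they only contrast a class-level alias with a delegating method.

```python
class AliasMixin:
    """iterkeys is an alias bound to this class's __iter__ when the class is built."""

    def __iter__(self):
        return iter(self.keys())

    iterkeys = __iter__  # keeps pointing at the function above, even in subclasses


class DelegatingMixin:
    """iterkeys looks up __iter__ at call time, so subclass overrides are used."""

    def __iter__(self):
        return iter(self.keys())

    def iterkeys(self):
        return self.__iter__()


class SortedAlias(AliasMixin):
    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data)

    def __iter__(self):  # override: iterate in sorted order
        return iter(sorted(self._data))


class SortedDelegating(DelegatingMixin):
    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data)

    def __iter__(self):  # same override as above
        return iter(sorted(self._data))


pairs = [("b", 1), ("a", 2)]
print(list(SortedAlias(pairs).iterkeys()))       # alias ignores the override
print(list(SortedDelegating(pairs).iterkeys()))  # delegation honours it
```

With the alias, iterkeys() still runs the mixin's original __iter__ and yields keys in insertion order; with the full definition it picks up the subclass's sorted iteration.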
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-14 16:43 Message: Logged In: YES user_id=21627 This patch breaks backwards compatibility. UserDict is an oldstyle class on purpose, since changing it to a newstyle class will certainly break the compatibility in subtle ways (e.g. by changing what type(userdictinstance) is). Unless you can bring forward a better rationale than consistency, this patch will be rejected. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 From noreply@sourceforge.net Mon Mar 3 17:45:44 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 09:45:44 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Nobody/Anonymous (nobody) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII.
---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
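The two exception-handling options weighed in this thread (re-raise and discard the partial result, or keep the undecodable name as a byte string) can be sketched as follows. This is an illustration in present-day Python 3, not the C code from posixmodule.c; the function decode_names and its on_error parameter are hypothetical names chosen for the example.

```python
def decode_names(raw_names, encoding, on_error="raise"):
    """Decode a list of raw file names (hypothetical helper).

    on_error="raise": propagate the UnicodeDecodeError, so the caller gets
                      no result at all and all other names are lost.
    on_error="keep":  put the original byte string into the result instead.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if on_error == "raise":
                raise
            result.append(raw)  # mixed list: text plus undecodable byte strings
    return result


# b"caf\xe9.txt" is Latin-1, not valid UTF-8, so it cannot be decoded here.
names = [b"ok.txt", b"caf\xe9.txt"]
print(decode_names(names, "utf-8", on_error="keep"))
```

The "raise" variant matches the behaviour where an application that anticipates the error re-invokes listdir with a byte string; the "keep" variant trades a uniform return type for not losing the other names.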
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding).
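For readers on a current interpreter, the Unicode-in-Unicode-out rule described above is how os.listdir() behaves in Python 3: a str path yields str names and a bytes path yields bytes names. The sketch below only demonstrates that present-day behaviour; the helper name list_both_ways and the file name example.txt are made up for the example, and it says nothing about the exact state of the 2.3 patch.

```python
import os
import tempfile


def list_both_ways(file_name="example.txt"):
    """Create one file in a temporary directory and list it with both path types."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, file_name), "w"):
            pass  # an empty file is enough for the listing
        text_names = os.listdir(tmp)               # str in, str out
        byte_names = os.listdir(os.fsencode(tmp))  # bytes in, bytes out
    return text_names, byte_names


text_names, byte_names = list_both_ways()
print(text_names, byte_names)
```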
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 17:56:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 09:56:14 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Just van Rossum (jvr) >Assigned to: Martin v.
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on plaforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. 
Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. 
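For readers following the Unicode-in, Unicode-out rule argued for above: the sketch below shows how that rule looks in present-day Python 3, which is an assumption on my part since the thread itself concerns Python 2.3. There, os.listdir() returns str names for a str path and bytes names for a bytes path. The helper name listdir_types and the file name example.txt are made up for illustration.

```python
import os
import tempfile


def listdir_types(path):
    """Return the set of Python types of the names os.listdir(path) yields."""
    return {type(name) for name in os.listdir(path)}


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # Text path in, text names out.
    text_types = listdir_types(tmp)
    # Byte-string path in, byte-string names out.
    bytes_types = listdir_types(os.fsencode(tmp))

    print(text_types, bytes_types)
```

The point of the rule is that the caller, not the file system contents, decides which string type comes back, so existing byte-string code keeps working.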
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. 
Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. 
Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. 
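The "decode to Unicode, then try to encode back to ASCII" test discussed in this thread can be sketched roughly as follows. This is written in modern Python 3 rather than in the C of the actual patch, and the function names are hypothetical; it only illustrates why the round trip is more reliable than a byte-wise check when the file system encoding is not a superset of ASCII.

```python
def as_ascii_or_text(raw, fs_encoding):
    """Decode raw with fs_encoding; return ASCII bytes if possible, else text.

    Hypothetical helper: it mirrors the round-trip idea from the thread,
    not the code that was checked in.
    """
    text = raw.decode(fs_encoding)
    try:
        # The round trip succeeds only if every character is plain ASCII.
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text


def naive_bytewise_is_ascii(raw):
    """The simple test: no byte has its high bit set."""
    return all(byte < 128 for byte in raw)


plain = as_ascii_or_text(b"readme.txt", "utf-8")
accented = as_ascii_or_text("caf\u00e9".encode("utf-8"), "utf-8")

# With UTF-16 the raw bytes are full of NULs, yet none is >= 128, so the
# byte-wise test accepts them although they are not the ASCII name.
utf16_raw = "abc".encode("utf-16-le")
utf16_naive = naive_bytewise_is_ascii(utf16_raw)
utf16_roundtrip = as_ascii_or_text(utf16_raw, "utf-16-le")

print(plain, accented, utf16_naive, utf16_roundtrip)
```

The byte-wise check passes the UTF-16 bytes unchanged, while the round trip recovers the real name, which is the failure mode raised in the comments above.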
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Mon Mar 3 19:59:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 11:59:54 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 16:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-03 19:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-02 02:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-01 02:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-01 01:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. 
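As an illustration of the behaviour this patch proposes, here is a short sketch using dict.pop() with an optional default, as it works in current Python. The dictionary contents and variable names are invented for the example.

```python
settings = {"debug": True}

# Key present: the stored value is returned and the key is removed.
debug = settings.pop("debug", False)

# Key absent, default given: the default is returned, no exception.
verbose = settings.pop("verbose", False)

# Key absent, no default: the old behaviour, a KeyError.
try:
    settings.pop("verbose")
    raised = False
except KeyError:
    raised = True

print(debug, verbose, raised, settings)
```

This parallels dict.get() and the three-argument form of getattr() mentioned in the review comment, while leaving the one-argument call unchanged.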
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Mon Mar 3 21:14:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 13:14:53 -0800 Subject: [Patches] [ python-Patches-691928 ] Use datetime in _strptime Message-ID: Patches item #691928, was opened at 2003-02-23 16:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Use datetime in _strptime Initial Comment: To prevent code duplication, I patched _strptime to use datetime's date object to do Julian day, Gregorian, and day of the week calculations (Tim's code has to be more reliable than mine =). Patch also includes new regression tests to test results and calculation gets triggered. Very minor comment changes and my contact email are also changed. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2003-03-03 13:14 Message: Logged In: YES user_id=357491 Response to meta comment - I would normally delete it, Skip, but last time I tried I was told I didn't have the proper rights to do it. Unless SF has changed their setup to allow patch creators to manage the files regardless of whether they have CVS access I can't. Response to comment comment - The reason I am doing this is that I want to make sure that the returned time tuple is a valid date. If strptime is going to have default values I want those values to lead to a valid time that does not require someone to have to do more processing or wonder whether it is valid. 
Now currently the docs say you can't expect anything back in the time tuple but what was in the data string, so doing this does not go against the docs. But if strptime becomes the only strptime implementation, then I will write a doc patch to make the docs say that all returned time tuples will be valid dates. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-03-03 07:03 Message: Logged In: YES user_id=44345 Meta comment - I think that when uploading successive patches it's useful to either name them differently or delete the prior one to avoid confusion. In this case it's not a big deal, especially since the submission dates are different, but after a few revisions it can sometimes be a challenge to figure out which patch should be downloaded. Comment comment - Unless there's some evidence the elided functions have been used, I suspect it best to just let people use the relevant datetime functions. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-02-25 13:51 Message: Logged In: YES user_id=357491 Only in the module (which was removed). None of the helper functions have ever been publicly advertised (although I think the locale date info might be helpful in locale; MvL wasn't interested, though). I uploaded a new diff that removes one more line that I forgot to remove when I eliminated the ability to pass in a regex object. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-23 16:56 Message: Logged In: YES user_id=33168 Brett, is there any doc for the functions that were removed? firstjulian, gregorian, julianday, dayofweek Otherwise, the patch seemed fine (but I didn't look that closely). 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 From noreply@sourceforge.net Mon Mar 3 22:23:22 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 14:23:22 -0800 Subject: [Patches] [ python-Patches-671384 ] test_pty hanging on hpux11 Message-ID: Patches item #671384, was opened at 2003-01-20 16:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Martin v. Löwis (loewis) Summary: test_pty hanging on hpux11 Initial Comment: The attached hack fixes a problem which occurs since switching the pty code. isatty() hangs if the slave_fd is closed and reopened as in the deprecated APIs pty.master_open() and pty.slave_open(). This patch reverts to the old behaviour where _open_terminal() is called in master_open() to avoid the hang later. Here's a very simple test for the problem: import pty, os master_fd, slave_name = pty.master_open() slave_fd = pty.slave_open(slave_name) print os.isatty(slave_fd) In slave_open() the first ioctl raises an IOError, Invalid Argument 22. I don't know if this problem affects hpux10. Hopefully someone will have a better idea how to really fix this problem. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 17:23 Message: Logged In: YES user_id=33168 I don't understand what you are asking for. By 'specific release', do you mean of Solaris/HP-UX? I believe on Solaris there's an exception, but on HP-UX it hangs. But I don't recall exactly. I agree this patch is not optimal. I can also try on our Solaris box here. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-03 05:59 Message: Logged In: YES user_id=21627 I can't reproduce a test failure for Solaris 8 (on the SF compile farm) for Python 2.3a2. Can you please try that specific release and report what test fails for you, in which way? I'm concerned that the patch isn't that good, e.g. on Linux, it would cause usage of the old-style interface to pseudo-terminals, even though an all-singing all-dancing Unix98 pty support is available in the C library. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-27 19:21 Message: Logged In: YES user_id=33168 I have attached an updated patch. It seems Solaris 8 (on the snake farm) also had a test failure. I have basically restored the old functionality in this patch. _open_terminal is called if /dev/ptmx exists, so os.openpty() is not called. This fixes the test failures/hangs on both solaris and hpux and should be equivalent to the 2.2 behaviour. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 From noreply@sourceforge.net Mon Mar 3 22:25:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 14:25:38 -0800 Subject: [Patches] [ python-Patches-658327 ] Add inet_pton and inet_ntop to socket Message-ID: Patches item #658327, was opened at 2002-12-24 16:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jp Calderone (kuran) >Assigned to: Neal Norwitz (nnorwitz) Summary: Add inet_pton and inet_ntop to socket Initial Comment: Patch is against current CVS and adds two socket module functions, inet_pton and inet_ntop. 
Both of these should be available on all platforms (because of other dependencies in the code) so I don't think portability is a problem. inet_ntop converts a packed IP address to a human-readable '.' or ':' separated string representation of the IP. inet_pton performs the reverse operation. (Potential) problems: inet_pton sets errno to ENOSPC, which may lead to a confusing error message. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 17:25 Message: Logged In: YES user_id=33168 As I recall, yes, has_ipv6 is only for tests. There was no way to distinguish if python was built with IPv6 support, since AF_INET6 was always defined. Your second approach sounds like it will work. I need to review the code, though. I've forgotten how it works. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 05:15 Message: Logged In: YES user_id=21627 The has_ipv6 test is only there for the tests? In that case, drop it, and just perform AF_INET6 conversions unconditionally. OTOH, I think we should not expose the emulated inet_pton: it doesn't set errno correctly, and offers no advantage over inet_addr. So wrap the entire code with HAVE_INET_PTON, and only perform the tests if the function is supported. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-04 21:40 Message: Logged In: YES user_id=33168 I was just about to check this in, but then I ran into a problem. IPv6 may not be enabled, even if the constant AF_INET6 exists. The cleanest way I saw to address this in the test was to add a has_ipv6 boolean constant to the socket module. Martin, do you think this is acceptable? Attached is a complete patch which should be safe (based on the discussion below), includes tests and doc changes. 
---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2003-01-11 12:04 Message: Logged In: YES user_id=366566 Yea, testing for the proper input length is definitely something that should be done. The patch looks good, but for one thing. If the specified address family is neither AF_INET nor AF_INET6, the length won't be tested and the underlying inet_ntop will be called. This isn't a problem now (afaik) because only those two address families are supported, but in a future libc version with more supported address families, it might open a similar hole to the one you've fixed. Perhaps the + } else { + PyErr_SetString(socket_error, "unknown address family"); + return NULL; + } should be moved up from the second if-grouping to follow the first if-grouping. Everything else looks good to me. Thanks for taking the time to look at this :) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-10 22:49 Message: Logged In: YES user_id=33168 JP, do you agree with my comment on 2002-12-30 about the checks? I have attached an updated patch. Please review and verify this is correct. Thank you for the additional tests. Feel free to submit patches with additional tests for any and all modules! ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-31 11:52 Message: Logged In: YES user_id=366566 Doc, NEWS, and test_socket patch attached. I didn't notice any inet_aton/inet_ntoa tests in the module so I added a couple for those as well (I excluded a test for inet_ntoa('255.255.255.255') ;) Also included are a couple IPv6 tests. I'm not sure if these are appropriate, since many systems may still lack the required support for them to pass. I'll leave it up to you to decide whether they should be commented out or removed or whatever. 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-31 08:17 Message: Logged In: YES user_id=21627 I agree that such a change should be added. Neal, you have given this patch more attention than I did - please check it in when you consider it complete. I'd just like to point out that it is missing documentation changes (libsocket.tex), a NEWS entry, and a test case. kuran, please provide those as a single patch file. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-30 19:11 Message: Logged In: YES user_id=33168 ISTM that in socket_inet_ntop() you need to verify the size of the packed value passed in. If the user passes an empty string, inet_ntop() could read beyond the buffer passed in, potentially causing a core dump. The checks could be something like this: if (af == AF_INET && len != sizeof(struct in_addr)) else if (af == AF_INET6 && len != sizeof(struct in6_addr)) Does this make sense? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-27 10:39 Message: Logged In: YES user_id=366566 The use case I have for it at the moment is a DNS server (Twisted.names). inet_pton allows me to handle IPv6 addresses, so it allows me to support AAAA and A6 records. I believe an IPv6 capable socks proxy would find this useful as well. Basically, low level network stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-27 05:23 Message: Logged In: YES user_id=21627 What is the rationale for providing this functionality? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-26 13:32 Message: Logged In: YES user_id=366566 Ooops, I made two, and uploaded the wrong one >:O Sorry. Dunno if it's still helpful, but here's the unified diff. 
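To make the two functions and the length check discussed above concrete, here is a small sketch against the socket module as it exists in current Python 3 (an assumption, since the thread predates it). The address 192.0.2.1 is a documentation address chosen for the example, and catching both ValueError and OSError is a cautious guess at how a wrong-length input is reported.

```python
import socket

# Text form to packed binary form: 4 bytes for an IPv4 address.
packed = socket.inet_pton(socket.AF_INET, "192.0.2.1")

# Packed binary form back to text.
text = socket.inet_ntop(socket.AF_INET, packed)

# A packed value of the wrong length is rejected instead of being handed
# to the C library, which is the check proposed in this thread.
try:
    socket.inet_ntop(socket.AF_INET, b"")
    rejected = False
except (ValueError, OSError):
    rejected = True

print(packed, text, rejected)
```

Rejecting the empty string up front is what prevents the out-of-bounds read described in the review.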
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 13:10 Message: Logged In: YES user_id=33168 Next time, please use context or unified diff. -c or -u option to cvs diff: cvs diff -c ... ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-24 16:05 Message: Logged In: YES user_id=366566 Sourceforge decided not to attach the file the first time... Here it is. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 From noreply@sourceforge.net Mon Mar 3 23:13:36 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 15:13:36 -0800 Subject: [Patches] [ python-Patches-671384 ] test_pty hanging on hpux11 Message-ID: Patches item #671384, was opened at 2003-01-20 22:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Martin v. Löwis (loewis) Summary: test_pty hanging on hpux11 Initial Comment: The attached hack fixes a problem which occurs since switching the pty code. isatty() hangs if the slave_fd is closed and reopened as in the deprecated APIs pty.master_open() and pty.slave_open(). This patch reverts to the old behaviour where _open_terminal() is called in master_open() to avoid the hang later. Here's a very simple test for the problem: import pty, os master_fd, slave_name = pty.master_open() slave_fd = pty.slave_open(slave_name) print os.isatty(slave_fd) In slave_open() the first ioctl raises an IOError, Invalid Argument 22. I don't know if this problem affects hpux10. Hopefully someone will have a better idea how to really fix this problem. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 00:13 Message: Logged In: YES user_id=21627 By 'specific release', I mean Python 2.3a2, with no patches. I can't reproduce an exception on that Python release, for Solaris 8. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 23:23 Message: Logged In: YES user_id=33168 I don't understand what you are asking for. By 'specific release', do you mean of Solaris/HP-UX? I believe on Solaris there's an exception, but on HP-UX it hangs. But I don't recall exactly. I agree this patch is not optimal. I can also try on our Solaris box here. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:59 Message: Logged In: YES user_id=21627 I can't reproduce a test failure for Solaris 8 (on the SF compile farm) for Python 2.3a2. Can you please try that specific release and report what test fails for you, in which way? I'm concerned that the patch isn't that good, e.g. on Linux, it would cause usage of the old-style interface to pseudo-terminals, even though an all-singing all-dancing Unix98 pty support is available in the C library. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-28 01:21 Message: Logged In: YES user_id=33168 I have attached an updated patch. It seems Solaris 8 (on the snake farm) also had a test failure. I have basically restored the old functionality in this patch. _open_terminal is called if /dev/ptmx exists, so os.openpty() is not called. This fixes the test failures/hangs on both solaris and hpux and should be equivalent to the 2.2 behaviour. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 From noreply@sourceforge.net Tue Mar 4 03:41:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 19:41:21 -0800 Subject: [Patches] [ python-Patches-658327 ] Add inet_pton and inet_ntop to socket Message-ID: Patches item #658327, was opened at 2002-12-24 16:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jp Calderone (kuran) >Assigned to: Martin v. Löwis (loewis) Summary: Add inet_pton and inet_ntop to socket Initial Comment: Patch is against current CVS and adds two socket module functions, inet_pton and inet_ntop. Both of these should be available on all platforms (because of other dependancies in the code) so I don't think portability is a problem. inet_ntop converts a packed IP address to a human-readable '.' or ':' separated string representation of the IP. inet_pton performs the reverse operation. (Potential) problems: inet_pton sets errno to ENOSPC, which may lead to a confusing error message. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 22:41 Message: Logged In: YES user_id=33168 I added the #ifdef, but that doesn't address the testing problem. If the platform has inet_pton, but doesn't have IPv6 ENABLED. The inet_pton will be exported, but there's no good way to tell if you can pass an IPv6 address. The only way to test if IPv6 is enabled would be to call inet_pton with AF_INET6, catch a socket.error and check if the exception message is "unknown address family". Since this is really a testing issue, perhaps that's best after all? Do you agree this should be done? 
* Remove has_ipv6 * Export inet_pton & inet_ntop only if defined for platform * Only try to test inet_pton/ntop if defined for platform * Modify the tests to pass a valid IPv6 test, catch socket.error, if the error message is "unknown address family", don't test ipv6 any further, if the error message is different, raise TestFailed, if no exception, test all IPv6 addresses ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 17:25 Message: Logged In: YES user_id=33168 As I recall, yes, has_ipv6 is only for tests. There was no way to distinguish if python was built with IPv6 support, since AF_INET6 was always defined. Your second approach sounds like it will work. I need to review the code, though. I've forgotten how it works. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 05:15 Message: Logged In: YES user_id=21627 The has_ipv6 test is only there for the tests? In that case, drop it, and just perform AF_INET6 conversions unconditionally. OTOH, I think we should not expose the emulated inet_pton: it doesn't set errno correctly, and offers no advantage over inet_addr. So wrap the entire code with HAVE_INET_PTON, and only perform the tests if the function is supported. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-04 21:40 Message: Logged In: YES user_id=33168 I was just about to check this in, but then I ran into a problem. IPv6 may not be enabled, even if the constant AF_INET6 exists. The cleanest way I saw to address this in the test was to add a has_ipv6 boolean constant to the socket module. Martin, do you think this is acceptable? Attached is a complete patch which should be safe (based on the discussion below), includes tests and doc changes. 
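The probing approach listed at the start of this comment (try one IPv6 conversion and see whether it is refused) might look roughly like the sketch below in current Python 3. The function name ipv6_conversion_works is hypothetical. The sketch also simplifies the proposal: it treats any OSError as "not supported" instead of inspecting the error message and failing the test on an unexpected one.

```python
import socket


def ipv6_conversion_works():
    """Return True if inet_pton accepts an IPv6 address on this platform."""
    # The function and the constant are only exported where supported.
    if not hasattr(socket, "inet_pton") or not hasattr(socket, "AF_INET6"):
        return False
    try:
        socket.inet_pton(socket.AF_INET6, "::1")
    except OSError:
        # Simplification: the thread suggests checking for the specific
        # "unknown address family" error; here any OSError means "skip".
        return False
    return True


if ipv6_conversion_works():
    # A 16-byte packed value is expected for an IPv6 address.
    print(len(socket.inet_pton(socket.AF_INET6, "::1")))
else:
    print("skipping IPv6 conversion tests")
```

A test suite could call the probe once and skip the IPv6 cases when it returns False, which avoids adding a has_ipv6 constant just for testing.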
---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2003-01-11 12:04 Message: Logged In: YES user_id=366566 Yea, testing for the proper input length is definitely something that should be done. The patch looks good, but for one thing. If the specified address family is neither AF_INET nor AF_INET6, the length won't be tested and the underlying inet_ntop will be called. This isn't a problem now (afaik) because only those two address families are supported, but in a future libc version with more supported address families, it might open a similar hole to the one you've fixed. Perhaps the
+ } else {
+     PyErr_SetString(socket_error, "unknown address family");
+     return NULL;
+ }
should be moved up from the second if-grouping to follow the first if-grouping. Everything else looks good to me. Thanks for taking the time to look at this :) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-10 22:49 Message: Logged In: YES user_id=33168 JP, do you agree with my comment on 2002-12-30 about the checks? I have attached an updated patch. Please review and verify this is correct. Thank you for the additional tests. Feel free to submit patches with additional tests for any and all modules! ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-31 11:52 Message: Logged In: YES user_id=366566 Doc, NEWS, and test_socket patch attached. I didn't notice any inet_aton/inet_ntoa tests in the module so I added a couple for those as well (I excluded a test for inet_ntoa('255.255.255.255') ;) Also included are a couple IPv6 tests. I'm not sure if these are appropriate, since many systems may still lack the required support for them to pass. I'll leave it up to you to decide whether they should be commented out or removed or whatever.
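The ordering concern raised in this thread (reject an unknown address family before trusting the length of the packed value) can be modelled in Python. This is only a sketch of the logic, not the C code from the patch; the helper name check_packed_address and the error messages other than "unknown address family" are assumptions made for illustration.

```python
import socket

# Expected size in bytes of a packed address for each supported family.
_PACKED_SIZES = {socket.AF_INET: 4}
if hasattr(socket, "AF_INET6"):
    _PACKED_SIZES[socket.AF_INET6] = 16


def check_packed_address(af, packed):
    """Validate a packed address before it would be handed to inet_ntop.

    Hypothetical helper: the family is checked first, so an unsupported
    family can never reach the length test or the underlying call.
    """
    if af not in _PACKED_SIZES:
        raise ValueError("unknown address family")
    if len(packed) != _PACKED_SIZES[af]:
        raise ValueError("invalid length of packed IP address")
    return packed
```

With this ordering, an empty string or a family added by a later libc is rejected up front instead of letting the conversion read past the buffer.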
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-31 08:17 Message: Logged In: YES user_id=21627 I agree that such a change should be added. Neal, you have given this patch more attention than I did - please check it in when you consider it complete. I'd just like to point out that it is missing documentation changes (libsocket.tex), a NEWS entry, and a test case. kuran, please provide those as a single patch file. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-30 19:11 Message: Logged In: YES user_id=33168 ISTM that in socket_inet_ntop() you need to verify the size of the packed value passed in. If the user passes an empty string, inet_ntop() could read beyond the buffer passed in, potentially causing a core dump. The checks could be something like this:
    if (af == AF_INET && len != sizeof(struct in_addr))
    else if (af == AF_INET6 && len != sizeof(struct in6_addr))
Does this make sense? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-27 10:39 Message: Logged In: YES user_id=366566 The use case I have for it at the moment is a DNS server (Twisted.names). inet_pton allows me to handle IPv6 addresses, so it allows me to support AAAA and A6 records. I believe an IPv6 capable socks proxy would find this useful as well. Basically, low level network stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-27 05:23 Message: Logged In: YES user_id=21627 What is the rationale for providing this functionality? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-26 13:32 Message: Logged In: YES user_id=366566 Ooops, I made two, and uploaded the wrong one >:O Sorry. Dunno if it's still helpful, but here's the unified diff.
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 13:10 Message: Logged In: YES user_id=33168 Next time, please use context or unified diff. -c or -u option to cvs diff: cvs diff -c ... ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-24 16:05 Message: Logged In: YES user_id=366566 Sourceforge decided not to attach the file the first time... Here it is. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 From noreply@sourceforge.net Tue Mar 4 04:03:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 20:03:13 -0800 Subject: [Patches] [ python-Patches-696613 ] test options don't work on FreeBSD Message-ID: Patches item #696613, was opened at 2003-03-03 10:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ben Laurie (benl) >Assigned to: Jack Jansen (jackjansen) Summary: test options don't work on FreeBSD Initial Comment: test -L is used during make install - I'm guessing it is supposed to test for a softlink. Sadly, this is -h under FreeBSD, so the install fails. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 23:03 Message: Logged In: YES user_id=33168 What version of FreeBSD? I'm on 4.7 (SF compile farm), and the man page says: -h file True if file exists and is a symbolic link. This operator is retained for compatibility with previous versions of this program. Do not rely on its existence; use -L instead. I tested -h on Linux, HPUX11, and Solaris 8. -h and -L both work fine. Assigning to Jack, since he checked in this code. 
I wonder if there's any issue on the Mac? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 From noreply@sourceforge.net Tue Mar 4 04:26:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 20:26:14 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-03 23:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). For whatsnew23, modify the existing entry (since it is a delta from Py2.3). ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 14:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-01 21:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-28 21:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-02-28 20:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. 
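For readers skimming this thread, the behaviour proposed for dict.pop can be modelled in pure Python roughly as follows. This is a sketch of the semantics only, not the C patch; the function name pop_with_default and the _MISSING sentinel are hypothetical, and a mapping class such as UserDict would need something of this shape to stay parallel.

```python
# Sentinel used to tell "no default given" apart from "default is None".
_MISSING = object()


def pop_with_default(mapping, key, default=_MISSING):
    """Remove key from mapping and return its value.

    Without a default the old behaviour is kept (KeyError for a missing
    key); with a default, the default is returned instead.
    """
    try:
        value = mapping[key]
    except KeyError:
        if default is _MISSING:
            raise
        return default
    del mapping[key]
    return value
```

Current Python versions provide this directly as d.pop(key, default), so the helper is only an illustration of what the feature does.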
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Tue Mar 4 06:49:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 22:49:19 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct.
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
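The "Unicode in, Unicode out" rule argued for in this thread can be observed with a small check. The sketch below assumes a modern Python 3, where the two string types are str and bytes rather than unicode and str as in Python 2.3; the helper names name_types and demo are hypothetical.

```python
import os
import tempfile


def name_types(path):
    """Return the set of types of the entries os.listdir() yields for path."""
    return {type(name) for name in os.listdir(path)}


def demo():
    """List one scratch directory twice: by text path and by byte-string path."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create a single file so the listing is not empty.
        open(os.path.join(tmp, "example.txt"), "w").close()
        return name_types(tmp), name_types(os.fsencode(tmp))
```

Calling demo() shows that the type of the argument determines the type of every returned name, which is the consistency across platforms that the discussion was after.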
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away . ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
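The "decode first, then try to convert back to ASCII" test discussed in this thread can be sketched in Python as below. It is an illustration of the idea only, not the C code in posixmodule.c; the function name decode_filename and the default fs_encoding value are assumptions, and bytes/str here stand in for the Python 2.3 str/unicode pair.

```python
def decode_filename(raw, fs_encoding="utf-8"):
    """Decode a raw file name, keeping a byte string when it is pure ASCII.

    Hypothetical helper: decoding with the file system encoding comes
    first, so the ASCII test is made on characters rather than on raw
    bytes (which would be misleading for an encoding such as UTF-16).
    """
    text = raw.decode(fs_encoding)
    try:
        # Pure-ASCII names round-trip, so the byte string form is returned.
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Anything else is returned as a text (Unicode) string.
        return text
```

Swapping "ascii" for the interpreter's default encoding would give the alternative raised in the comments above.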
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 07:05:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 03 Mar 2003 23:05:11 -0800 Subject: [Patches] [ python-Patches-658327 ] Add inet_pton and inet_ntop to socket Message-ID: Patches item #658327, was opened at 2002-12-24 22:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jp Calderone (kuran) Assigned to: Martin v.
Löwis (loewis) Summary: Add inet_pton and inet_ntop to socket Initial Comment: Patch is against current CVS and adds two socket module functions, inet_pton and inet_ntop. Both of these should be available on all platforms (because of other dependencies in the code) so I don't think portability is a problem. inet_ntop converts a packed IP address to a human-readable '.' or ':' separated string representation of the IP. inet_pton performs the reverse operation. (Potential) problems: inet_pton sets errno to ENOSPC, which may lead to a confusing error message. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 08:05 Message: Logged In: YES user_id=21627 My two suggestions aren't exclusive: If you have the native inet_pton, you can *always* support IPv6 addresses with that, regardless of whether --enable-ipv6 was passed to configure or not. If that is done, it will be a legitimate test failure for inet_pton not to support IPv6 - after all, the primary reason to define this function was to support IPv6, so if the native function fails to do so, there is clearly a bug in the system. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-04 04:41 Message: Logged In: YES user_id=33168 I added the #ifdef, but that doesn't address the testing problem. If the platform has inet_pton but doesn't have IPv6 ENABLED, inet_pton will be exported, but there's no good way to tell if you can pass an IPv6 address. The only way to test if IPv6 is enabled would be to call inet_pton with AF_INET6, catch a socket.error and check if the exception message is "unknown address family". Since this is really a testing issue, perhaps that's best after all? Do you agree this should be done?
* Remove has_ipv6
* Export inet_pton & inet_ntop only if defined for platform
* Only try to test inet_pton/ntop if defined for platform
* Modify the tests to pass a valid IPv6 address and catch socket.error: if the error message is "unknown address family", don't test IPv6 any further; if the error message is different, raise TestFailed; if there is no exception, test all IPv6 addresses
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 23:25 Message: Logged In: YES user_id=33168 As I recall, yes, has_ipv6 is only for tests. There was no way to distinguish if python was built with IPv6 support, since AF_INET6 was always defined. Your second approach sounds like it will work. I need to review the code, though. I've forgotten how it works. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:15 Message: Logged In: YES user_id=21627 The has_ipv6 test is only there for the tests? In that case, drop it, and just perform AF_INET6 conversions unconditionally. OTOH, I think we should not expose the emulated inet_pton: it doesn't set errno correctly, and offers no advantage over inet_addr. So wrap the entire code with HAVE_INET_PTON, and only perform the tests if the function is supported. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-05 03:40 Message: Logged In: YES user_id=33168 I was just about to check this in, but then I ran into a problem. IPv6 may not be enabled, even if the constant AF_INET6 exists. The cleanest way I saw to address this in the test was to add a has_ipv6 boolean constant to the socket module. Martin, do you think this is acceptable? Attached is a complete patch which should be safe (based on the discussion below), includes tests and doc changes.
---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2003-01-11 18:04 Message: Logged In: YES user_id=366566 Yea, testing for the proper input length is definitely something that should be done. The patch looks good, but for one thing. If the specified address family is neither AF_INET nor AF_INET6, the length won't be tested and the underlying inet_ntop will be called. This isn't a problem now (afaik) because only those two address families are supported, but in a future libc version with more supported address families, it might open a similar hole to the one you've fixed. Perhaps the
+ } else {
+     PyErr_SetString(socket_error, "unknown address family");
+     return NULL;
+ }
should be moved up from the second if-grouping to follow the first if-grouping. Everything else looks good to me. Thanks for taking the time to look at this :) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-11 04:49 Message: Logged In: YES user_id=33168 JP, do you agree with my comment on 2002-12-30 about the checks? I have attached an updated patch. Please review and verify this is correct. Thank you for the additional tests. Feel free to submit patches with additional tests for any and all modules! ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-31 17:52 Message: Logged In: YES user_id=366566 Doc, NEWS, and test_socket patch attached. I didn't notice any inet_aton/inet_ntoa tests in the module so I added a couple for those as well (I excluded a test for inet_ntoa('255.255.255.255') ;) Also included are a couple IPv6 tests. I'm not sure if these are appropriate, since many systems may still lack the required support for them to pass. I'll leave it up to you to decide whether they should be commented out or removed or whatever.
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-31 14:17 Message: Logged In: YES user_id=21627 I agree that such a change should be added. Neal, you have given this patch more attention than I did - please check it in when you consider it complete. I'd just like to point out that it is missing documentation changes (libsocket.tex), a NEWS entry, and a test case. kuran, please provide those as a single patch file. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-31 01:11 Message: Logged In: YES user_id=33168 ISTM that in socket_inet_ntop() you need to verify the size of the packed value passed in. If the user passes an empty string, inet_ntop() could read beyond the buffer passed in, potentially causing a core dump. The checks could be something like this:
    if (af == AF_INET && len != sizeof(struct in_addr))
    else if (af == AF_INET6 && len != sizeof(struct in6_addr))
Does this make sense? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-27 16:39 Message: Logged In: YES user_id=366566 The use case I have for it at the moment is a DNS server (Twisted.names). inet_pton allows me to handle IPv6 addresses, so it allows me to support AAAA and A6 records. I believe an IPv6 capable socks proxy would find this useful as well. Basically, low level network stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-27 11:23 Message: Logged In: YES user_id=21627 What is the rationale for providing this functionality? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-26 19:32 Message: Logged In: YES user_id=366566 Ooops, I made two, and uploaded the wrong one >:O Sorry. Dunno if it's still helpful, but here's the unified diff.
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 19:10 Message: Logged In: YES user_id=33168 Next time, please use context or unified diff. -c or -u option to cvs diff: cvs diff -c ... ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-24 22:05 Message: Logged In: YES user_id=366566 Sourceforge decided not to attach the file the first time... Here it is. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 From noreply@sourceforge.net Tue Mar 4 08:01:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 00:01:56 -0800 Subject: [Patches] [ python-Patches-681780 ] Faster commonprefix (OS independent) Message-ID: Patches item #681780, was opened at 2003-02-06 18:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Nobody/Anonymous (nobody) Summary: Faster commonprefix (OS independent) Initial Comment: This routine is about 20% faster on a test set of 7 sets of strings run 100000 times each (I can provide the test if requested). The longer the common prefix is, the faster the routine becomes relative to original commonprefix. My only worry is that it might get rejected if it is considered too fancy; therefore I wasn't shy on commenting. I think we should also write a commonpathprefix, that will do what commonprefix should do, being in the *path.py module. I'll do that if none other does. The provided patch is for posixpath.py and ntpath.py, but since it's OS neutral it should work as is. 
It uses itertools for speed, though, so it is not backportable, but it can be if requested by substituting map for imap and a normal slice for islice. ---------------------------------------------------------------------- Comment By: Sebastien Keim (s_keim) Date: 2003-03-04 09:01 Message: Logged In: YES user_id=498191 I would suggest another possibility. This one uses a property of string ordering: if you have a<=b<=c and c.startswith(a) then b.startswith(a). I have tested two implementations:

# A five-line function with really straightforward code.
# It can degenerate rather badly in the worst case (large strings
# with a short common prefix) but is generally quite fast.
def commonprefix1(m):
    if not m: return ''
    prefix, greater = min(m), max(m)
    while not greater.startswith(prefix):
        prefix = prefix[:-1]
    return prefix

# The second uses bisection to avoid the worst case. This makes
# the implementation a little more complex but seems to provide the
# fastest result.
def commonprefix2(m):
    prefix = ''
    if m:
        low, high = min(m), max(m)
        while low:
            n = len(low)//2 + 1
            l, h = low[:n], high[:n]
            if h==l:
                prefix += l
                low, high = low[n:], high[n:]
            else:
                low, high = l[:-1], h
    return prefix

I personally prefer the commonprefix1 implementation: it's the simplest one and it is probably fast enough for the few commonprefix use-cases (anyway, it is still faster than the current implementation). ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-02-07 12:11 Message: Logged In: YES user_id=539787 I did my homework better, and found out that the buffer object quite probably will be deprecated. So I rewrote the routine without the buffer object (using str.startswith), which by the way got another 10% speedup (relative to the latest version using buffer.) The commonprefix_nobuf.diff patch applies directly to the original posixpath.py, ntpath.py.
I will try to delete the other patches, but I don't think I am allowed to do it. ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-02-06 19:02 Message: Logged In: YES user_id=539787 Best case: comparing this to the old version with a list: ['/usr/local/lib/python2.3/posixpath.py']*120, 10000 iterations, the speed difference is: old: 319.58 sec new: 34.43 sec Since prefix_len always grows in the "while next_bit:" loop, applying commonprefix2.diff to the *patched* version does a very minor speedup (comparing smaller buffers in every iteration); but it is only a matter of overoptimisation (ie it does not hurt, but it's a trivial one, just 0.1%). ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-06 18:25 Message: Logged In: YES user_id=33168 As much as I'd like to blame IE, it's a SF bug AFAIK. http://sf.net/tracker/?func=detail&atid=200001&aid=675910&group_id=1 ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-02-06 18:04 Message: Logged In: YES user_id=539787 For some reason, my IE never uploads the file on the first attempt. 
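The ordering argument made in this thread (if a<=b<=c and c.startswith(a) then b.startswith(a), so only min() and max() need comparing) can be sanity-checked against a plain character-by-character baseline. The following is a minimal sketch for that purpose; commonprefix_minmax and commonprefix_naive are illustrative names, not the functions in posixpath.py or in any of the attached patches.

```python
def commonprefix_minmax(strings):
    """Common prefix computed from the smallest and largest string only."""
    if not strings:
        return ''
    prefix, greatest = min(strings), max(strings)
    # Every other string sorts between the two, so it shares their prefix.
    while not greatest.startswith(prefix):
        prefix = prefix[:-1]
    return prefix


def commonprefix_naive(strings):
    """Baseline: compare every string position by position."""
    if not strings:
        return ''
    first = strings[0]
    for i, ch in enumerate(first):
        for s in strings:
            if i >= len(s) or s[i] != ch:
                return first[:i]
    return first


if __name__ == "__main__":
    sample = ['/usr/lib', '/usr/local/lib', '/usr/lib/python']
    print(commonprefix_minmax(sample), commonprefix_naive(sample))
```

Agreement on a handful of inputs is of course not a proof, only a quick check that the shortcut behaves like the obvious definition.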
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681780&group_id=5470 From noreply@sourceforge.net Tue Mar 4 11:04:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 03:04:32 -0800 Subject: [Patches] [ python-Patches-696613 ] test options don't work on FreeBSD Message-ID: Patches item #696613, was opened at 2003-03-03 16:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Ben Laurie (benl) Assigned to: Jack Jansen (jackjansen) Summary: test options don't work on FreeBSD Initial Comment: test -L is used during make install - I'm guessing it is supposed to test for a softlink. Sadly, this is -h under FreeBSD, so the install fails. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 12:04 Message: Logged In: YES user_id=45365 Checked in as Makefile.pre.in 1.116. Now let's hope there's no platforms out there that only have -h and not -L, but if that is so then it should become clear when 2.3b1 hits the street. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-04 05:03 Message: Logged In: YES user_id=33168 What version of FreeBSD? I'm on 4.7 (SF compile farm), and the man page says: -h file True if file exists and is a symbolic link. This operator is retained for compatibility with previous versions of this program. Do not rely on its existence; use -L instead. I tested -h on Linux, HPUX11, and Solaris 8. -h and -L both work fine. Assigning to Jack, since he checked in this code. I wonder if there's any issue on the Mac? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 From noreply@sourceforge.net Tue Mar 4 12:32:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 04:32:04 -0800 Subject: [Patches] [ python-Patches-696613 ] test options don't work on FreeBSD Message-ID: Patches item #696613, was opened at 2003-03-03 15:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 Category: Build Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Ben Laurie (benl) Assigned to: Jack Jansen (jackjansen) Summary: test options don't work on FreeBSD Initial Comment: test -L is used during make install - I'm guessing it is supposed to test for a softlink. Sadly, this is -h under FreeBSD, so the install fails. ---------------------------------------------------------------------- >Comment By: Ben Laurie (benl) Date: 2003-03-04 12:32 Message: Logged In: YES user_id=14333 As always, its coz I'm running an ancient version of FreeBSD. Perhaps its time I built a new machine :-) Mine is 3.2! ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 11:04 Message: Logged In: YES user_id=45365 Checked in as Makefile.pre.in 1.116. Now let's hope there's no platforms out there that only have -h and not -L, but if that is so then it should become clear when 2.3b1 hits the street. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-04 04:03 Message: Logged In: YES user_id=33168 What version of FreeBSD? I'm on 4.7 (SF compile farm), and the man page says: -h file True if file exists and is a symbolic link. This operator is retained for compatibility with previous versions of this program. Do not rely on its existence; use -L instead. 
I tested -h on Linux, HPUX11, and Solaris 8. -h and -L both work fine. Assigning to Jack, since he checked in this code. I wonder if there's any issue on the Mac? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696613&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:01:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:01:18 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 16:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on plaforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 01:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 11:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? 
Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 10:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 09:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 08:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 08:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. 
return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 07:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 06:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
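One of the alternatives raised in this thread, putting a byte string into the result when a name does not decode instead of raising and losing the whole listing, could be sketched as below. This is written against present-day Python, where bytes and str stand in for the byte strings and Unicode strings discussed here; decode_names is a hypothetical helper, not the code in posixmodule.c.

```python
def decode_names(raw_names, encoding):
    """Decode each byte-string name; keep the raw bytes when decoding fails."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The undecodable name survives as bytes, so the other names
            # in the directory are still returned.
            result.append(raw)
    return result


if __name__ == "__main__":
    print(decode_names([b'readme.txt', b'\xff'], 'utf-8'))
```

The cost of this approach is a result list of mixed types, which callers then have to be prepared for.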
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 06:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 12:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 10:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 13:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 06:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 06:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 05:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes a one-byte-per-character encoding, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 04:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-09 22:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with:

#ifdef Py_USING_UNICODE
/* code */
#endif

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-09 20:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
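The round-trip test described in this thread (decode with the file system encoding, then try to encode back to ASCII, and return Unicode only when that fails) might be sketched as follows. This is an illustration in present-day Python, where bytes and str play the roles of the 8-bit and Unicode strings under discussion; name_for_caller is a made-up name and this is not the patch's C code.

```python
def name_for_caller(raw, fs_encoding):
    """Return text for non-ASCII names, the original bytes otherwise."""
    text = raw.decode(fs_encoding)
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # The decoded name contains non-ASCII characters: hand back text.
        return text
    # Pure ASCII after decoding: keep the original byte string.
    return raw


if __name__ == "__main__":
    print(name_for_caller(b'setup.py', 'utf-8'))
    print(name_for_caller('caf\u00e9'.encode('utf-8'), 'utf-8'))
```

Because the decision is made on the decoded text rather than on the raw bytes, it does not depend on the encoding being a one-byte-per-character superset of ASCII, which is the concern raised about a byte-level test.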
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:19:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:19:21 -0800 Subject: [Patches] [ python-Patches-671384 ] test_pty hanging on hpux11 Message-ID: Patches item #671384, was opened at 2003-01-20 16:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed >Resolution: Invalid Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: test_pty hanging on hpux11 Initial Comment: The attached hack fixes a problem which occurs since switching the pty code. isatty() hangs if the slave_fd is closed and reopened as in the deprecated APIs pty.master_open() and pty.slave_open(). This patch reverts to the old behaviour where _open_terminal() is called in master_open() to avoid the hang later. Here's a very simple test for the problem: import pty, os master_fd, slave_name = pty.master_open() slave_fd = pty.slave_open(slave_name) print os.isatty(slave_fd) In slave_open() the first ioctl raises an IOError, Invalid Argument 22. I don't know if this problem affects hpux10. Hopefully someone will have a better idea how to really fix this problem. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-04 09:19 Message: Logged In: YES user_id=33168 I don't seem to have this problem any more on either HP-UX or Solaris. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 18:13 Message: Logged In: YES user_id=21627 By 'specific release', I mean Python 2.3a2, with no patches. I can't reproduce an exception on that Python release, for Solaris 8. 
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 17:23 Message: Logged In: YES user_id=33168 I don't understand what you are asking for. By 'specific release', do you mean of Solaris/HP-UX? I believe on Solaris there's an exception, but on HP-UX it hangs. But I don't recall exactly. I agree this patch is not optimal. I can also try on our Solaris box here. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 05:59 Message: Logged In: YES user_id=21627 I can't reproduce a test failure for Solaris 8 (on the SF compile farm) for Python 2.3a2. Can you please try that specific release and report what test fails for you, in which way? I'm concerned that the patch isn't that good, e.g. on Linux, it would cause usage of the old-style interface to pseudo-terminals, even though an all-singing all-dancing Unix98 pty support is available in the C library. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-27 19:21 Message: Logged In: YES user_id=33168 I have attached an updated patch. It seems Solaris 8 (on the snake farm) also had a test failure. I have basically restored the old functionality in this patch. _open_terminal is called if /dev/ptmx exists, so os.openpty() is not called. This fixes the test failures/hangs on both solaris and hpux and should be equivalent to the 2.2 behaviour. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=671384&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:31:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:31:41 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on plaforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? 
I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug and addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. 
This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
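The point above about slicing can be made concrete with a small sketch. It is written in modern Python 3 and uses a made-up file name; `is_valid_utf8` is a hypothetical helper introduced only for this illustration.

```python
name = "caf\u00e9.txt"            # 'café.txt' as a Unicode string
encoded = name.encode("utf-8")    # the same name as UTF-8 bytes

# Slicing the Unicode string keeps whole characters.
stem_text = name[:4]              # 'café'

# Slicing the byte string at the same index cuts the two-byte
# sequence for 'é' in half, so the result no longer decodes.
stem_bytes = encoded[:4]          # b'caf\xc3'


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(stem_text, is_valid_utf8(encoded), is_valid_utf8(stem_bytes))
```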
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
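The round-trip test discussed in the surrounding comments (decode the name to Unicode, then try to encode it back to ASCII) might look roughly like this at the Python level. This is only a sketch in modern Python 3; the actual patch works in C inside Modules/posixmodule.c, and the function name `name_for_listing` and its `fs_encoding` parameter are hypothetical.

```python
def name_for_listing(raw, fs_encoding="utf-8"):
    """Return an ASCII byte string for pure-ASCII names, else Unicode.

    The name is first decoded with the file system encoding and then
    re-encoded as ASCII. If the re-encoding fails, the name contains
    non-ASCII characters and the Unicode string is returned.
    """
    text = raw.decode(fs_encoding)      # may raise UnicodeDecodeError
    try:
        return text.encode("ascii")     # ASCII only: byte string result
    except UnicodeEncodeError:
        return text                     # non-ASCII: Unicode result


print(name_for_listing(b"setup.py"))
print(name_for_listing("caf\u00e9".encode("utf-8")))
# Because the test goes through Unicode, it also works for an encoding
# that is not a superset of ASCII, such as UTF-16.
print(name_for_listing("a".encode("utf-16-le"), "utf-16-le"))
```

Going through Unicode avoids inspecting raw byte values, which is the weakness of a simple "any byte >= 128" check when the encoding is not ASCII-compatible.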
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It just happens that your test assumes a one-byte-per-character encoding, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. Looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:40:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:40:39 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:51:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:51:28 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. 
You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. 
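The '\xff' case raised above can be illustrated with a rough sketch in present-day Python 3 (this is not the 2.3-era C code under review). It assumes UTF-8 as the file system encoding on the Linux box in question, which the thread does not state, and uses Latin-1 as an example of a "preferred encoding" in which the byte is a valid name. The helper name try_decode is made up for illustration.

```python
# Sketch only: models the decode step with plain bytes.decode().
name = b"\xff"  # the file name from the complaint: a single byte with value 255


def try_decode(raw, encoding):
    """Return the decoded name, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Assumed file system encoding: UTF-8, where a lone 0xff byte is never valid.
utf8_result = try_decode(name, "utf-8")

# Example preferred encoding: Latin-1, where every byte maps to a code point.
latin1_result = try_decode(name, "latin-1")
```

Under these assumptions the first call fails and the second succeeds, which matches the observation that the outcome depends on which encoding is in effect when listdir is called.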
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. 
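The two options discussed above (re-raise and discard the partial result, or put the byte string into the result in place of the undecodable name) can be sketched in Python 3. The function name decode_names and its fallback_to_bytes flag are hypothetical, chosen only for illustration; this is not the posixmodule.c code.

```python
def decode_names(raw_names, encoding, fallback_to_bytes=False):
    """Decode a list of byte-string file names.

    With fallback_to_bytes=False, the first undecodable name raises
    UnicodeDecodeError and no partial result is returned.
    With fallback_to_bytes=True, undecodable names are kept as bytes.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if not fallback_to_bytes:
                raise  # the caller can retry with a byte-string listing
            result.append(raw)
    return result


names = [b"readme.txt", b"\xff"]
mixed = decode_names(names, "utf-8", fallback_to_bytes=True)
```

The strict variant keeps the result type uniform at the cost of losing the whole listing on one bad name; the fallback variant keeps every entry but returns a mixed list that callers must be prepared for.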
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
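The rule argued for above, where the type of the directory argument decides the type of the returned names, is how os.listdir() behaves in present-day Python 3, with str playing the role of the Unicode string. A minimal sketch, using a temporary directory and a made-up file name (listing_types is a hypothetical helper):

```python
import os
import tempfile


def listing_types(directory):
    """Return the element types produced for a str path and for a bytes path."""
    as_text = os.listdir(directory)                # str in -> str out
    as_bytes = os.listdir(os.fsencode(directory))  # bytes in -> bytes out
    return {type(n) for n in as_text}, {type(n) for n in as_bytes}


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so both listings are non-empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_types, bytes_types = listing_types(tmp)
```

Code that passes byte strings keeps getting byte strings back, so it is not affected by decoding problems it never asked for.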
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
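The point above about slicing can be shown with a small Python 3 sketch. The sample name and the helper is_valid_utf8 are made up for illustration.

```python
name = "caf\u00e9"              # 'café': four characters
encoded = name.encode("utf-8")  # five bytes, because the é takes two

text_slice = name[:4]     # slicing text always keeps whole characters
byte_slice = encoded[:4]  # slicing the bytes cuts the é in half


def is_valid_utf8(raw):
    """True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here the sliced text is still a complete name, while the sliced byte string ends in a dangling lead byte and no longer decodes.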
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
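The "convert to Unicode, then try to convert back to ASCII" test discussed in this thread can be sketched in Python 3 next to a naive per-byte check. Both function names are hypothetical, and the sketch stands in for the C calls named above rather than reproducing the patch.

```python
def naive_is_ascii(raw):
    """Per-byte check: assumes one byte per character."""
    return all(b < 128 for b in raw)


def roundtrip_is_ascii(raw, fs_encoding):
    """Decode with the file system encoding, then try to encode as ASCII."""
    text = raw.decode(fs_encoding)  # step 1: convert to Unicode
    try:
        text.encode("ascii")        # step 2: try to convert back to ASCII
    except UnicodeEncodeError:
        return False
    return True


# U+0101 encodes in UTF-16 (little endian) as the two bytes 0x01 0x01,
# both below 128, so the per-byte check is fooled.
raw = "\u0101".encode("utf-16-le")
```

For this input the per-byte check reports ASCII while the round trip does not, which is the UTF-16 failure mode mentioned in the discussion.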
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 14:54:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 06:54:19 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 16:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 09:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. 
However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 01:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 11:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 10:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 09:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 08:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 08:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 07:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 06:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 06:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 12:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 10:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 13:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 06:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 06:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 05:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 04:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-09 22:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-09 20:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:03:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:03:31 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 16:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 10:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. 
You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 09:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 01:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. 
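Guido's '\xff' report and his Latin-1 suggestion in this thread can be illustrated with a small sketch. It is written for a current Python 3 interpreter, where bytes and str stand in for the 8-bit and Unicode strings discussed here; it does not touch the file system, and the variable names are only illustrative.

```python
# A file name consisting of the single byte 0xFF, as in Guido's report.
name = b"\xff"

# Decoding with ASCII or UTF-8 fails, which is the kind of error
# os.listdir(u'.') propagated in the scenario described above.
failures = []
for encoding in ("ascii", "utf-8"):
    try:
        name.decode(encoding)
    except UnicodeDecodeError:
        failures.append(encoding)

# Latin-1 maps every byte value 0..255 to the code point with the same
# number, so decoding with it cannot fail for any byte string.
latin1_text = name.decode("latin-1")
all_bytes_ok = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))
```

Note that a decode that cannot fail is not the same as a decode that yields the name the user intended; the sketch only shows why the conversion itself would stop raising.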
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 11:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. 
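The two options Martin lists for an undecodable file name (re-raise and discard the partial result, or put the byte string into the result instead) might be sketched as follows. This is a hypothetical Python 3 helper, not the code in posixmodule.c; decode_names and its on_error parameter are names invented for this illustration.

```python
def decode_names(raw_names, encoding, on_error="raise"):
    """Decode directory entries given as byte strings.

    on_error="raise": propagate the decode error; the partial result is lost.
    on_error="keep":  leave the undecodable entry in the list as a byte string.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if on_error == "raise":
                raise  # option 1: the caller may retry with a byte-string path
            result.append(raw)  # option 2: mixed list of text and bytes
    return result
```

With on_error="keep" the caller gets every entry but has to cope with a list of mixed types; with on_error="raise" the list is all-or-nothing, which matches the behaviour that was checked in.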
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 10:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 09:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 08:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 08:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 07:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 06:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 06:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 12:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 10:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 13:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 06:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 06:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
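The round-trip test discussed in this thread (first convert to Unicode, then try to convert back to ASCII) can be sketched in Python 3 as below. The function names and the fs_encoding parameter are assumptions made for this illustration; the actual patch did this in C inside posixmodule.c.

```python
def name_for_listdir(raw, fs_encoding):
    """Return a byte string for pure-ASCII names, Unicode text otherwise."""
    text = raw.decode(fs_encoding)      # step 1: convert to Unicode
    try:
        return text.encode("ascii")     # step 2: try to convert back to ASCII
    except UnicodeEncodeError:
        return text                     # not ASCII-compatible: keep Unicode


def naive_is_ascii(raw):
    """The simple byte test: true when no byte has its high bit set."""
    return all(b < 128 for b in raw)
```

The byte test looks only at the raw bytes, so for an encoding such as UTF-16 it reports "ASCII" for data that is full of \0 bytes, which is the failure mode Lemburg points out; the round trip interprets the bytes in the file system encoding first.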
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 05:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have a one-byte-per-character encoding, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. From looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 04:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-09 22:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-09 20:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:07:37 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:07:37 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). 
If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? 
Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. 
return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
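The slicing hazard mentioned above is easy to reproduce. A minimal sketch in present-day Python 3, with bytes standing in for the 8-bit strings of the thread; the file name is invented for the example and nothing here comes from the patch itself.

```python
name = "café.txt"
raw = name.encode("utf-8")   # the 'é' occupies two bytes in UTF-8

text_prefix = name[:4]       # slices between characters
raw_prefix = raw[:4]         # cuts the two-byte 'é' in half

try:
    raw_prefix.decode("utf-8")
    raw_slice_is_valid = True
except UnicodeDecodeError:
    # The truncated multi-byte sequence is no longer valid UTF-8.
    raw_slice_is_valid = False

print(repr(text_prefix), repr(raw_prefix), raw_slice_is_valid)
```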
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
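The check being discussed in this part of the thread (decode with the file system encoding, then see whether the result survives a strict ASCII encode) can be sketched in present-day Python 3 and compared with a naive byte test. This is an illustration under assumptions, not the C code from the patch: both function names are hypothetical, and the encoding is passed in explicitly rather than read from Py_FileSystemDefaultEncoding.

```python
def bytes_look_ascii(raw: bytes) -> bool:
    """Naive test: every byte value is below 128."""
    return all(b < 128 for b in raw)


def text_is_ascii(raw: bytes, encoding: str) -> bool:
    """Round-trip test: decode first, then try a strict ASCII encode."""
    text = raw.decode(encoding)
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# U+0100 encoded as UTF-16-LE is the two bytes 0x00 0x01: both below 128,
# yet the character is not ASCII.
sample = "\u0100".encode("utf-16-le")
print(bytes_look_ascii(sample), text_is_ascii(sample, "utf-16-le"))
```

The naive test only works when the encoding is a superset of ASCII with one byte per ASCII character; the round trip makes no such assumption.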
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes a one-byte-per-character encoding, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:11:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:11:31 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. 
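The third option listed above (keep the byte string whenever a name does not decode) can be sketched as follows. This is a hypothetical illustration in present-day Python 3, not the posixmodule.c code: decode_or_keep is an invented name, and the raw names and the encoding are supplied by the caller instead of coming from the file system.

```python
def decode_or_keep(raw_names, encoding):
    """Decode each name; fall back to the raw bytes if decoding fails."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Mixed-type result: undecodable names stay as byte strings.
            result.append(raw)
    return result


names = [b"ok.txt", b"\xff"]
as_utf8 = decode_or_keep(names, "utf-8")      # b"\xff" is not valid UTF-8
as_latin1 = decode_or_keep(names, "latin-1")  # every byte decodes in Latin-1
print(as_utf8, as_latin1)
```

The two calls also show the contrast drawn in the comment: with UTF-8 a stray byte fails to decode, while with Latin-1 decoding never fails but may produce mojibake.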
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:15:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:15:14 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. 
I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. 
I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. 
Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? 
I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost.
This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
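Editor's illustration of the slicing point made above. This is a minimal sketch in present-day Python 3, where `bytes` stands in for the 8-bit string of the discussion; the file name and the helper `is_valid_utf8` are made up for the example and are not part of the patch.

```python
# Hypothetical file name containing a non-ASCII character (U+00E9).
name = "r\u00e9sum\u00e9.txt"
encoded = name.encode("utf-8")

# Slicing the text keeps whole characters.
text_prefix = name[:2]

# Slicing the UTF-8 bytes at the same index cuts the two-byte
# sequence for U+00E9 in half, leaving an incomplete character.
byte_prefix = encoded[:2]


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


prefix_ok = is_valid_utf8(byte_prefix)
```

The text slice is still a usable path fragment, while the byte slice no longer decodes, which is the breakage the comment warns about.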
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
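Editor's illustration of the round-trip test described above: decode with the file system encoding first, then try to encode back to ASCII. This is a sketch in present-day Python 3, not the C code from the patch; the function name `listdir_style_name` and its `fs_encoding` parameter are hypothetical.

```python
def listdir_style_name(raw, fs_encoding):
    """Return an ASCII byte string when the name is pure ASCII text,
    otherwise return the decoded text (hypothetical helper).

    raw is the byte string read from the file system; fs_encoding is
    assumed to play the role of Py_FileSystemDefaultEncoding.
    """
    # Step 1: convert to Unicode. This may raise UnicodeDecodeError
    # if the bytes are not valid in fs_encoding.
    text = raw.decode(fs_encoding)
    # Step 2: try to convert back to ASCII. Failure means the name
    # contains non-ASCII characters, so the text form is kept.
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text


plain = listdir_style_name(b"README", "utf-8")
accented = listdir_style_name(b"caf\xc3\xa9", "utf-8")
# With UTF-16 the raw bytes contain \0 and are all below 128, so a
# byte-by-byte check would be misleading; the round trip is not.
wide = listdir_style_name("ab".encode("utf-16-le"), "utf-16-le")
```

Because the check works on characters rather than raw bytes, it does not depend on the file system encoding being a superset of ASCII.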
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:19:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:19:54 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 10:19 Message: Logged In: YES user_id=6380 You don't need to update whatsnew23.tex; its editor prefers to do this himself. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-03 23:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). For whatsnew23, modify the existing entry (since it is a delta from Py2.3).
---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 14:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-01 21:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-28 21:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-02-28 20:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. 
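Editor's illustration of the behavior this patch proposes, shown as `dict.pop` works in current Python: with a default the call never raises, and without one the old KeyError behavior remains. The dictionary and key are invented for the example.

```python
settings = {"verbose": True}

# Key present: the value is removed and returned, the default is ignored.
first = settings.pop("verbose", False)

# Key now absent: the default is returned instead of raising.
second = settings.pop("verbose", False)

# Without a default, a missing key still raises KeyError, so code
# written against the one-argument form keeps working unchanged.
try:
    settings.pop("verbose")
    raised = False
except KeyError:
    raised = True
```

This mirrors the parallel with dict.get mentioned in the initial comment, except that pop also removes the key when it is found.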
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:44:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:44:40 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 16:44 Message: Logged In: YES user_id=45365 I just did a test (created 254 files with all bytes except / and null in their names on a linux server, mounted the partition over NFS on MacOSX) and indeed MacOSX tries to interpret the bytes as UTF-8 and fails. I know that conversion works for HFS and HFS+ volumes (which carry a filename encoding with them, or you have to specify it when mounting). I assume it works for AFP and SMB (which also carries encoding info, IIRC) but I can't test this. I haven't a clue about webdav and such.
Something to keep in mind is that we are really trying to solve someone else's problem: the inability of NFS and most unixen to handle file system encodings. If I'm on a latin-1 machine and I nfs-mount your latin-2 partition I will see garbage filenames. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. 
This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest.
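Editor's illustration of two points from the comments above: a name such as '\xff' cannot be decoded as UTF-8 but always decodes as Latin-1, and the "mixed-type result" option keeps the byte string for names that fail to decode. This is a sketch in present-day Python 3; `decode_names` and its parameters are hypothetical, and it is not the code that was checked in.

```python
def decode_names(raw_names, fs_encoding):
    """Decode each raw file name, keeping the byte string when decoding
    fails (the mixed-type option; hypothetical helper)."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(fs_encoding))
        except UnicodeDecodeError:
            # Fall back to the undecoded bytes rather than losing the
            # whole listing to one bad name.
            result.append(raw)
    return result


names = [b"notes.txt", b"\xff"]

# With UTF-8 the single byte 0xFF is invalid, so it stays a byte string.
as_utf8 = decode_names(names, "utf-8")

# With Latin-1 every byte maps to the code point of the same value,
# so decoding cannot fail; 0xFF becomes U+00FF.
as_latin1 = decode_names(names, "latin-1")
```

The Latin-1 result never fails but may show the wrong characters if the names were really written in another encoding, which is the mojibake concern raised in the thread.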
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. 
Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding).
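Editor's illustration of the "Unicode in, Unicode out" rule argued for above, as it can be observed in present-day Python 3, where `str` is the Unicode type and `bytes` is the byte string. This shows current interpreter behavior under that assumption, not the state of the 2.3 patch; the temporary directory and file name are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    # Text path in, text names out.
    text_names = os.listdir(tmp)

    # Byte-string path in, byte-string names out. os.fsencode applies
    # the file system encoding to the directory name.
    byte_names = os.listdir(os.fsencode(tmp))
```

The type of the argument alone selects the type of every returned name, so a caller never receives a mixed list.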
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
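The round-trip idea discussed in this thread (decode the name with the file system encoding, then try to encode the result as ASCII) can be sketched in a few lines. This is only an illustration in modern Python 3, not the C code from the patch; the function name `normalize_name` and the `fs_encoding` parameter are hypothetical, and `bytes`/`str` stand in for the 2.x byte string and unicode types.

```python
def normalize_name(raw_name: bytes, fs_encoding: str = "utf-8"):
    """Return an ASCII byte string when possible, otherwise decoded text.

    Hypothetical helper: it mirrors the "convert to Unicode, then try to
    convert back to ASCII" test, rather than scanning for bytes >= 128.
    """
    # Step 1: interpret the raw bytes using the file system encoding.
    # This may raise UnicodeDecodeError for names that do not decode.
    text = raw_name.decode(fs_encoding)
    try:
        # Step 2: if the text is pure ASCII, hand back a byte string.
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Step 3: otherwise keep the decoded (Unicode) form.
        return text
```

Because the test runs on the decoded text, it does not depend on the file system encoding being a superset of ASCII, which is the weakness of a simple byte scan pointed out for UTF-16.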
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 15:50:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 07:50:18 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:50 Message: Logged In: YES user_id=92689 Here's a note about file system encodings on OSX, including a few words about NFS: http://developer.apple.com/qa/qa2001/qa1173.html. I propose to fall back to a byte string if conversion to unicode fails. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 16:44 Message: Logged In: YES user_id=45365 I just did a test (created 254 files with all bytes except / and null in their names on a linux server, mounted the partition over NFS on MacOSX) and indeed MacOSX tries to interpret the bytes as UTF-8 and fails. I know that conversion works for HFS and HFS+ volumes (which carry a filename encoding with them, or you have to specify it when mounting). I assume it works for AFP and SMB (which also carries encoding info, IIRC) but I can't test this. I haven't a clue about webdav and such. Something to keep in mind is that we are really trying to solve someone else's problem: the inability of NFS and most unixen to handle file system encodings. If I'm on a latin-1 machine and I nfs-mount your latin-2 partition I will see garbage filenames. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 16:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. 
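The third option above (put a byte string into the result whenever a name does not decode) can be illustrated with a short sketch. It is written in modern Python 3 as an assumption-laden illustration, not as the posixmodule.c change itself; `decode_names` and `fs_encoding` are hypothetical names.

```python
def decode_names(raw_names, fs_encoding="utf-8"):
    """Decode each raw file name, keeping the raw bytes when decoding fails.

    Hypothetical helper: the result may mix text and byte strings, which is
    the "mixed-type result" trade-off described in the comment above.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(fs_encoding))
        except UnicodeDecodeError:
            # Non-decodable name: fall back to the untouched byte string
            # instead of losing the whole listing to an exception.
            result.append(raw)
    return result
```

With UTF-8 as the encoding, a name such as the single byte `\xff` stays a byte string, while with Latin-1 every byte decodes (possibly to mojibake), so the fallback branch is never taken.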
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. 
The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. 
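For readers on a current interpreter, the Unicode-in-Unicode-out rule discussed here can be observed directly. The sketch below assumes modern Python 3 (where `str` plays the role of the 2.x unicode type and `bytes` the role of the byte string), so it shows today's behaviour rather than the 2.3 code under review; `list_both_ways` is a hypothetical helper name.

```python
import os
import sys
import tempfile


def list_both_ways(directory: str):
    """List a directory once with a text path and once with a bytes path."""
    text_names = os.listdir(directory)               # str in, str out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in, bytes out
    return text_names, byte_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create one file so the listing is not empty.
        open(os.path.join(tmp, "example.txt"), "w").close()
        text_names, byte_names = list_both_ways(tmp)
        # The file system encoding is exposed read-only for inspection.
        print(sys.getfilesystemencoding(), text_names, byte_names)
```

The type of the argument selects the type of every returned name, which is the consistency rule argued for in the comments above.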
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) 
Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. 
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 16:00:35 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 08:00:35 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. 
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 17:00 Message: Logged In: YES user_id=21627 I only partially agree that this is somebody else's problem: On Unix, it is always considered application responsibility to interpret file names as characters if they need to - hence the lack of a system-provided encoding strategy. So it is the problem of Python or the Python application, and I think we should try to shield the application from these issues as well as we can. Therefore, I'm in favour of jvr's latest proposal (use byte strings as the last resort), hoping that the error case will be infrequent. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:50 Message: Logged In: YES user_id=92689 Here's a note about file system encodings on OSX, including a few words about NFS: http://developer.apple.com/qa/qa2001/qa1173.html. I propose to fall back to a byte string if conversion to unicode fails. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 16:44 Message: Logged In: YES user_id=45365 I just did a test (created 254 files with all bytes except / and null in their names on a linux server, mounted the partition over NFS on MacOSX) and indeed MacOSX tries to interpret the bytes as UTF-8 and fails. 
I know that conversion works for HFS and HFS+ volumes (which carry a filename encoding with them, or you have to specify it when mounting). I assume it works for AFP and SMB (which also carries encoding info, IIRC) but I can't test this. I haven't a clue about webdav and such. Something to keep in mind is that we are really trying to solve someone else's problem: the inability of NFS and most unixen to handle file system encodings. If I'm on a latin-1 machine and I nfs-mount your latin-2 partition I will see garbage filenames. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. 
I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument.
I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError.
Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? 
I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. 
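The "Unicode argument in, Unicode results out" rule requested here can be illustrated with a short sketch. It is written for current Python 3 rather than the 2.3 code under review, so the thread's "byte string" corresponds to `bytes` and "Unicode string" to `str`; the temporary directory and the file name `example.txt` are invented for the illustration.

```python
import os
import tempfile


def listdir_types(path):
    """Return the set of types of the names that os.listdir(path) yields."""
    return {type(name) for name in os.listdir(path)}


with tempfile.TemporaryDirectory() as tmp:
    # Create one (hypothetical) file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A text path yields text names; a bytes path yields bytes names.
    text_result = listdir_types(tmp)
    bytes_result = listdir_types(os.fsencode(tmp))
```

The type of every returned name follows the type of the argument, which is the consistency across platforms that the PEP 277 discussion settled on.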
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. 
This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new ground here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several examples of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible with (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here...
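The remark about slicing can be made concrete with a small sketch. It uses current Python 3 (`str` for Unicode, `bytes` for 8-bit strings), and the file name is a made-up example, not one taken from the patch.

```python
# Hypothetical file name containing one non-ASCII character (e with acute).
name = "caf\u00e9.txt"
encoded = name.encode("utf-8")   # the accented letter takes two bytes

# Slicing the Unicode string keeps whole characters.
text_prefix = name[:4]

# Slicing the UTF-8 bytes at the same index cuts the two-byte character in half.
byte_prefix = encoded[:4]


def decodes_cleanly(data, encoding="utf-8"):
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

Here `byte_prefix` ends in a dangling lead byte, so it is no longer valid UTF-8, while `text_prefix` is still a well-formed name.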
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). 
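The difference between encoding back with ASCII and encoding back with a default encoding can be sketched in Python instead of C. This is only an analogy in current Python 3: `narrow_if_possible` is a hypothetical helper, not part of the patch, and the encoding is passed explicitly because Python 3 has no settable default encoding.

```python
def narrow_if_possible(name, encoding):
    """Return name encoded to bytes if `encoding` can represent it,
    otherwise return the Unicode name unchanged."""
    try:
        return name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return name


# A pure-ASCII name becomes a byte string either way.
plain = narrow_if_possible("README", "ascii")

# A non-ASCII name stays Unicode under the ASCII rule...
accented_ascii = narrow_if_possible("caf\u00e9", "ascii")

# ...but becomes a byte string if the chosen encoding is Latin-1.
accented_latin1 = narrow_if_possible("caf\u00e9", "latin-1")
```

With the ASCII rule, callers see Unicode exactly when the name is not plain ASCII; with a wider encoding, the result type depends on what that encoding happens to cover.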
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16.
The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 16:26:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 08:26:58 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 16:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v.
Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 11:26 Message: Logged In: YES user_id=6380 On the one hand a user who isn't interested in encodings shouldn't be passing a Unicode argument. On the other hand, Unicode strings have a way of sneaking into your application when you least suspect them. E.g. Tkinter returns them, so does IDLE, and I see them used more and more in Zope 3. FWIW, I like Just's "fall back to bytestrings" approach. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 11:00 Message: Logged In: YES user_id=21627 I only partially agree that this is somebody else's problem: On Unix, it is always considered application responsibility to interpret file names as characters if they need to - hence the lack of a system-provided encoding strategy. So it is the problem of Python or the Python application, and I think we should try to shield the application from these issues as well as we can. Therefore, I'm in favour of jvr's latest proposal (use byte strings as the last resort), hoping that the error case will be infrequent.
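The "byte strings as the last resort" proposal can be sketched as follows. This is an illustration in current Python 3, not the C code from posixmodule.c; `decode_names` and the sample listing are hypothetical.

```python
def decode_names(raw_names, encoding):
    """Decode each byte-string name with `encoding`.

    Names that cannot be decoded are kept as raw bytes instead of
    raising, so one bad name does not lose the rest of the listing.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            result.append(raw)
    return result


# Hypothetical directory contents: two decodable names and the '\xff' case.
raw_listing = [b"notes.txt", b"caf\xc3\xa9", b"\xff"]
mixed = decode_names(raw_listing, "utf-8")
```

The result is a mixed-type list: usable for passing names on to stat() or open(), but callers that display names need to handle the byte-string entries separately.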
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 10:50 Message: Logged In: YES user_id=92689 Here's a note about file system encodings on OSX, including a few words about NFS: http://developer.apple.com/qa/qa2001/qa1173.html. I propose to fall back to a byte string if conversion to unicode fails. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 10:44 Message: Logged In: YES user_id=45365 I just did a test (created 254 files with all bytes except / and null in their names on a linux server, mounted the partition over NFS on MacOSX) and indeed MacOSX tries to interpret the bytes as UTF-8 and fails. I know that conversion works for HFS and HFS+ volumes (which carry a filename encoding with them, or you have to specify it when mounting). I assume it works for AFP and SMB (which also carries encoding info, IIRC) but I can't test this. I haven't a clue about webdav and such. Something to keep in mind is that we are really trying to solve someone else's problem: the inability of NFS and most unixen to handle file system encodings. If I'm on a latin-1 machine and I nfs-mount your latin-2 partition I will see garbage filenames. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 10:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-04 10:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 10:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 10:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point.
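The Latin-1 property mentioned here, and the contrast with UTF-8 raised earlier in the thread, can be checked with a few lines of current Python 3. The helper name `utf8_decodes` is made up for the sketch.

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Latin-1 decoding never fails: byte N becomes code point N.
decoded = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in decoded]


def utf8_decodes(data):
    """Return True if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The single byte 0xff from the '\xff' file name example is not valid UTF-8.
ff_is_valid_utf8 = utf8_decodes(b"\xff")
```

Decoding with Latin-1 always succeeds and round-trips, but as noted above it may produce mojibake when the names were really written in another encoding.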
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 09:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection.
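For readers on current Python, read-only inspection of the file system encoding is available as `sys.getfilesystemencoding()`. The sketch below assumes Python 3 and deliberately does not call `locale.setlocale()`; the sample name `notes.txt` is invented.

```python
import codecs
import os
import sys

# Name of the encoding Python uses to convert file names to and from bytes.
fs_encoding = sys.getfilesystemencoding()

# Confirm that the reported name refers to a codec Python knows about.
codec_info = codecs.lookup(fs_encoding)

# os.fsencode() and os.fsdecode() apply that encoding to a (hypothetical) name.
round_tripped = os.fsdecode(os.fsencode("notes.txt"))
```

Applications that need to display names can use this value to decide how to decode byte-string results themselves.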
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 09:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 09:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 01:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 12:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. 
Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 11:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 10:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 09:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 08:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 08:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 07:22 Message: Logged In: YES user_id=92689 Jack, as noted on #bug 696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't; I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 06:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. 
On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names. Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 06:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. 
Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 12:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 10:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 13:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several example of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away . ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. 
Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 06:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 06:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with?
If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess. Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 05:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 05:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes that you have one byte per characters encodings which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. 
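For illustration only, the two ASCII checks debated in this thread can be sketched in Python. This is not the code from the patch (which is C in Modules/posixmodule.c); the function names are hypothetical, and the sketch assumes modern Python bytes/str objects rather than the 2.3 string types.

```python
def looks_ascii_bytewise(raw: bytes) -> bool:
    """Simple test: every byte of the raw name is below 128."""
    return all(b < 128 for b in raw)


def is_ascii_roundtrip(raw: bytes, fs_encoding: str) -> bool:
    """Decode with the file system encoding, then try to encode as ASCII."""
    try:
        raw.decode(fs_encoding).encode("ascii")
    except UnicodeError:
        # Either the bytes are invalid in fs_encoding or the text is not ASCII.
        return False
    return True


# U+0100 encoded as UTF-16-LE is the two bytes 0x00 0x01. Both bytes are
# below 128, so the bytewise test accepts the name; the round trip does not.
name = "\u0100".encode("utf-16-le")
bytewise_result = looks_ascii_bytewise(name)
roundtrip_result = is_ascii_roundtrip(name, "utf-16-le")
```

The round trip costs two conversions per name, which is the performance trade-off weighed against directory I/O in the discussion.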
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 04:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 04:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-09 22:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-09 20:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 17:19:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 09:19:38 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 16:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-04 17:19 Message: Logged In: YES user_id=670441 Okay, here's patchpop2 with the diff'ed dictobject, UserDict, test_types, test_userdict, NEWS, and Doc/lib/libstdtypes. whew. Let me know if you need any changes. The change to DictMixin seems a bit clumsy, but I liked it better than other things I came up with. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:19 Message: Logged In: YES user_id=6380 You don't need to update whatsnew23.tex; its editor prefers to do this himself.
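For illustration only, the pop-with-default semantics described in the initial comment of this item can be sketched in pure Python. This is not the code from the attached patches; the helper name `pop_with_default` and the `_MISSING` sentinel are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel: distinguishes "no default given"


def pop_with_default(mapping, key, default=_MISSING):
    """Remove key and return its value; return default if the key is absent.

    With no default, a missing key raises KeyError (the old behavior).
    """
    try:
        value = mapping[key]
    except KeyError:
        if default is _MISSING:
            raise
        return default
    del mapping[key]
    return value


d = {"a": 1}
first = pop_with_default(d, "a", None)   # key present: value is removed
second = pop_with_default(d, "a", None)  # key absent: default is returned
```

A sentinel object is used rather than None so that None itself remains a legal default value.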
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-04 04:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). For whatsnew23, modify the existing entry (since it is a delta from Py2.3). ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 19:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-02 02:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-01 02:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-01 01:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). 
If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Tue Mar 4 17:44:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 09:44:28 -0800 Subject: [Patches] [ python-Patches-684256 ] AutoThreadState implementation Message-ID: Patches item #684256, was opened at 2003-02-10 14:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684256&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: AutoThreadState implementation Initial Comment: An implementation of the AutoThreadState API, mainly for discussion purposes at this point. To be a PEP soon. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2003-03-04 08:44 Message: Logged In: YES user_id=86307 It appears to me that PyAutoThreadState_Release calls PyThreadState_Clear after releasing the GIL (if the thread state was created by PyAutoThreadState_Ensure, then old state will be UNLOCKED, so PyEval_ReleaseThread will be called). It looks to me that, if the thread state is going to be deleted, the call to Clear it should be moved up to just before ReleaseThread, i.e.: if (oldstate == PyAutoThreadState_UNLOCKED) { if (tcur->autothreadstate_counter == 1) PyThreadState_Clear(tcur); PyEval_ReleaseThread(tcur); } ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-13 04:05 Message: Logged In: YES user_id=14198 Attaching a new patch that works perfectly. 
2 checks remain in the code that will be debug only, but apart from that, it is pretty good. No changes at all to existing semantics. Tested on Linux and Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684256&group_id=5470 From noreply@sourceforge.net Tue Mar 4 18:55:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 10:55:26 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 16:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-04 18:55 Message: Logged In: YES user_id=670441 argh... I put the NEWS item in the wrong place. Ignore patchpop2 (I can't delete it), look at patchpop3. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-04 17:19 Message: Logged In: YES user_id=670441 Okay, here's patchpop2 with the diff'ed dictobject, UserDict, test_types, test_userdict, NEWS, and Doc/lib/libstdtypes. whew. Let me know if you need any changes.
The change to DictMixin seems a bit clumsy, but I liked it better than other things I came up with. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:19 Message: Logged In: YES user_id=6380 You don't need to update whatsnew23.tex; its editor prefers to do this himself. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-04 04:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). For whatsnew23, modify the existing entry (since it is a delta from Py2.3). ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 19:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-02 02:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-01 02:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-01 01:30 Message: Logged In: YES user_id=80475 The patch looks fine. 
Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Tue Mar 4 19:43:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 11:43:48 -0800 Subject: [Patches] [ python-Patches-683592 ] unicode support for os.listdir() Message-ID: Patches item #683592, was opened at 2003-02-09 22:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Just van Rossum (jvr) Assigned to: Martin v. Löwis (loewis) Summary: unicode support for os.listdir() Initial Comment: The attached patch makes os.listdir() return unicode strings, on platforms that have Py_FileSystemDefaultEncoding defined as non-NULL. I'm by no means sure this is the right thing to do; it does seem right on OSX where Py_FileSystemDefaultEncoding is (or rather: will be real soon, I'm waiting for Jack's approval) utf-8. I'd be happy to add the code in an OSX-specific switch. A more subtle variant could perhaps only return unicode strings if the file name is not ASCII.
---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-04 20:43 Message: Logged In: YES user_id=92689 I've committed the "fallback-to-byte-strings" behavior. It's in posixmodule.c rev. 2.290. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 17:26 Message: Logged In: YES user_id=6380 On the one hand a user who isn't interested in encodings shouldn't be passing a Unicode argument. On the other hand, Unicode strings have a way of sneaking into your application when you least suspect them. E.g. Tkinter returns them, so does IDLE, and I see them used more and more in Zope 3. FWIW, I like Just's "fall back to bytestrings" approach. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 17:00 Message: Logged In: YES user_id=21627 I only partially agree that this is somebody else's problem: On Unix, it is always considered application responsibility to interpret file names as characters if they need to - hence the lack of a system-provided encoding strategy. So it is the problem of Python or the Python application, and I think we should try to shield the application from these issues as well as we can. Therefore, I'm in favour of jvr's latest proposal (use byte strings as the last resort), hoping that the error case will be infrequent. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:50 Message: Logged In: YES user_id=92689 Here's a note about file system encodings on OSX, including a few words about NFS: http://developer.apple.com/qa/qa2001/qa1173.html. I propose to fall back to a byte string if conversion to unicode fails.
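For illustration only, the "fall back to byte strings" behavior discussed above can be sketched in Python. This is not the committed C code from posixmodule.c; the function name `decode_names` is hypothetical, and the sketch assumes modern Python bytes/str objects.

```python
def decode_names(raw_names, fs_encoding):
    """Decode each raw file name; keep the raw bytes when decoding fails."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(fs_encoding))
        except UnicodeDecodeError:
            # Last resort: return the undecodable name unchanged, so the
            # remaining entries of the directory listing are not lost.
            result.append(raw)
    return result


# b"\xff" is not valid UTF-8, so it stays a byte string in the result.
names = decode_names([b"spam.txt", b"\xff"], "utf-8")
```

The result is a mixed-type list, which is the trade-off noted in the thread: such names can still be passed back to the file system, but code that displays them has to handle both types.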
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-04 16:44 Message: Logged In: YES user_id=45365 I just did a test (created 254 files with all bytes except / and null in their names on a linux server, mounted the partition over NFS on MacOSX) and indeed MacOSX tries to interpret the bytes as UTF-8 and fails. I know that conversion works for HFS and HFS+ volumes (which carry a filename encoding with them, or you have to specify it when mounting). I assume it works for AFP and SMB (which also carries encoding info, IIRC) but I can't test this. I haven't a clue about webdav and such. Something to keep in mind is that we are really trying to solve someone else's problem: the inability of NFS and most unixen to handle file system encodings. If I'm on a latin-1 machine and I nfs-mount your latin-2 partition I will see garbage filenames. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:15 Message: Logged In: YES user_id=21627 Setting the file system encoding on startup should be fine, except that we need another setlocale/query/restore locale sequence. This is, in principle, bad, as there is no guarantee that the restore locale operation really produces the original state, and may cause problems if other threads are already running. In practice, it appears to work out just fine, as we use such sequences already (e.g. to undo the readline initialization). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 16:11 Message: Logged In: YES user_id=21627 I disagree with the last assertion: In *particular* if the file system encoding is UTF-8, there is a good chance that decoding will fail (unlike if it is latin-1; decoding will then never fail - it may just produce mojibake). 
OS X seems to make a guarantee to always return UTF-8 from its low-level API, but I distrust this guarantee until I see it with my own eyes :-) E.g. what happens if you mount an NFS tree, and the NFS server gives file names in some other encoding? I see the following options: - only enable the code for OS X. I dislike this option, as it essentially freezes the Unix status to non-Unicode (we won't get further insights, the de jure status won't change, de facto, all files will be encoded in the locale's encoding). - leave the code as-is, documenting the possibility of exceptions. - add byte strings instead of Unicode strings into the result for non-decodable strings. This gives a mixed-type result, which is fine if you only pass the resulting file names to stat() or open(), and will likely break the application if it tries to display the file names somehow. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 16:07 Message: Logged In: YES user_id=92689 I think it would be better to simply return byte strings if the file system encoding isn't known. (This btw. was what my original patch did.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 16:03 Message: Logged In: YES user_id=6380 Maybe the filesystem default encoding should be set to Latin-1 by default (when nothing better is known about it)? Then it's hard to imagine how the conversion could fail, since every Latin-1 byte maps 1-1 to the corresponding Unicode code point. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:54 Message: Logged In: YES user_id=6380 The setlocale call indeed works. I think I'd be happier if this was set by default, but I don't know what other consequences there would be.
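The contrast drawn in this thread between Latin-1 and UTF-8 decoding can be checked with a short Python sketch. It is an illustration only, assuming modern Python bytes/str objects; the variable names are hypothetical.

```python
# Every possible byte value decodes under Latin-1, and each byte maps to
# the Unicode code point with the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
latin1_is_one_to_one = [ord(ch) for ch in decoded] == list(range(256))

# Under UTF-8 the single byte 0xFF is never valid, so decoding fails.
try:
    b"\xff".decode("utf-8")
    utf8_accepts_ff = True
except UnicodeDecodeError:
    utf8_accepts_ff = False
```

Latin-1 decoding therefore cannot raise, but as noted in the discussion it may silently produce the wrong characters when the names were written in another encoding.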
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:51 Message: Logged In: YES user_id=92689 It would seem that even with a user's locale there's a chance os.listdir() fails when passed a unicode argument. I'm not sure it's reasonable for os.listdir() to fail at all (if the directory to be listed exists and we have the right permissions). If it's all too difficult to get right, I'm happy to put the listdir unicode support in a MacOSX switch. I know nothing about locales so I'm really not in a position to straighten this out. All I know is that if Py_FileSystemDefaultEncoding is known to be utf-8, it's just dumb _not_ to return unicode. You guys figure out the rest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 15:40 Message: Logged In: YES user_id=21627 Guido's scenario was precisely the reason why Unix was left out from consideration for PEP 277. However, it is better than it sounds: There is a good chance that invoking locale.setlocale(locale.LC_CTYPE, "") prior to invoking listdir will overcome the problem, as the setlocale call will set the file system encoding to the user's preference. If \xff is a valid file name in the user's preferred encoding, then listdir will succeed in converting this file name to a Unicode string. It might be useful to set the file system encoding on Unix to the user's preferred encoding unconditionally (i.e. not as a side effect of invoking setlocale). It might also be useful to expose the file system encoding read-only for inspection. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-04 15:31 Message: Logged In: YES user_id=92689 Would you prefer the error be silenced and a byte string be used instead? If so, should there be a warning?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:01 Message: Logged In: YES user_id=6380 I haven't seen the code, but I have a complaint. On Linux, when I have a file named '\xff' (i.e. its name is the single byte with value 255), os.listdir(u'.') gives me a UnicodeDecodeError. Is that really progress? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 07:49 Message: Logged In: YES user_id=21627 The current code looks fine to me. Closing this patch. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:56 Message: Logged In: YES user_id=92689 Martin, assigning this item to you. Please close it if you deem the changes in CVS correct. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 18:45 Message: Logged In: YES user_id=92689 Applied to CVS as: Modules/posixmodule.c: 2.288 Doc/lib/libos.tex: 1.115 Misc/NEWS: 1.687 Unicode errors are propagated as in the original version of the patch, libos.tex mentions Win NT/2k/XP and Unix. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 17:39 Message: Logged In: YES user_id=21627 Clearing the error is bad, I agree. I see two options: reraise the exception, deleting the result obtained so far (i.e. as the code did that the latest patch removes), OR add a byte string instead of the Unicode string into the result. Even though I have proposed the latter in the past, I could also accept the former; applications that anticipate that exception then just need to re-invoke listdir with a byte string, and deal with the result themselves. With these changes, the patch is fine with me. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 17:08 Message: Logged In: YES user_id=92689 I think this could be achieved by removing the "Py_FileSystemDefaultEncoding != NULL" part of the condition on line 1805, as indeed passing NULL as the encoding to PyUnicode_FromEncodedObject causes the default encoding to be used. Shall I check it in like that? I'm not quite happy with the fact that exceptions are silently dropped: should a warning be issued instead? Especially when using the default encoding, exceptions are not unlikely I suppose. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 16:48 Message: Logged In: YES user_id=21627 I see. The right thing, IMO, is to always return Unicode objects for Unicode arguments, just the same way the "et" parser works: if the file system encoding is NULL, fall back to the system default encoding. Then, you can generalize the docs to [NT and Unix] (with OS X being a flavour of Unix), or drop the OS reference completely (in which case the other os modules are effectively buggy). There might be a function already to fall back to the system default encoding; perhaps just passing NULL works. There should be a documentation section on Unicode file names; I volunteer to write it (Summary: NT+ uses Unicode natively, W9x uses "mbcs", OS X uses UTF-8, which equates to "Unicode natively", Unices with nl_langinfo(CODEPAGE) use that, all others use the system default encoding). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 15:32 Message: Logged In: YES user_id=92689 Ok, done, including a minor patch to Doc/lib/libos.tex. I also adapted the Misc/NEWS items. I'm not sure how to change the os.listdir() doco to better reflect the actual situation without mentioning Py_FileSystemDefaultEncoding... 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 14:11 Message: Logged In: YES user_id=21627 Looks good, but incomplete: If the argument is Unicode, *all* results should be Unicode. There should also be documentation changes. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 14:02 Message: Logged In: YES user_id=92689 I've attached a patch that fixes the bug as well as addresses the unicode arg vs. return value inconsistency that Martin noted. The exception behavior has not yet been changed. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-03 13:22 Message: Logged In: YES user_id=92689 Jack, as noted on bug #696261, the bug is that os.listdir() doesn't do the right thing with a Unicode string argument (it should use Py_FileSystemDefaultEncoding but it doesn't); I'm working on it. Martin: I now see that PEP 277 says "Under this proposal, [os.listdir] will return a list of Unicode strings when its path argument is Unicode". I don't like this much (I really think we should push Unicode a little harder onto the users), but I'll look into changing the unix end of os.listdir() to do the same. I'll also review your exception comment. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 12:36 Message: Logged In: YES user_id=21627 I dislike this change, as it introduces inconsistency across platforms. On Win32, as a result of PEP 277, Unicode file names are only returned for Unicode directory names. There was an explicit discussion about this aspect of PEP 277, and this interface was accepted as The Right Thing. So I think Unix should follow here: return byte string file names for byte string directory names, and Unicode file names for Unicode directory names.
Support for Unicode directory names should also invoke the file system encoding for the directory name. I'm also unsure about the exception handling. If there is a file name that doesn't decode according to the file system encoding, it raises the Unicode error. This means that all other file names are lost. This might be acceptable if the Unicode-in-Unicode-out strategy is used; in its current form, the change can and will break existing applications (which find all kinds of funny byte sequences on disk that don't work with the user's file system encoding). ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-03 12:23 Message: Logged In: YES user_id=45365 I think this patch does more bad than good. A practical problem is that os.path.walk doesn't work anymore if there are non-ascii directories in the directory tree (os.listdir will return these as unicode names, but doesn't accept unicode on input). See bug #696261. An additional problem is that various other methods in posix don't do the unicode conversion, so for instance os.getcwd() will return 8-bit strings in Py_FileSystemDefaultEncoding which are incompatible with the unicode returned by listdir. My preferred solution would be to do the unicode trick everywhere. Second best would be to retract the whole thing and think about it a bit more for Python 2.4. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 22:52 Message: Logged In: YES user_id=92689 Checked in as rev. 2.287 of Modules/posixmodule.c. Leaving this item open for now, in case MvL has comments when he gets back. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-25 18:22 Message: Logged In: YES user_id=6380 OK, check it in, just be prepared for contingencies. I really cannot judge whether this is right on all platforms. 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-25 16:55 Message: Logged In: YES user_id=92689 Having missed 2.3a2, I'd like to get this in way ahead of 2.3b1. Any objections? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 19:17 Message: Logged In: YES user_id=92689 I'm pretty sure os.path deals just fine with unicode strings (it's all pure string manipulations, isn't it?) Worries: well, apparently on Windows os.listdir() has been returning unicode for some time, so it's not like we're breaking completely new grounds here. If anything breaks it's probably good this happens, as it gives an opportunity to fix things... I just found several example of potential breakage: _bsddb.c parses a filename arg with the "z" format specifier. gdbmmodule.c uses "s". bsddbmodule.c and dbmmodule.c as well. I'm not sure the above modules work on Windows with non-ascii filenames at all, but it doesn't look like it. Besides Windows (for which my patch is not relevant), only OSX sets Py_FileSystemDefaultEncoding, so any new breakage won't reach a mass market right away . ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 18:46 Message: Logged In: YES user_id=38388 Ok, let's look at it from a different angle: things that you get from os.listdir() should be compatible to (at least) all the os.path tools and os itself. Converting to Unicode has the advantage that slicing and indexing into the path names will not break the paths (unlike UTF-8 encoded 8-bit strings which tend to break when you slice them). That said, I think you're right about the ASCII approach provided that the os, os.path tools can actually properly cope with Unicode. What I worry about is that if os.listdir() gives back Unicode for e.g. 
Latin-1 filenames and the application then passes the Unicode names to a C API using "s", perfectly working code will break... then again the C code should really use "es" for decoding to the Py_FileSystemDefaultEncoding as is done in e.g. fileobject.c. I really don't know what to do here... ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 17:24 Message: Logged In: YES user_id=92689 Here's an argument for ASCII and against the default encoding: if the default encoding is different from Py_FileSystemDefaultEncoding, things go wrong: an 8-bit string passed to file() will be interpreted as Py_FileSystemDefaultEncoding (more precisely: will not be interpreted at all), not the default encoding... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 12:24 Message: Logged In: YES user_id=38388 Right, except that injecting Unicode into Unicode-unaware code can be dangerous (e.g. some code might require a string object to work on). E.g. if someone sets the default encoding to Latin-1 he wouldn't expect os.listdir() to suddenly return Unicode for him. This may be a problem in general for the change to os.listdir(). We'll just have to see what happens during the alpha and beta phases. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 12:08 Message: Logged In: YES user_id=92689 On the other hand, if it's not ASCII, wouldn't a unicode string be more appropriate to begin with? If it's encodable with the default encoding, this will happen as soon as the string is used in a piece of unicode-unaware code, right? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:55 Message: Logged In: YES user_id=38388 Good question. The default encoding would better fit into the concept, I guess.
Instead of PyUnicode_AsASCIIString(v) you'd have to use PyUnicode_AsEncodedString(v, NULL, "strict"). ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 11:49 Message: Logged In: YES user_id=92689 Ok, I went for your original suggestion: always convert to unicode and then try to convert to ascii. See new patch. Or should this use the default encoding? Hm. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-10 11:17 Message: Logged In: YES user_id=38388 The file system does not need to support embedded \0 chars even if it supports UTF-16. It only happens that your test assumes one-byte-per-character encodings, which may not always be true. With UTF-16 your test will see lots of \0 bytes but not necessarily ones which are ord(x)>=128. I'm not sure whether other variable length encodings can result in \0 bytes, e.g. the Asian ones. There's also the possibility of the encoding mapping the ASCII range to other non-ASCII characters, e.g. ShiftJIS does this for the Yen sign. If you absolutely want to use the simple test, I'd at least restrict the test to an ASCII isalnum(x) test and then try the encode/decode method I described if this test fails. Note that isalnum() can be locale dependent on some platforms, so you have to hard-code it. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:51 Message: Logged In: YES user_id=92689 I don't see how UTF-16 could be a valid value for Py_FileSystemDefaultEncoding, as for most platforms the file name can't contain null bytes. By looking at the NAMELEN() spaghetti, it seems platforms without HAVE_DIRENT_H might still support embedded null bytes. Any wisdom on this? ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2003-02-10 10:24 Message: Logged In: YES user_id=38388 Your test will probably catch most cases, but it could fail for e.g. UTF-16. The only true test would be to first convert to Unicode and then try to convert back to ASCII. If you get an error you can be sure that the text is not ASCII compatible. Given that .listdir() involves lots of IO I think the added performance hit wouldn't be noticeable. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-10 10:12 Message: Logged In: YES user_id=92689 Applied both suggestions. However, I'm not sure if my ASCII test does the right thing, or at least I don't think it does if Py_FileSystemDefaultEncoding is not a superset of ASCII. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-10 04:07 Message: Logged In: YES user_id=33168 The code which uses unicode APIs should probably be wrapped with: #ifdef Py_USING_UNICODE /* code */ #endif ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-10 02:16 Message: Logged In: YES user_id=6380 At the very least, I'd like it to return Unicode only when the original string isn't just ASCII.
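The round-trip test suggested above (decode with the file system encoding, then try to convert back to ASCII) could look roughly like the sketch below. It is written in Python rather than the C of posixmodule.c, and the function name name_for_listdir is made up for illustration; the patch itself works with PyUnicode_AsASCIIString, as quoted in the thread.

```python
def name_for_listdir(raw_name, fs_encoding):
    """Return a byte string for pure-ASCII names, text otherwise.

    `raw_name` is the name as bytes from the file system; `fs_encoding`
    plays the role of Py_FileSystemDefaultEncoding. Both parameter names
    are assumptions made for this sketch.
    """
    text = raw_name.decode(fs_encoding)
    try:
        # Succeeds only if every character is ASCII, whatever the
        # byte-level representation in fs_encoding looked like.
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text
```

Unlike a bytewise check for values of 128 and above, this does not assume that the encoding is a superset of ASCII or uses one byte per character, which is the UTF-16 concern raised in the comments.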
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=683592&group_id=5470 From noreply@sourceforge.net Tue Mar 4 23:24:20 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 04 Mar 2003 15:24:20 -0800 Subject: [Patches] [ python-Patches-697613 ] fix bug #670311: sys.exit and PYTHONINSPECT Message-ID: Patches item #697613, was opened at 2003-03-04 23:24 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697613&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #670311: sys.exit and PYTHONINSPECT Initial Comment: So we want to stop SystemExit from causing an actual exit() when using python -i. This patch introduces two new API calls, PyRun_BlockSysExit and PyRun_UnblockSysExit, to set a flag to avoid the exit() call in PyErr_PrintEx. There are several other ways to fix this bug, but I think all of the others I came up with would cause more backwards compatibility problems and/or be a lot more work. Some possibilities, if anyone is interested, would be: 1) Add a new PyCompilerFlags flag. This seems a bit ugly, as it's not really a "compile flag". 2) Add some special run routines that block the exit. 3) Add another parameter to existing run routines. 4) Change PyErr_PrintEx so it doesn't exit when printing a SystemExit, instead having the run routines responsible for exiting when catching that exception. What do you think? Is my patch good enough, or would you like something else?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697613&group_id=5470 From noreply@sourceforge.net Wed Mar 5 11:43:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 03:43:18 -0800 Subject: [Patches] [ python-Patches-697939 ] optparse unit tests + fixes Message-ID: Patches item #697939, was opened at 2003-03-05 12:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697939&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Johannes Gijsbers (jlgijsbers) Assigned to: Nobody/Anonymous (nobody) Summary: optparse unit tests + fixes Initial Comment: Here's a patch that mostly converts the tests from optik 1.4 to the unittest format and makes it usable in the Python library. I've also added some tests, of which five fail with current CVS: test_opt_string_empty test_opt_string_too_short test_opt_string_long_invalid test_opt_string_short_invalid test_help_long_opts_first I changed the following to fix the tests: * format_option_strings_short_first and format_option_strings_long_first have been merged into one function, format_options, to eliminate the almost complete duplication. To make this possible, short_first is now an attribute, which conveniently also eases changing short_first after instantiation. * _short_opts and _long_opts are set in the Option constructor, instead of in _check_option_strings, to prevent an AttributeError which would occur when no option strings were passed, making the "at least one option string must be supplied" OptionError useless. * Removed the check that would raise a RuntimeError in Option.__str__ when no option strings existed in _short_opts or _long_opts. A RuntimeError would be raised when an OptionError was raised in _set_opt_strings, because, quite logically, no option strings were set at that point. 
I'm not sure why the check was there, because _short_opts and _long_opts are only empty when instantiation fails, or when somebody set those *internal* attributes to false. And the moment you start mucking with internal attributes, you're on your own. :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697939&group_id=5470 From noreply@sourceforge.net Wed Mar 5 11:48:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 03:48:50 -0800 Subject: [Patches] [ python-Patches-697941 ] optparse OptionGroup docs Message-ID: Patches item #697941, was opened at 2003-03-05 12:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697941&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Johannes Gijsbers (jlgijsbers) Assigned to: Nobody/Anonymous (nobody) Summary: optparse OptionGroup docs Initial Comment: A small patch to add a bit about the new OptionGroup, added in Optik 1.4 and Python CVS but currently undocumented. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697941&group_id=5470 From noreply@sourceforge.net Wed Mar 5 14:26:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 06:26:50 -0800 Subject: [Patches] [ python-Patches-696645 ] VMS patches, cleaning part Message-ID: Patches item #696645, was opened at 2003-03-03 16:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Martin v. Löwis (loewis) Summary: VMS patches, cleaning part Initial Comment: These are the cleaning patches.
I will provide other patches in a separate item. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-05 15:26 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as getbuildinfo.c 2.10 main.c 1.73 posixmodule.c 2.291 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696645&group_id=5470 From noreply@sourceforge.net Wed Mar 5 17:00:33 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 09:00:33 -0800 Subject: [Patches] [ python-Patches-698082 ] Modulefinder and excludes Message-ID: Patches item #698082, was opened at 2003-03-05 18:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Just van Rossum (jvr) Summary: Modulefinder and excludes Initial Comment: Modulefinder doesn't exclude modules in packages correctly. Attached patch fixes this. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 From noreply@sourceforge.net Wed Mar 5 17:01:35 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 09:01:35 -0800 Subject: [Patches] [ python-Patches-698082 ] Modulefinder and excludes Message-ID: Patches item #698082, was opened at 2003-03-05 18:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Just van Rossum (jvr) Summary: Modulefinder and excludes Initial Comment: Modulefinder doesn't exclude modules in packages correctly. Attached patch fixes this. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2003-03-05 18:01 Message: Logged In: YES user_id=11105 IMO the patch speaks for itself. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 From noreply@sourceforge.net Wed Mar 5 17:35:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 09:35:09 -0800 Subject: [Patches] [ python-Patches-698082 ] Modulefinder and excludes Message-ID: Patches item #698082, was opened at 2003-03-05 18:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Just van Rossum (jvr) Summary: Modulefinder and excludes Initial Comment: Modulefinder doesn't exclude modules in packages correctly. Attached patch fixes this. 
---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-05 18:35 Message: Logged In: YES user_id=92689 Looks good, applied. It's in rev. 1.6 of Lib/modulefinder.py ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-05 18:01 Message: Logged In: YES user_id=11105 IMO the patch speaks for itself. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698082&group_id=5470 From noreply@sourceforge.net Wed Mar 5 22:16:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 14:16:31 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 19:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-05 22:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. 
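For readers unfamiliar with the term, a "self-iterator" is an object whose __iter__ returns the object itself, so that iterating it line by line consumes the same stream position as the other read methods. The sketch below shows the property using io.StringIO from present-day Python as a stand-in; that substitution is an assumption made here, since the cStringIO module this patch targets exists only in Python 2.

```python
import io

buf = io.StringIO("first\nsecond\n")

# A self-iterator hands back the very same object from iter().
is_self_iterator = iter(buf) is buf

# next() and a later loop therefore share one read position.
first = next(buf)
rest = list(buf)
```

After this runs, first holds the first line and rest holds only what was left, because no separate iterator object with its own position was created.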
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Thu Mar 6 05:36:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 21:36:31 -0800 Subject: [Patches] [ python-Patches-698505 ] docs tor hotshot module Message-ID: Patches item #698505, was opened at 2003-03-06 16:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: docs tor hotshot module Initial Comment: The attached provides documentation for the hotshot module. Assigning to Fred for review. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 From noreply@sourceforge.net Thu Mar 6 05:39:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 21:39:54 -0800 Subject: [Patches] [ python-Patches-698505 ] docs tor hotshot module Message-ID: Patches item #698505, was opened at 2003-03-06 16:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: docs tor hotshot module Initial Comment: The attached provides documentation for the hotshot module. Assigning to Fred for review. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2003-03-06 16:39 Message: Logged In: YES user_id=29957 stupid sourceforge tracker. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 From noreply@sourceforge.net Thu Mar 6 05:40:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 21:40:28 -0800 Subject: [Patches] [ python-Patches-698505 ] docs for hotshot module Message-ID: Patches item #698505, was opened at 2003-03-06 16:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Fred L. Drake, Jr. (fdrake) >Summary: docs for hotshot module Initial Comment: The attached provides documentation for the hotshot module. Assigning to Fred for review. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2003-03-06 16:39 Message: Logged In: YES user_id=29957 stupid sourceforge tracker. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 From noreply@sourceforge.net Thu Mar 6 06:37:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 05 Mar 2003 22:37:48 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-05 22:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. 
Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470
From noreply@sourceforge.net Thu Mar 6 16:57:30 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 08:57:30 -0800 Subject: [Patches] [ python-Patches-698833 ] ZipFile - support for file decryption Message-ID: Patches item #698833, was opened at 2003-03-06 17:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698833&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Giovanni Bajo (giovannibajo) Assigned to: Nobody/Anonymous (nobody) Summary: ZipFile - support for file decryption Initial Comment: The attached patch adds support for the ZIP file decryption. Right now, only decryption is supported (not encryption), but I will work on this as well if there are no problems with this patch. The ZIP encryption scheme uses 96-bit keys, so there might be some US law annoyances (see http://www.info-zip.org/pub/infozip/FAQ.html#crypto). To me, everything seems legit, but I am not a lawyer.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698833&group_id=5470 From noreply@sourceforge.net Fri Mar 7 00:08:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 16:08:50 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat affecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-06 19:08 Message: Logged In: YES user_id=80475 Misc/NEWS 1.69 Objects/dictobject.c 2.141 Doc/lib/libstdtypes.tex 1.120 Lib/UserDict.py 1.24 Lib/test/test_types.py 1.47 Lib/test/test_userdict.py 1.13 ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-04 13:55 Message: Logged In: YES user_id=670441 argh... I put the NEWS item in the wrong place. Ignore patchpop2 (I can't delete it), look at patchpop3.
---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-04 12:19 Message: Logged In: YES user_id=670441 Okay, here's patchpop2 with the diff'ed dictobject, UserDict, test_types, test_userdict, NEWS, and Doc/lib/libstdtypes. whew. Let me know if you need any changes. The change to DictMixin seems a bit clumsy, but I liked it better than other things I came up with. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 10:19 Message: Logged In: YES user_id=6380 You don't need to update whatsnew23.tex; its editor prefers to do this himself. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-03 23:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). For whatsnew23, modify the existing entry (since it is a delta from Py2.3). ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 14:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-01 21:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-28 21:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. 
Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-02-28 20:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Fri Mar 7 04:31:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 20:31:31 -0800 Subject: [Patches] [ python-Patches-693753 ] fix for bug 639806: default for dict.pop Message-ID: Patches item #693753, was opened at 2003-02-26 16:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix for bug 639806: default for dict.pop Initial Comment: This patch adds an optional default value to dict.pop, so that it parallels dict.get, see discussion in bug 639806. If no default is given, the old behavior still exists, so backwards compatibility is no problem. 
The new pop must use METH_VARARGS and PyArg_UnpackTuple, somewhat effecting efficiency. If this is considered desirable, I could also provide the same behavior for list.pop. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-07 04:31 Message: Logged In: YES user_id=670441 Thanks for fixing up my UserDict.DictMixin patch. Much nicer. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 00:08 Message: Logged In: YES user_id=80475 Misc/NEWS 1.69 Objects/dictobject.c 2.141 Doc/lib/libstdtypes.tex 1.120 Lib/UserDict.py 1.24 Lib/test/test_types.py 1.47 Lib/test/test_userdict.py 1.13 ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-04 18:55 Message: Logged In: YES user_id=670441 argh... I put the NEWS item in the wrong place. Ignore patchpop2(I can't delete it), look at patchpop3. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-04 17:19 Message: Logged In: YES user_id=670441 Okay, here's patchpop2 with the diff'ed dictobject, UserDict, test_types, test_userdict, NEWS, and Doc/lib/libstdtypes. whew. Let me know if you need any changes. The change to DictMixin seems a bit clumsy, but I liked it better than other things I came up with. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-04 15:19 Message: Logged In: YES user_id=6380 You don't need to update whatsnew23.tex; its editor prefers to do this himself. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-04 04:26 Message: Logged In: YES user_id=80475 For NEWS, add a new entry (so that it documents a difference from Py2.3a2). 
For whatsnew23, modify the existing entry (since it is a delta from Py2.3). ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-03 19:59 Message: Logged In: YES user_id=670441 Should I make a new NEWS item, or should I modify the existing NEWS item about dict.pop? And should I make a new whatsnew23 item or modify the existing one? I'm guessing a new NEWS item and a modified whatsnew item, but I'll post a patch when you tell me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-02 02:40 Message: Logged In: YES user_id=31435 dicts have a .pop() method? Heh. I must have slept through that one . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-01 02:59 Message: Logged In: YES user_id=6380 Alex Martelli's argument convinced me, I'm +0.5 on the feature. The 0.5 is because it's definitely feature bloat. Given how few use cases there are for dict.pop() in the first place, I'm not worried about the minor slowdown due to extra argument parsing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-01 01:30 Message: Logged In: YES user_id=80475 The patch looks fine. Assigning to Guido for pronouncement. Guido, the patch adds optional get() like functionality for dict.pop(). The nearest parallel is the default argument for getattr(obj, attr, [default]). On the plus side, it makes pop easier to use and more flexible. On the minus side, it adds more complexity to the mapping interface and it slows down the normal case for d.pop(k). If it is accepted the poster should add test cases, a NEWS item, doc updates, and parallel changes to UserDict.UserDict and UserDict.DictMixin. Then, re-assign to me and I'll check it all and apply it. 
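Editorial sketch of the behaviour discussed in this thread, written against current Python rather than the 2.3 patch itself. The first part uses the built-in dict.pop(key[, default]). The helper mapping_pop and the _MISSING sentinel are hypothetical names, not taken from the patch; they only illustrate one way a pure-Python mapping (for example a DictMixin-style class) could offer the same optional default.

```python
# Built-in behaviour: dict.pop(key[, default])
inventory = {"apples": 3, "pears": 5}

# Key present: the value is returned and the key is removed.
apples = inventory.pop("apples")

# Key missing, default supplied: the default comes back, no exception.
plums = inventory.pop("plums", 0)

# Key missing, no default: the pre-patch behaviour is kept, KeyError.
try:
    inventory.pop("plums")
    raised = False
except KeyError:
    raised = True

# Hypothetical pure-Python equivalent for any mutable mapping.
# A private sentinel distinguishes "no default given" from default=None.
_MISSING = object()


def mapping_pop(mapping, key, default=_MISSING):
    try:
        value = mapping[key]
    except KeyError:
        if default is _MISSING:
            raise  # no default: propagate the KeyError unchanged
        return default
    del mapping[key]
    return value
```

The sentinel is used instead of None so that a caller can still pass None as a legitimate default value.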
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=693753&group_id=5470 From noreply@sourceforge.net Fri Mar 7 05:52:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 21:52:59 -0800 Subject: [Patches] [ python-Patches-667730 ] More DictMixin Message-ID: Patches item #667730, was opened at 2003-01-14 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Raymond Hettinger (rhettinger) Summary: More DictMixin Initial Comment: This patch is intended to provide a more consistent implementation for the various dictionary-like objects of the standard library. test_userdict has been rewritten; it now uses unittest and defines a test case which allows checking for conformity with the dictionary protocol. test_shelve and test_weakref have been rewritten to use the test_userdict test case. test_os has been extended: a new test case checks the environ object for conformity to the dictionary protocol. The patch modifies the UserDict module: * The doc says that __contains__ should be one of the methods to redefine for better efficiency, but the implementation makes __contains__ dependent on the has_key definition. The patch reverses the method dependencies. * Change iterkeys = __iter__ to def iterkeys(self): return self.__iter__() to make iterkeys able to use overridden __iter__ methods. * I have also added __init__, copy and __repr__ methods to DictMixin. * The UserDict.UserDict class is a subclass of DictMixin; this allows simplifying the UserDict implementation. The patch is rather conservative since a lot of method definitions could still be removed from UserDict. In the weakref module, the patch makes WeakValueDictionary and WeakKeyDictionary subclasses of UserDict.DictMixin.
It also uses nested scopes and the new generator syntax for iterator methods, and rewrites WeakKeyDictionary.__delitem__. All of this reduces the module size by 50%. In the shelve module, the patch adds a copy() method which returns a dictionary with the keys and values of the database. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 00:52 Message: Logged In: YES user_id=80475 The patch looks good. Please make two adjustments and re-submit. 1) Change the test_func docstrings to comment blocks. If a docstring is present, test support will print it in the summary instead of the test name. 2) Change the logic for mapping.pop() to accommodate the new default argument option which was added yesterday. The format is m.pop(key[, default]). ---------------------------------------------------------------------- Comment By: Sebastien Keim (s_keim) Date: 2003-03-03 10:27 Message: Logged In: YES user_id=498191 I have uploaded a new version of the patch, updated to Python 2.3a2. I hope to have removed everything that could break backward compatibility, since the new proposed patch now contains only the testing stuff (well, almost, since I have also added a pop method to the weak dictionary classes to make them compatible with the test case). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:50 Message: Logged In: YES user_id=80475 Also, +1 on consolidating the test cases, though it should be done after any other changes to the files so we can make sure that nothing got broken. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:35 Message: Logged In: YES user_id=80475 * UserDict.UserDict should not change. As Martin pointed out, inheriting from object changes the semantics in a non-backward-compatible way.
Also, the class is efficiently implemented in terms of an internal dictionary and would be slowed down by the nest of calls in Mixin. Also, I think the code is incorrect in defining __iter__; there was a reason it was pulled out into a separate subclass -- that was done in Py2.2 and is not an easily reversible decision. * -0 on the changes to has_key() and __contains__(). has_key() was put at a lower level than __contains__ because the older dict-style interfaces all define has_key. * +1 for changing iterkeys() to a full definition (and +1 for doing the same for __iter__()). Sebastien is correct in pointing out the advantages of propagating an overridden method. * -1 for altering the repr() implementation. The current approach is shorter, cleaner, and faster. * -1 for adding __nonzero__(). Even dictionaries don't implement this method; they let len() do the talking. * -1 for adding __init__() and copy(). Both need to make assumptions about the order and number of parameters in the constructor of the class using the mixin. I think they are rarely helpful and are sometimes harmful in introducing surprising, hard-to-find errors. People who need an init() or copy() can code them more cleanly and directly in the extending class. Also, I don't think the code is correct: since DictMixin will be a base class, the use of super() is not what is wanted here -- *if* you were going to do this, try something like self.__class__(). Further, adding these methods violates my original intent for this class, which was to extrapolate four basic mapping methods into a full mapping interface. It was not intended as a stand-alone class. Also, copy() cannot guarantee that it is copying all the relevant data for the sub-class, and that violates the definition of what copy() is supposed to do.
If something like this were attempted, it should be its own mixin (automatically adding copy support to any class) and it should be rather sophisticated about how to perfectly replicate itself (not easily done if the underlying data is in a file, database, or in a distributed app). * +0 on changing weakdicts provided it is done minimally and carefully with attention to leaving semantics unchanged and not slowing performance. The advantage goes beyond consistency, it removes code duplication, keeps well thought-out logic in one place, and provides an automatic interface update from DictMixin if the dictionary interface ever sprouts another method. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-14 16:43 Message: Logged In: YES user_id=21627 This patch breaks backwards compatibility. UserDict is an oldstyle class on purpose, since changing it to a newstyle class will certainly break the compatibility in subtle ways (e.g. by changing what type(userdictinstance) is). Unless you can bring forward a better rationale than consistency, this patch will be rejected. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 From noreply@sourceforge.net Fri Mar 7 06:22:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 22:22:19 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-06 01:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. 
Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 01:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Fri Mar 7 06:43:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 06 Mar 2003 22:43:50 -0800 Subject: [Patches] [ python-Patches-698505 ] docs for hotshot module Message-ID: Patches item #698505, was opened at 2003-03-06 00:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: docs for hotshot module Initial Comment: The attached provides documentation for the hotshot module. Assigning to Fred for review. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 01:43 Message: Logged In: YES user_id=80475 The TeX markup checks out fine. Consider documenting lineevents and linetimings which are exposed upon: import hotshot. For the example, consider adding a comment line at the beginning with hints that the example produces large files and takes a long time to run. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2003-03-06 00:39 Message: Logged In: YES user_id=29957 stupid sourceforge tracker. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 From noreply@sourceforge.net Fri Mar 7 14:25:06 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 06:25:06 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-27 08:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required. In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc.). This lets you do timezone conversions, amongst other things. Also includes changes to configure.in to only build the new code if the tzset method correctly switches timezones on your platform. This should be the case for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py. Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (e.g. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time. Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on.
eg::

    >>> import os, time
    >>> now = time.time()
    >>> os.environ['TZ'] = 'Europe/Amsterdam'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 14:35:17 2003'
    >>> time.tzname
    ('CET', 'CEST')
    >>> os.environ['TZ'] = 'US/Eastern'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 08:35:17 2003'
    >>> time.tzname
    ('EST', 'EDT')

---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 09:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 16:49 Message: Logged In: YES user_id=6380 Sorry, not a chance. ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 16:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 07:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3?
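Editorial sketch of the tzset() usage proposed in this thread, written for current Python on a Unix platform (time.tzset is not available on Windows). The helper name switch_timezone is hypothetical. The sketch uses POSIX-style TZ strings that spell out their own offset and DST rules, on the assumption that this avoids depending on which zoneinfo files are installed; that is also one way to sidestep the kind of 'Luna/Tycho' surprise reported above.

```python
import os
import time


def switch_timezone(tz_string):
    """Set TZ for this process and return (tzname, seconds west of UTC).

    Note: this changes the timezone for the whole process, not just the caller.
    """
    os.environ["TZ"] = tz_string
    time.tzset()  # re-read TZ and refresh time.tzname / time.timezone
    return time.tzname, time.timezone


# US Eastern rules written out explicitly in POSIX TZ syntax.
eastern = switch_timezone("EST5EDT,M3.2.0,M11.1.0")

# UTC with an explicit zero offset.
utc = switch_timezone("UTC0")
```

Code that needs the previous setting afterwards would have to save and restore os.environ['TZ'] itself and call time.tzset() again.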
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-20 23:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he through this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Fri Mar 7 15:04:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 07:04:50 -0800 Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types Message-ID: Patches item #696193, was opened at 2003-03-02 16:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christian Tismer (tismer) Assigned to: Guido van Rossum (gvanrossum) Summary: Enable __slots__ for meta-types Initial Comment: The new type system allows non-empty __slots__ only for fixed-size objects. Meta-types are types which instances are also types. types are variable-sized, because they take the slot definitions for their instances, so the cannot have extra members from their meta-type. The proposed solution allows for two things: a) meta-types can have slots b) extensions get access to the whole type object and can create extended types with private fields. The changes providing this are quite simple: - replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h - instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize. 
To keep things easy, I added a macro which does this calculation, and member access read now like so: before: type->tp_members = et->members; after: type->tp_members = PyHeapType_GET_MEMBERS(et); This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simlicity. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 10:04 Message: Logged In: YES user_id=6380 Everything looks fine, except subtracting 1 from the expression in the PyHeapType_GET_MEMBERS() macro. Thart makes the first members slot overlap with the 'name' and 'slots' struct members. I'll get rid of the "-1" part. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-03 09:57 Message: Logged In: YES user_id=6380 I'll look at this on Friday. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 From noreply@sourceforge.net Fri Mar 7 15:24:23 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 07:24:23 -0800 Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types Message-ID: Patches item #696193, was opened at 2003-03-02 16:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Christian Tismer (tismer) Assigned to: Guido van Rossum (gvanrossum) Summary: Enable __slots__ for meta-types Initial Comment: The new type system allows non-empty __slots__ only for fixed-size objects. Meta-types are types which instances are also types. 
types are variable-sized, because they take the slot definitions for their instances, so the cannot have extra members from their meta-type. The proposed solution allows for two things: a) meta-types can have slots b) extensions get access to the whole type object and can create extended types with private fields. The changes providing this are quite simple: - replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h - instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize. To keep things easy, I added a macro which does this calculation, and member access read now like so: before: type->tp_members = et->members; after: type->tp_members = PyHeapType_GET_MEMBERS(et); This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simlicity. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 10:24 Message: Logged In: YES user_id=6380 Checked in, with that one fix. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 10:04 Message: Logged In: YES user_id=6380 Everything looks fine, except subtracting 1 from the expression in the PyHeapType_GET_MEMBERS() macro. Thart makes the first members slot overlap with the 'name' and 'slots' struct members. I'll get rid of the "-1" part. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-03 09:57 Message: Logged In: YES user_id=6380 I'll look at this on Friday. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 From noreply@sourceforge.net Fri Mar 7 15:51:35 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 07:51:35 -0800 Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types Message-ID: Patches item #696193, was opened at 2003-03-02 22:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Christian Tismer (tismer) Assigned to: Guido van Rossum (gvanrossum) Summary: Enable __slots__ for meta-types Initial Comment: The new type system allows non-empty __slots__ only for fixed-size objects. Meta-types are types which instances are also types. types are variable-sized, because they take the slot definitions for their instances, so the cannot have extra members from their meta-type. The proposed solution allows for two things: a) meta-types can have slots b) extensions get access to the whole type object and can create extended types with private fields. The changes providing this are quite simple: - replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h - instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize. To keep things easy, I added a macro which does this calculation, and member access read now like so: before: type->tp_members = et->members; after: type->tp_members = PyHeapType_GET_MEMBERS(et); This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simlicity. 
---------------------------------------------------------------------- >Comment By: Christian Tismer (tismer) Date: 2003-03-07 16:51 Message: Logged In: YES user_id=105700 Oops! You are right. I forgot to back-port that change into the future. My 2.2.2 version already reads like this:

/* access macro to the members which are floating "behind" the object */
#define PyHeapType_GET_MEMBERS(etype) \
    ((PyMemberDef *)(((char *)etype) + (etype)->type.ob_type->tp_basicsize))

Thanks for taking care -- chris ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 16:24 Message: Logged In: YES user_id=6380 Checked in, with that one fix. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 16:04 Message: Logged In: YES user_id=6380 Everything looks fine, except subtracting 1 from the expression in the PyHeapType_GET_MEMBERS() macro. That makes the first members slot overlap with the 'name' and 'slots' struct members. I'll get rid of the "-1" part. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-03 15:57 Message: Logged In: YES user_id=6380 I'll look at this on Friday.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 From noreply@sourceforge.net Fri Mar 7 17:37:27 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 09:37:27 -0800 Subject: [Patches] [ python-Patches-696193 ] Enable __slots__ for meta-types Message-ID: Patches item #696193, was opened at 2003-03-02 16:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Christian Tismer (tismer) Assigned to: Guido van Rossum (gvanrossum) Summary: Enable __slots__ for meta-types Initial Comment: The new type system allows non-empty __slots__ only for fixed-size objects. Meta-types are types which instances are also types. types are variable-sized, because they take the slot definitions for their instances, so the cannot have extra members from their meta-type. The proposed solution allows for two things: a) meta-types can have slots b) extensions get access to the whole type object and can create extended types with private fields. The changes providing this are quite simple: - replace the internal hidden "etype" and turn it into an explicit PyHeapTypeObject in object.h - instead of a fixed offset into the former etype, the slots calculation is based upon tp_basicsize. To keep things easy, I added a macro which does this calculation, and member access read now like so: before: type->tp_members = et->members; after: type->tp_members = PyHeapType_GET_MEMBERS(et); This patch has been tested thoroughly in my own code since Python 2.2, and I think it is ripe to get into the distribution. It has almost no impact on speed or simlicity. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 12:37 Message: Logged In: YES user_id=6380 You're welcome. That's what I'm here for. :-) ---------------------------------------------------------------------- Comment By: Christian Tismer (tismer) Date: 2003-03-07 10:51 Message: Logged In: YES user_id=105700 Oops! You are right. I forgot to back-port that change into the future. My 2.2.2 version already reads like this:

/* access macro to the members which are floating "behind" the object */
#define PyHeapType_GET_MEMBERS(etype) \
    ((PyMemberDef *)(((char *)etype) + (etype)->type.ob_type->tp_basicsize))

Thanks for taking care -- chris ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 10:24 Message: Logged In: YES user_id=6380 Checked in, with that one fix. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 10:04 Message: Logged In: YES user_id=6380 Everything looks fine, except subtracting 1 from the expression in the PyHeapType_GET_MEMBERS() macro. That makes the first members slot overlap with the 'name' and 'slots' struct members. I'll get rid of the "-1" part. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-03 09:57 Message: Logged In: YES user_id=6380 I'll look at this on Friday.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696193&group_id=5470 From noreply@sourceforge.net Fri Mar 7 21:18:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 13:18:09 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-05 22:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2003-03-07 13:18 Message: Logged In: YES user_id=357491 The quotes thing was just a slip-up. t' fixed in my local copy and thus it will show up when I upload another patch. I will write up patches to the docs, although the docs guarantee certain methods that are actually conditionally added to the object; should I go ahead and just change the docs to reflect this or rip out the conditionality of the adding of the methods since the file object, if using a socket, is coming from socket.makefile() (I think; urllib seems to be from the 1.5 days and thus is using httplib.HTTP() and thus had to read the code)? I will also come up with a news item to be pasted into Misc/NEWS by the person who checks this in. As for the test, though, test_urllib only tests quote(). 
The module itself has some tests that can be run when the module is __main__, but all it does is fetch various pages and print the output; nothing really there that wouldn't be caught from people using it day-to-day. In other words there is no good place to put a test since there basically are no tests for this part of the module. =) Yes, I could fix this, but that would be a completely separate patch since the quote() tests are not even a PyUnit testing suite. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-06 22:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Fri Mar 7 23:17:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 15:17:56 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-06 01:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 18:17 Message: Logged In: YES user_id=80475 That's fine. 
Go ahead and load the patch without the tests. Keep it on your todo list. It would be nice to have some good PyUnit tests for this module. Assign it to me when it's ready and I'll load it. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 16:18 Message: Logged In: YES user_id=357491 The quotes thing was just a slip-up. It's fixed in my local copy and thus it will show up when I upload another patch. I will write up patches to the docs, although the docs guarantee certain methods that are actually conditionally added to the object; should I go ahead and just change the docs to reflect this or rip out the conditionality of the adding of the methods since the file object, if using a socket, is coming from socket.makefile() (I think; urllib seems to be from the 1.5 days and thus is using httplib.HTTP() and thus had to read the code)? I will also come up with a news item to be pasted into Misc/NEWS by the person who checks this in. As for the test, though, test_urllib only tests quote(). The module itself has some tests that can be run when the module is __main__, but all it does is fetch various pages and print the output; nothing really there that wouldn't be caught from people using it day-to-day. In other words there is no good place to put a test since there basically are no tests for this part of the module. =) Yes, I could fix this, but that would be a completely separate patch since the quote() tests are not even a PyUnit testing suite. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 01:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go.
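The conditional delegation described in the initial comment (iteration is offered only when the underlying file object supports it) might look roughly like the sketch below. This is not the patch itself: the names Wrapper, IterableWrapper, wrap and ReadOnly are hypothetical, and because Python 3 looks up special methods on the type rather than the instance, the sketch picks a class up front instead of attaching __iter__ and next to the instance as a 2.3-era old-style class could.

```python
import io


class Wrapper:
    """Hypothetical stand-in for the object urllib hands back."""

    def __init__(self, fp):
        self.fp = fp
        self.read = fp.read  # always forwarded to the file object


class IterableWrapper(Wrapper):
    """Same wrapper, plus the iterator protocol forwarded to fp."""

    def __iter__(self):
        return self

    def __next__(self):
        # Assumes fp is its own iterator, as file objects are.
        return next(self.fp)


def wrap(fp):
    # Offer iteration only when the underlying file object supports it.
    if hasattr(fp, "__iter__"):
        return IterableWrapper(fp)
    return Wrapper(fp)


class ReadOnly:
    """A minimal file-like object without __iter__."""

    def read(self):
        return ""


lines = list(wrap(io.StringIO("first\nsecond\n")))
plain = wrap(ReadOnly())
```

With a file object that lacks __iter__, the caller gets the plain wrapper and has to fall back on read(), which is the documentation question raised in the comment above.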
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Fri Mar 7 23:58:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 15:58:21 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-05 22:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2003-03-07 15:58 Message: Logged In: YES user_id=357491 OK, the new patch has the quote fix. I also added a single line to the urllib doc saying that it supports the iterator protocol. I don't know how the naming works for the \ref{} tex directive so I didn't put that in for referencing the iterator type although I suspect it wouldn't hurt. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 15:17 Message: Logged In: YES user_id=80475 That's fine. Go ahead and load the patch without the tests. Keep it on your todo list. It would be nice to have some good PyUnit tests for this module. Assign it to me when it's ready and I'll load it. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 13:18 Message: Logged In: YES user_id=357491 The quotes thing was just a slip-up. It's fixed in my local copy and thus it will show up when I upload another patch. I will write up patches to the docs, although the docs guarantee certain methods that are actually conditionally added to the object; should I go ahead and just change the docs to reflect this or rip out the conditionality of the adding of the methods since the file object, if using a socket, is coming from socket.makefile() (I think; urllib seems to be from the 1.5 days and thus is using httplib.HTTP() and thus had to read the code)? I will also come up with a news item to be pasted into Misc/NEWS by the person who checks this in. As for the test, though, test_urllib only tests quote(). The module itself has some tests that can be run when the module is __main__, but all it does is fetch various pages and print the output; nothing really there that wouldn't be caught from people using it day-to-day. In other words there is no good place to put a test since there basically are no tests for this part of the module. =) Yes, I could fix this, but that would be a completely separate patch since the quote() tests are not even a PyUnit testing suite. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-06 22:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Fri Mar 7 23:58:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 15:58:17 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-05 22:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Raymond Hettinger (rhettinger) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 15:58 Message: Logged In: YES user_id=357491 OK, the new patch has the quote fix. I also added a single line to the urllib doc saying that it supports the iterator protocol. I don't know how the naming works for the \ref{} tex directive so I didn't put that in for referencing the iterator type although I suspect it wouldn't hurt. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 15:17 Message: Logged In: YES user_id=80475 That's fine. Go ahead and load the patch without the tests. Keep it on your todo list. It would be nice to have some good PyUnit tests for this module. Assign it to me when it's ready and I'll load it. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 13:18 Message: Logged In: YES user_id=357491 The quotes thing was just a slip-up. It's fixed in my local copy and thus it will show up when I upload another patch. I will write up patches to the docs, although the docs guarantee certain methods that are actually conditionally added to the object; should I go ahead and just change the docs to reflect this or rip out the conditionality of the adding of the methods since the file object, if using a socket, is coming from socket.makefile() (I think; urllib seems to be from the 1.5 days and thus is using httplib.HTTP() and thus had to read the code)? I will also come up with a news item to be pasted into Misc/NEWS by the person who checks this in. As for the test, though, test_urllib only tests quote(). The module itself has some tests that can be run when the module is __main__, but all it does is fetch various pages and print the output; nothing really there that wouldn't be caught from people using it day-to-day. In other words there is no good place to put a test since there basically are no tests for this part of the module. =) Yes, I could fix this, but that would be a completely separate patch since the quote() tests are not even a PyUnit testing suite. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-06 22:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Sat Mar 8 04:42:20 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 07 Mar 2003 20:42:20 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-28 00:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required. In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc). This lets you do timezone conversions amongst other things. Also includes changes to configure.in to only build new code if the tzset method correctly switches timezones on your platform. This should be for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (eg. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time. Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on.
eg:: >>> now = time.time() >>> os.environ['TZ'] = 'Europe/Amsterdam' >>> time.tzset() >>> time.ctime(now) 'Mon Jan 27 14:35:17 2003' >>> time.tzname ('CET', 'CEST') >>> os.environ['TZ'] = 'US/Eastern' >>> time.tzset() >>> time.ctime(now) 'Mon Jan 27 08:35:17 2003' >>> time.tzname ('EST', 'EDT') ---------------------------------------------------------------------- >Comment By: Stuart Bishop (zenzen) Date: 2003-03-08 15:42 Message: Logged In: YES user_id=46639 Leave it commented out or remove that line. It is testing unimportant behaviour that looks more platform-dependent than I suspected (and now I look at it again, what tzname should be set to if the timezone is unknown is unspecified by the tzset(3) docs). The important behaviour is that: a) the system silently falls back to UTC if the timezone is unknown, and this is tested elsewhere b) calling tzset resets tzname, which is also tested elsewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-08 01:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-22 08:49 Message: Logged In: YES user_id=6380 Sorry, not a chance.
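The tzset() behaviour proposed in the initial comment (set os.environ['TZ'], call time.tzset(), then convert) can be sketched as a small helper. This is only an illustration under a few assumptions: the helper name hour_in_zone is hypothetical, it needs a Unix platform where time.tzset() exists, and it uses POSIX-style TZ strings ('UTC0', 'EST5EDT,M3.2.0,M11.1.0') instead of zone names such as 'US/Eastern' so that it does not depend on which zoneinfo files happen to be installed.

```python
import os
import time


def hour_in_zone(tz, epoch):
    """Return the local hour for `epoch` with TZ temporarily set to `tz`."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: re-read TZ and refresh time.timezone, time.tzname
    try:
        return time.localtime(epoch).tm_hour
    finally:
        # Restore the previous setting so the rest of the process is unaffected.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()


# The epoch (1970-01-01 00:00 UTC) falls in winter, so US Eastern is UTC-5.
utc_hour = hour_in_zone("UTC0", 0)
eastern_hour = hour_in_zone("EST5EDT,M3.2.0,M11.1.0", 0)
```

The sketch deliberately checks a converted time rather than time.tzname, since the comments above note that tzname for an unknown zone is platform-dependent.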
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-22 08:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 23:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3? ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 15:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he thought this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Sat Mar 8 12:06:27 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 04:06:27 -0800 Subject: [Patches] [ python-Patches-658327 ] Add inet_pton and inet_ntop to socket Message-ID: Patches item #658327, was opened at 2002-12-24 22:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jp Calderone (kuran) >Assigned to: Neal Norwitz (nnorwitz) Summary: Add inet_pton and inet_ntop to socket Initial Comment: Patch is against current CVS and adds two socket module functions, inet_pton and inet_ntop. Both of these should be available on all platforms (because of other dependencies in the code) so I don't think portability is a problem. inet_ntop converts a packed IP address to a human-readable '.' or ':' separated string representation of the IP.
inet_pton performs the reverse operation. (Potential) problems: inet_pton sets errno to ENOSPC, which may lead to a confusing error message. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-04 08:05 Message: Logged In: YES user_id=21627 My two suggestions aren't exclusive: If you have the native inet_pton, you can *always* support IPv6 addresses with that, regardless of whether --enable-ipv6 was passed to configure or not. If that is done, it will be a legitimate test failure for inet_pton not to support IPv6 - after all, the primary reason to define this function was to support IPv6, so if the native function fails to do so, there is clearly a bug in the system. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-04 04:41 Message: Logged In: YES user_id=33168 I added the #ifdef, but that doesn't address the testing problem. If the platform has inet_pton but doesn't have IPv6 enabled, inet_pton will be exported, but there's no good way to tell if you can pass an IPv6 address. The only way to test if IPv6 is enabled would be to call inet_pton with AF_INET6, catch a socket.error and check if the exception message is "unknown address family". Since this is really a testing issue, perhaps that's best after all? Do you agree this should be done? * Remove has_ipv6 * Export inet_pton & inet_ntop only if defined for platform * Only try to test inet_pton/ntop if defined for platform * Modify the tests to pass a valid IPv6 test, catch socket.error, if the error message is "unknown address family", don't test ipv6 any further, if the error message is different, raise TestFailed, if no exception, test all IPv6 addresses ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-03 23:25 Message: Logged In: YES user_id=33168 As I recall, yes, has_ipv6 is only for tests.
There was no way to distinguish if python was built with IPv6 support, since AF_INET6 was always defined. Your second approach sounds like it will work. I need to review the code, though. I've forgotten how it works. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-03 11:15 Message: Logged In: YES user_id=21627 The has_ipv6 test is only there for the tests? In that case, drop it, and just perform AF_INET6 conversions unconditionally. OTOH, I think we should not expose the emulated inet_pton: it doesn't set errno correctly, and offers no advantage over inet_addr. So wrap the entire code with HAVE_INET_PTON, and only perform the tests if the function is supported. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-05 03:40 Message: Logged In: YES user_id=33168 I was just about to check this in, but then I ran into a problem. IPv6 may not be enabled, even if the constant AF_INET6 exists. The cleanest way I saw to address this in the test was to add a has_ipv6 boolean constant to the socket module. Martin, do you think this is acceptable? Attached is a complete patch which should be safe (based on the discussion below), includes tests and doc changes. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2003-01-11 18:04 Message: Logged In: YES user_id=366566 Yea, testing for the proper input length is definitely something that should be done. The patch looks good, but for one thing. If the specified address family is neither AF_INET nor AF_INET6, the length won't be tested and the underlying inet_ntop will be called. This isn't a problem now (afaik) because only those two address families are supported, but in a future libc version with more supported address families, it might open a similar hole to the one you've fixed.
Perhaps the + } else { + PyErr_SetString(socket_error, "unknown address family"); + return NULL; + } should be moved up from the second if-grouping to follow the first if-grouping. Everything else looks good to me. Thanks for taking the time to look at this :) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-11 04:49 Message: Logged In: YES user_id=33168 JP, do you agree with my comment on 2002-12-30 about the checks? I have attached an updated patch. Please review and verify this is correct. Thank you for the additional tests. Feel free to submit patches with additional tests for any and all modules! ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-31 17:52 Message: Logged In: YES user_id=366566 Doc, NEWS, and test_socket patch attached. I didn't notice any inet_aton/inet_ntoa tests in the module so I added a couple for those as well (I excluded a test for inet_ntoa('255.255.255.255') ;) Also included are a couple IPv6 tests. I'm not sure if these are appropriate, since many systems may still lack the required support for them to pass. I'll leave it up to you to decide whether they should be commented out or removed or whatever. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-31 14:17 Message: Logged In: YES user_id=21627 I agree that such a change should be added. Neal, you have given this patch more attention than I did - please check it in when you consider it complete. I just like to point out that it is missing documentation changes (libsocket.tex), a NEWS entry, and a test case. kuran, please provide those as a single patch file. 
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-31 01:11 Message: Logged In: YES user_id=33168 ISTM that in socket_inet_ntop() you need to verify the size of the packed value passed in. If the user passes an empty string, inet_ntop() could read beyond the buffer passed in, potentially causing a core dump. The checks could be something like this: if (af == AF_INET && len != sizeof(struct in_addr)) else if (af == AF_INET6 && len != sizeof(struct in6_addr)) Does this make sense? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-27 16:39 Message: Logged In: YES user_id=366566 The use case I have for it at the moment is a DNS server (Twisted.names). inet_pton allows me to handle IPv6 addresses, so it allows me to support AAAA and A6 records. I believe an IPv6 capable socks proxy would find this useful as well. Basically, low level network stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-27 11:23 Message: Logged In: YES user_id=21627 What is the rationale for providing this functionality? ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-26 19:32 Message: Logged In: YES user_id=366566 Ooops, I made two, and uploaded the wrong one >:O Sorry. Dunno if it's still helpful, but here's the unified diff. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 19:10 Message: Logged In: YES user_id=33168 Next time, please use context or unified diff. -c or -u option to cvs diff: cvs diff -c ... ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2002-12-24 22:05 Message: Logged In: YES user_id=366566 Sourceforge decided not to attach the file the first time... Here it is.
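The checks discussed in this thread (reject unknown address families first, then verify the packed length before converting, and probe for IPv6 rather than trusting that AF_INET6 is defined) can be sketched at the Python level. This is an illustration of the idea, not the C code in the patch: checked_ntop, ipv6_conversion_works and PACKED_SIZES are hypothetical names, and the sketch assumes a platform where socket.inet_pton and socket.inet_ntop are available.

```python
import socket

# Expected packed sizes: 4 bytes for IPv4, 16 bytes for IPv6.
PACKED_SIZES = {socket.AF_INET: 4, socket.AF_INET6: 16}


def checked_ntop(family, packed):
    """Hypothetical wrapper: validate family and length before inet_ntop."""
    # Reject unknown families before anything else, so a family added to
    # libc later can never reach inet_ntop without a length check.
    if family not in PACKED_SIZES:
        raise ValueError("unknown address family")
    if len(packed) != PACKED_SIZES[family]:
        raise ValueError("packed address has the wrong length")
    return socket.inet_ntop(family, packed)


def ipv6_conversion_works():
    """Probe rather than assume: AF_INET6 may exist without IPv6 support."""
    try:
        socket.inet_pton(socket.AF_INET6, "::1")
    except OSError:
        return False
    return True


packed4 = socket.inet_pton(socket.AF_INET, "127.0.0.1")
text4 = checked_ntop(socket.AF_INET, packed4)
```

A test suite along the lines proposed above would call ipv6_conversion_works() once and skip the IPv6 cases when it returns False.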
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658327&group_id=5470 From noreply@sourceforge.net Sat Mar 8 19:15:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 11:15:56 -0800 Subject: [Patches] [ python-Patches-700047 ] unicode object leaks refcount on resizing Message-ID: Patches item #700047, was opened at 2003-03-09 04:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700047&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: unicode object leaks refcount on resizing Initial Comment: This code duplicates the situation: static PyObject * leaktest(PyObject *self, PyObject *args) { PyObject *u; u = PyUnicode_FromUnicode(NULL, 1); if (u == NULL) return NULL; if (PyUnicode_Resize(&u, 0) == -1) return NULL; return u; } ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700047&group_id=5470 From noreply@sourceforge.net Sat Mar 8 20:02:42 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 12:02:42 -0800 Subject: [Patches] [ python-Patches-684677 ] Allow freeze to exclude implicits Message-ID: Patches item #684677, was opened at 2003-02-11 16:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684677&group_id=5470 Category: Demos and tools Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Lawrence Hudson (lhudson) Assigned to: Just van Rossum (jvr) Summary: Allow freeze to exclude implicits Initial Comment: Freeze always freezes site and exceptions. This patch allows these implicit modules to be excluded using the -x switch.
---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-08 21:02 Message: Logged In: YES user_id=92689 Applied, it's in rev. 1.43 of freeze.py. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-16 19:36 Message: Logged In: YES user_id=92689 The patch looks good, I'll have a closer look later. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684677&group_id=5470 From noreply@sourceforge.net Sun Mar 9 05:46:02 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 21:46:02 -0800 Subject: [Patches] [ python-Patches-698520 ] Iterator for urllib.URLOpener Message-ID: Patches item #698520, was opened at 2003-03-06 01:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Raymond Hettinger (rhettinger) Summary: Iterator for urllib.URLOpener Initial Comment: 4 line patch to give urllib.URLOpener an iterator. Follows design of module and adds methods only if the file object used internally has __iter__ and adds 'next' only if __iter__ was added. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-09 00:46 Message: Logged In: YES user_id=80475 Committed as: Lib/urllib.py 1.155 Misc/NEWS 1.693 Doc/lib/liburllib.tex 1.45 ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 18:58 Message: Logged In: YES user_id=357491 OK, the new patch has the quote fix. I also added a single line to the urllib doc saying that it supports the iterator protocol. 
I don't know how the naming works for the \ref{} tex directive so I didn't put that in for referencing the iterator type although I suspect it wouldn't hurt. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 18:17 Message: Logged In: YES user_id=80475 That's fine. Go ahead and load the patch without the tests. Keep it on your todo list. It would be nice to have some good PyUnit tests for this module. Assign it to me when it's ready and I'll load it. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-07 16:18 Message: Logged In: YES user_id=357491 The quotes thing was just a slip-up. It's fixed in my local copy and thus it will show up when I upload another patch. I will write up patches to the docs, although the docs guarantee certain methods that are actually conditionally added to the object; should I go ahead and just change the docs to reflect this or rip out the conditionality of the adding of the methods since the file object, if using a socket, is coming from socket.makefile() (I think; urllib seems to be from the 1.5 days and thus is using httplib.HTTP() and thus had to read the code)? I will also come up with a news item to be pasted into Misc/NEWS by the person who checks this in. As for the test, though, test_urllib only tests quote(). The module itself has some tests that can be run when the module is __main__, but all it does is fetch various pages and print the output; nothing really there that wouldn't be caught from people using it day-to-day. In other words there is no good place to put a test since there basically are no tests for this part of the module. =) Yes, I could fix this, but that would be a completely separate patch since the quote() tests are not even a PyUnit testing suite.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 01:22 Message: Logged In: YES user_id=80475 Looks good. Tests out okay. Use double quotes throughout. Consider adding a news item, docs, and a test. Assign back to me when you think it's ready to go. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698520&group_id=5470 From noreply@sourceforge.net Sun Mar 9 07:23:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 23:23:31 -0800 Subject: [Patches] [ python-Patches-667730 ] More DictMixin Message-ID: Patches item #667730, was opened at 2003-01-14 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Raymond Hettinger (rhettinger) Summary: More DictMixin Initial Comment: This patch is intended to provide a more consistent implementation for the various dictionary-like objects of the standard library. test_userdict has been rewritten; it now uses unittest and defines a test case which allows checking for conformity with the dictionary protocol. test_shelve and test_weakref have been rewritten to use the test_userdict test case. test_os has been extended: a new test case checks the environ object for conformity to the dictionary protocol. The patch modifies the UserDict module: * The doc says that __contains__ should be one of the methods to redefine for better efficiency, but the implementation makes __contains__ depend on the has_key definition. The patch reverses this dependency. * Change iterkeys = __iter__ to def iterkeys(self): return self.__iter__() to make iterkeys able to use overridden __iter__ methods.
* I have also added __init__, copy and __repr__ methods to DictMixin. * The UserDict.UserDict class is a subclass of DictMixin, which allows the UserDict implementation to be simplified. The patch is rather conservative since a lot of method definitions could still be removed from UserDict. In the weakref module, the patch makes WeakValueDictionary and WeakKeyDictionary subclasses of UserDict.DictMixin. It also uses nested scopes and the new generator syntax for iterator methods, and rewrites WeakKeyDictionary.__delitem__. All of this decreases the module size by 50%. In the shelve module, the patch adds a copy() method which returns a dictionary with the keys and values of the database. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-09 02:23 Message: Logged In: YES user_id=80475 Accepted patch. Made the suggested fix-ups. Fixed spelling. Replace _tested_class method with an equivalent class variable. Applied as: Lib/weakref.py 1.19 Lib/test/test_userdict.py 1.14 Lib/test/test_os.py 1.14 Lib/test/test_shelve.py 1.3 Lib/test/test_weakref.py 1.22 ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 00:52 Message: Logged In: YES user_id=80475 The patch looks good. Please make two adjustments and re-submit. 1) Change the test_func docstrings to comment blocks. If a docstring is present, test support will print them in the summary instead of the test name. 2) Change the logic for mapping.pop() to accommodate the new default argument option which was added yesterday. The format is m.pop(key[, default]).
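The mixin idea under discussion (derive the wider mapping interface from a handful of basic methods, with pop() accepting an optional default) can be sketched as follows. This is a simplified illustration in present-day Python, not the UserDict.DictMixin source: MiniDictMixin, PairStore and the _MISSING sentinel are hypothetical names, and only a few derived methods are shown.

```python
_MISSING = object()  # sentinel: tells "no default given" apart from default=None


class MiniDictMixin:
    """Hypothetical mixin building extra mapping methods on four basic ones.

    Subclasses supply __getitem__, __setitem__, __delitem__ and keys().
    """

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def __iter__(self):
        return iter(self.keys())

    def iterkeys(self):
        # A full definition (not `iterkeys = __iter__`) so that an
        # overridden __iter__ in a subclass is picked up.
        return self.__iter__()

    def pop(self, key, default=_MISSING):
        # m.pop(key[, default]): raise KeyError only when no default is given.
        try:
            value = self[key]
        except KeyError:
            if default is _MISSING:
                raise
            return default
        del self[key]
        return value


class PairStore(MiniDictMixin):
    """Toy mapping stored as a list of (key, value) pairs."""

    def __init__(self):
        self._pairs = []

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def __delitem__(self, key):
        self[key]  # raises KeyError if the key is absent
        self._pairs = [(k, v) for k, v in self._pairs if k != key]

    def keys(self):
        return [k for k, _ in self._pairs]


m = PairStore()
m["a"] = 1
popped = m.pop("a")
fallback = m.pop("a", None)
```

The sentinel is one common way to support an optional default while still allowing None to be passed explicitly; whether the actual patch does it this way is not stated in the thread.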
---------------------------------------------------------------------- Comment By: Sebastien Keim (s_keim) Date: 2003-03-03 10:27 Message: Logged In: YES user_id=498191 I have downloaded a new version of the patch updated to Python2.3a2. I hope to have removed all the stuff which could break backward compatibility since the new proposed patch now contains only the testing stuff (well, almost, since I have also added a pop method to the weak dictionary classes to make them compatible with the test case). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:50 Message: Logged In: YES user_id=80475 Also, +1 on consolidating the test cases though it should be done after any other changes to the files so we can make sure that nothing got broken. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-15 21:35 Message: Logged In: YES user_id=80475 * UserDict.UserDict should not change. As Martin pointed out, inheriting from object changes the semantics in a non-backward-compatible way. Also, the class is efficiently implemented in terms of an internal dictionary and would be slowed down by the nest of calls in Mixin. Also, I think the code is incorrect in defining __iter__, there was a reason it was pulled out into a separate subclass -- that was done in Py2.2. and is not an easily reversible decision. * -0 on the changes to has_key() and __contains__(). has_key() was put at a lower level than __contains__ because the older dict-style interfaces all define has_key. * +1 for changing iterkeys() to a full definition (and +1 for doing the same for __iter__()). Sebastien is correct in pointing out the advantages for propagating an overridden method. * -1 for altering repr() implementation. The current approach is shorter, cleaner, and faster. * -1 for adding __nonzero__().
Even dictionaries don't implement this method; they let len() do the talking. * -1 for adding __init__() and copy(). Both need to make assumptions about the order and number of parameters in the constructor of the class using the mixin. I think they are rarely helpful and are sometimes harmful in introducing surprising, hard-to-find errors. People who need an init() or copy() can code them more cleanly and directly in the extending class. Also, I don't think the code is correct: since DictMixin will be a base class, the use of super() is not what is wanted here -- *if* you were going to do this, try something like self.__class__(). Further, adding these methods violates my original intent for this class, which was to extrapolate four basic mapping methods into a full mapping interface. It was not intended as a stand-alone class. Also, copy() cannot guarantee that it is copying all the relevant data for the sub-class and that violates the definition of what copy() is supposed to do. If something like this were attempted, it should be its own mixin (automatically adding copy support to any class) and it should be rather sophisticated about how to perfectly replicate itself (not easily done if the underlying data is in a file, database, or in a distributed app). * +0 on changing weakdicts provided it is done minimally and carefully with attention to leaving semantics unchanged and not slowing performance. The advantage goes beyond consistency: it removes code duplication, keeps well thought-out logic in one place, and provides an automatic interface update from DictMixin if the dictionary interface ever sprouts another method. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-14 16:43 Message: Logged In: YES user_id=21627 This patch breaks backwards compatibility.
UserDict is an oldstyle class on purpose, since changing it to a newstyle class will certainly break the compatibility in subtle ways (e.g. by changing what type(userdictinstance) is). Unless you can bring forward a better rationale than consistency, this patch will be rejected. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667730&group_id=5470 From noreply@sourceforge.net Sun Mar 9 07:42:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 23:42:58 -0800 Subject: [Patches] [ python-Patches-700047 ] unicode object leaks refcount on resizing Message-ID: Patches item #700047, was opened at 2003-03-08 14:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700047&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: unicode object leaks refcount on resizing Initial Comment: This code duplicates the situation: static PyObject * leaktest(PyObject *self, PyObject *args) { PyObject *u; u = PyUnicode_FromUnicode(NULL, 1); if (u == NULL) return NULL; if (PyUnicode_Resize(&u, 0) == -1) return NULL; return u; } ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-09 02:42 Message: Logged In: YES user_id=80475 Applied patch as: Objects/unicodeobject.c 2.184 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700047&group_id=5470 From noreply@sourceforge.net Sun Mar 9 07:56:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 23:56:49 -0800 Subject: [Patches] [ python-Patches-691928 ] Use datetime in _strptime Message-ID: Patches item #691928, was opened at
2003-02-23 19:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Use datetime in _strptime Initial Comment: To prevent code duplication, I patched _strptime to use datetime's date object to do Julian day, Gregorian, and day of the week calculations (Tim's code has to be more reliable than mine =). Patch also includes new regression tests to test results and calculation gets triggered. Very minor comment changes and my contact email are also changed. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-09 02:56 Message: Logged In: YES user_id=80475 Applied patch as: Lib/_strptime.py 1.13 Lib/test/test_strptime.py 1.10 ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-03 16:14 Message: Logged In: YES user_id=357491 Response to meta comment - I would normally delete it, Skip, but last time I tried I was told I didn't have the proper rights to do it. Unless SF has changed their setup to allow patch creators to manage the files regardless of whether they have CVS access I can't. Response to comment comment - The reason I am doing this is that I want to make sure that the returned time tuple is a valid date. If strptime is going to have default values I want those values to lead to a valid time that does not require someone to have to do more processing or wonder whether it is valid. Now currently the docs say you can't expect anything back in the time tuple but what was in the data string, so doing this does not go against the docs. But if strptime becomes the only strptime implementation, then I will write a doc patch to make the docs say that all returned time tuples will be valid dates. 
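A rough sketch of the kind of calculation the initial comment describes handing over to datetime's date object: deriving the day of the week and the day of the year from a calendar date, and going back from a day of the year to a calendar date. The helper names below are hypothetical and are not taken from _strptime.

```python
from datetime import date


def fill_in_date_fields(year, month, day):
    """Hypothetical helper: derive weekday and day-of-year from a calendar date."""
    d = date(year, month, day)
    weekday = d.weekday()           # Monday == 0 ... Sunday == 6, as in time tuples
    julian = d.timetuple().tm_yday  # day of the year, counting from 1
    return weekday, julian


def date_from_julian(year, julian):
    """Hypothetical inverse: calendar date for a 1-based day of the year."""
    d = date.fromordinal(date(year, 1, 1).toordinal() + julian - 1)
    return d.year, d.month, d.day
```

Because date() raises ValueError for impossible dates, filling in the remaining fields this way also acts as a validity check on the values that were parsed.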
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-03-03 10:03 Message: Logged In: YES user_id=44345 Meta comment - I think that when uploading successive patches it's useful to either name them differently or delete the prior one to avoid confusion. In this case it's not a big deal, especially since the submission dates are different, but after a few revisions it can sometimes be a challenge to figure out which patch should be downloaded. Comment comment - Unless there's some evidence the elided functions have been used, I suspect it best to just let people use the relevant datetime functions. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-02-25 16:51 Message: Logged In: YES user_id=357491 Only in the module (which was removed). None of the helper functions have ever been publicly advertised (although I think the locale date info might be helpful in locale; MvL wasn't interested, though). I uploaded a new diff that removes one more line that I forgot to remove when I eliminated the ability to pass in a regex object. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-23 19:56 Message: Logged In: YES user_id=33168 Brett, is there any doc for the functions that were removed? firstjulian, gregorian, julianday, dayofweek Otherwise, the patch seemed fine (but I didn't look that closely). 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 From noreply@sourceforge.net Sun Mar 9 07:57:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 08 Mar 2003 23:57:09 -0800 Subject: [Patches] [ python-Patches-691928 ] Use datetime in _strptime Message-ID: Patches item #691928, was opened at 2003-02-23 19:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Use datetime in _strptime Initial Comment: To prevent code duplication, I patched _strptime to use datetime's date object to do Julian day, Gregorian, and day of the week calculations (Tim's code has to be more reliable than mine =). Patch also includes new regression tests to test results and calculation gets triggered. Very minor comment changes and my contact email are also changed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-09 02:56 Message: Logged In: YES user_id=80475 Applied patch as: Lib/_strptime.py 1.13 Lib/test/test_strptime.py 1.10 ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-03 16:14 Message: Logged In: YES user_id=357491 Response to meta comment - I would normally delete it, Skip, but last time I tried I was told I didn't have the proper rights to do it. Unless SF has changed their setup to allow patch creators to manage the files regardless of whether they have CVS access I can't. Response to comment comment - The reason I am doing this is that I want to make sure that the returned time tuple is a valid date. 
If strptime is going to have default values I want those values to lead to a valid time that does not require someone to have to do more processing or wonder whether it is valid. Now currently the docs say you can't expect anything back in the time tuple but what was in the data string, so doing this does not go against the docs. But if strptime becomes the only strptime implementation, then I will write a doc patch to make the docs say that all returned time tuples will be valid dates. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-03-03 10:03 Message: Logged In: YES user_id=44345 Meta comment - I think that when uploading successive patches it's useful to either name them differently or delete the prior one to avoid confusion. In this case it's not a big deal, especially since the submission dates are different, but after a few revisions it can sometimes be a challenge to figure out which patch should be downloaded. Comment comment - Unless there's some evidence the elided functions have been used, I suspect it best to just let people use the relevant datetime functions. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-02-25 16:51 Message: Logged In: YES user_id=357491 Only in the module (which was removed). None of the helper functions have ever been publicly advertised (although I think the locale date info might be helpful in locale; MvL wasn't interested, though). I uploaded a new diff that removes one more line that I forgot to remove when I eliminated the ability to pass in a regex object. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-23 19:56 Message: Logged In: YES user_id=33168 Brett, is there any doc for the functions that were removed? 
firstjulian, gregorian, julianday, dayofweek Otherwise, the patch seemed fine (but I didn't look that closely). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=691928&group_id=5470 From noreply@sourceforge.net Mon Mar 10 14:22:23 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 06:22:23 -0800 Subject: [Patches] [ python-Patches-700839 ] various gettext fixes Message-ID: Patches item #700839, was opened at 2003-03-10 15:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700839&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: various gettext fixes Initial Comment: Following a message from Bruno Haible [1], here is a patch that fixes several gettext bugs: - The ! operator was treated incorrectly if not followed by a space. - Unbalanced parentheses in a plural forms expression now give a more meaningful error. - Provide a default plural forms expression, as libintl and msgfmt do. - Don't test that the header entry starts with 'Project-Id-Version:'; the PO format does not require it. [1] http://mail.python.org/pipermail/i18n-sig/2003-February/001543.html ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700839&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:08:24 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:08:24 -0800 Subject: [Patches] [ python-Patches-700858 ] Replacing and deleting files in a zipfile archive.
Message-ID: Patches item #700858, was opened at 2003-03-11 01:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Nev Delap (nevdelap) Assigned to: Nobody/Anonymous (nobody) Summary: Replacing and deleting files in a zipfile archive. Initial Comment: Addition of replace, replacestr and delete methods into zipfile.py. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:14:51 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:14:51 -0800 Subject: [Patches] [ python-Patches-700858 ] Replacing and deleting files in a zipfile archive. Message-ID: Patches item #700858, was opened at 2003-03-11 01:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Nev Delap (nevdelap) Assigned to: Nobody/Anonymous (nobody) Summary: Replacing and deleting files in a zipfile archive. Initial Comment: Addition of replace, replacestr and delete methods into zipfile.py. ---------------------------------------------------------------------- >Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:14 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:15:10 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:15:10 -0800 Subject: [Patches] [ python-Patches-700858 ] Replacing and deleting files in a zipfile archive. Message-ID: Patches item #700858, was opened at 2003-03-11 01:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Nev Delap (nevdelap) Assigned to: Nobody/Anonymous (nobody) Summary: Replacing and deleting files in a zipfile archive. Initial Comment: Addition of replace, replacestr and delete methods into zipfile.py. ---------------------------------------------------------------------- >Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:15 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:14 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:16:57 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:16:57 -0800 Subject: [Patches] [ python-Patches-700858 ] Replacing and deleting files in a zipfile archive. 
Message-ID: Patches item #700858, was opened at 2003-03-11 01:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Nev Delap (nevdelap) Assigned to: Nobody/Anonymous (nobody) Summary: Replacing and deleting files in a zipfile archive. Initial Comment: Addition of replace, replacestr and delete methods into zipfile.py. ---------------------------------------------------------------------- >Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:16 Message: Logged In: YES user_id=730416 . ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:15 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:14 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:19:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:19:26 -0800 Subject: [Patches] [ python-Patches-700858 ] Replacing and deleting files in a zipfile archive. 
Message-ID: Patches item #700858, was opened at 2003-03-11 01:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Nev Delap (nevdelap) Assigned to: Nobody/Anonymous (nobody) Summary: Replacing and deleting files in a zipfile archive. Initial Comment: Addition of replace, replacestr and delete methods into zipfile.py. ---------------------------------------------------------------------- >Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:19 Message: Logged In: YES user_id=730416 OK, so after refreshing it finally decided to show the files I'd added. ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:16 Message: Logged In: YES user_id=730416 . ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:15 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. ---------------------------------------------------------------------- Comment By: Nev Delap (nevdelap) Date: 2003-03-11 01:14 Message: Logged In: YES user_id=730416 The file upload say "Successful" but the file isn't listed!? I've tried it several times and yes I've checked the checkbox. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700858&group_id=5470 From noreply@sourceforge.net Mon Mar 10 15:28:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 07:28:48 -0800 Subject: [Patches] [ python-Patches-649762 ] Fix: asynchat.py: endless loop Message-ID: Patches item #649762, was opened at 2002-12-06 16:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=649762&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Bernhard Reiter (ber) Assigned to: A.M. Kuchling (akuchling) Summary: Fix: asynchat.py: endless loop Initial Comment: Patch against asynchat.py revision 1.19 in Python SF CVS. Fixes endless loop when terminator='' is used. Diagnosis: If we do not catch the empty string, no buffer will be consumed in line 134 and the while loop does not terminate. Cure: Go back to the old behaviour and collect everything when the terminator is '' or None. Background: Especially annoying because early versions (rev 1.1, coming with Python 1.5) did not have this bug and the comment in set_terminator() says that strings of all lengths are okay (among other things). The bug was introduced in rev 1.2. Bernhard Reiter ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2003-03-10 10:28 Message: Logged In: YES user_id=11375 A fix has been checked in as rev.1.21 of asynchat.py in the CVS tree. Thanks for your help! ---------------------------------------------------------------------- Comment By: Bernhard Reiter (ber) Date: 2003-02-03 14:39 Message: Logged In: YES user_id=113859 Yes, of course. I stopped experimenting with numeric and empty string terminators after hitting this bug, so I uploaded the flawed fix.
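The diagnosis in the initial comment can be illustrated with a simplified loop. This is only a sketch under stated assumptions: the function name consume and its return shape are hypothetical, string buffers stand in for socket data, and it is not the asynchat code itself.

```python
def consume(buffer, terminator):
    """Hypothetical, simplified stand-in for a terminator-scanning read loop.

    Returns (collected_chunks, remaining_buffer).
    """
    collected = []
    while buffer:
        if not terminator:
            # '' and None both mean "no terminator": take everything.
            # Without this branch, buffer.find('') returns 0, nothing is
            # consumed, and the while loop never terminates.
            collected.append(buffer)
            buffer = ""
            continue
        index = buffer.find(terminator)
        if index == -1:
            # Terminator not seen yet: stop and wait for more data.
            break
        collected.append(buffer[:index])
        buffer = buffer[index + len(terminator):]
    return collected, buffer
```

The point of the guard is that every pass through the loop either shortens the buffer or leaves the loop, which is what the empty-string case failed to do.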
---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2003-02-03 14:34 Message: Logged In: YES user_id=11375 Surely in your patched version, the code should be 'if not terminator: ...'. Otherwise the patch reverses the sense of the test. ---------------------------------------------------------------------- Comment By: Bernhard Reiter (ber) Date: 2002-12-06 17:37 Message: Logged In: YES user_id=113859 The patch also fixes the terminator=0 problem which is similar. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=649762&group_id=5470 From noreply@sourceforge.net Mon Mar 10 16:14:05 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 08:14:05 -0800 Subject: [Patches] [ python-Patches-700839 ] various gettext fixes Message-ID: Patches item #700839, was opened at 2003-03-10 15:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700839&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Juan David Ibáñez Palomar (jdavid) Assigned to: Nobody/Anonymous (nobody) Summary: various gettext fixes Initial Comment: Following a message from Bruno Haible [1], here is a patch that fixes several gettext bugs: - The ! operator was treated incorrectly if not followed by a space. - Unbalanced parentheses in a plural forms expression now give a more meaningful error. - Provide a default plural forms expression, as libintl and msgfmt do. - Don't test that the header entry starts with 'Project-Id-Version:'; the PO format does not require it. [1] http://mail.python.org/pipermail/i18n-sig/2003-February/001543.html ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2003-03-10 17:14 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as gettext.py 1.17. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=700839&group_id=5470 From noreply@sourceforge.net Mon Mar 10 17:02:22 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 09:02:22 -0800 Subject: [Patches] [ python-Patches-698505 ] docs for hotshot module Message-ID: Patches item #698505, was opened at 2003-03-06 00:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 Category: Documentation Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Anthony Baxter (anthonybaxter) Summary: docs for hotshot module Initial Comment: The attached provides documentation for the hotshot module. Assigning to Fred for review. ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2003-03-10 12:02 Message: Logged In: YES user_id=3066 Please commit. Further changes can be made in CVS. Thanks! ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-07 01:43 Message: Logged In: YES user_id=80475 The TeX markup checks out fine. Consider documenting lineevents and linetimings which are exposed upon: import hotshot. For the example, consider adding a comment line at the beginning with hints that the example produces large files and takes a long time to run. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2003-03-06 00:39 Message: Logged In: YES user_id=29957 stupid sourceforge tracker. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=698505&group_id=5470 From noreply@sourceforge.net Mon Mar 10 17:52:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 09:52:39 -0800 Subject: [Patches] [ python-Patches-663369 ] (email) Escape backslashes in specialsre and escapesre Message-ID: Patches item #663369, was opened at 2003-01-06 17:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Matthew Woodcraft (mhf) Assigned to: Barry A. Warsaw (bwarsaw) Summary: (email) Escape backslashes in specialsre and escapesre Initial Comment: (email/Utils.py) Escape backslashes in character classes in specialsre and escapesre. Patch against sourceforge CVS as of 2003-01-06 python/dist/src/Lib/email/Utils.py rev 1.21 python/dist/src/Lib/email/test/test_email.py rev 1.29 ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-10 12:52 Message: Logged In: YES user_id=12800 This patch doesn't look right. First, we're using raw strings so we don't need to escape backslashes. Second, why did you add backslashes around the word Silly in the test case? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 From noreply@sourceforge.net Mon Mar 10 18:49:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 10:49:32 -0800 Subject: [Patches] [ python-Patches-663369 ] (email) Escape backslashes in specialsre and escapesre Message-ID: Patches item #663369, was opened at 2003-01-06 22:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Rejected Priority: 5 Submitted By: Matthew Woodcraft (mhf) Assigned to: Barry A. Warsaw (bwarsaw) Summary: (email) Escape backslashes in specialsre and escapesre Initial Comment: (email/Utils.py) Escape backslashes in character classes in specialsre and escapesre. Patch against sourceforge CVS as of 2003-01-06 python/dist/src/Lib/email/Utils.py rev 1.21 python/dist/src/Lib/email/test/test_email.py rev 1.29 ---------------------------------------------------------------------- >Comment By: Matthew Woodcraft (mhf) Date: 2003-03-10 18:49 Message: Logged In: YES user_id=57248 The backslashes need to be escaped, not for the Python string interpreter, but for the regular expression compiler -- backslashes in character classes need to be doubled in order to stand for themselves. Currently, the backslashes in the character classes are 'escaping' the following open parenthesis characters, and effectively being ignored. The change to the testcase is there in order to test for the bug being fixed: backslashes in quoted-strings must be escaped (rfc822 3.3 / rfc2822 3.2.5). ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-10 17:52 Message: Logged In: YES user_id=12800 This patch doesn't look right. 
First, we're using raw strings so we don't need to escape backslashes. Second, why did you add backslashes around the word Silly in the test case? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 From noreply@sourceforge.net Mon Mar 10 19:30:29 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 10 Mar 2003 11:30:29 -0800 Subject: [Patches] [ python-Patches-663369 ] (email) Escape backslashes in specialsre and escapesre Message-ID: Patches item #663369, was opened at 2003-01-06 17:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 Category: Library (Lib) Group: None Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthew Woodcraft (mhf) Assigned to: Barry A. Warsaw (bwarsaw) Summary: (email) Escape backslashes in specialsre and escapesre Initial Comment: (email/Utils.py) Escape backslashes in character classes in specialsre and escapesre. Patch against sourceforge CVS as of 2003-01-06 python/dist/src/Lib/email/Utils.py rev 1.21 python/dist/src/Lib/email/test/test_email.py rev 1.29 ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-10 14:30 Message: Logged In: YES user_id=12800 Gotcha, thanks. The unittest patch isn't right but I'll commit a correct one. ---------------------------------------------------------------------- Comment By: Matthew Woodcraft (mhf) Date: 2003-03-10 13:49 Message: Logged In: YES user_id=57248 The backslashes need to be escaped, not for the Python string interpreter, but for the regular expression compiler -- backslashes in character classes need to be doubled in order to stand for themselves. Currently, the backslashes in the character classes are 'escaping' the following open parenthesis characters, and effectively being ignored. 
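The point about character classes can be checked with a small experiment. The two patterns below are illustrative only and are not the actual specialsre or escapesre expressions from email/Utils.py.

```python
import re

# Inside a character class, a single backslash escapes the next character,
# so this class contains only '(' and ')'. The raw string does not change that.
single = re.compile(r'[\()]')

# Doubling the backslash makes it stand for itself, so this class contains
# a backslash as well as '(' and ')'.
doubled = re.compile(r'[\\()]')

text = 'a\\b'  # the three characters: a, backslash, b

matches_single = single.search(text) is not None
matches_doubled = doubled.search(text) is not None
```

A raw string only stops the Python string parser from interpreting the backslash; the regular expression compiler still applies its own escaping rules afterwards.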
The change to the testcase is there in order to test for the bug being fixed: backslashes in quoted-strings must be escaped (rfc822 3.3 / rfc2822 3.2.5). ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-10 12:52 Message: Logged In: YES user_id=12800 This patch doesn't look right. First, we're using raw strings so we don't need to escape backslashes. Second, why did you add backslashes around the word Silly in the test case? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=663369&group_id=5470 From noreply@sourceforge.net Tue Mar 11 08:08:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 00:08:34 -0800 Subject: [Patches] [ python-Patches-701395 ] Wrong prototype for PyUnicode_Splitlines on documentation Message-ID: Patches item #701395, was opened at 2003-03-11 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: Wrong prototype for PyUnicode_Splitlines on documentation Initial Comment: A mismatch of prototype and description between documentation and implementation. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 From noreply@sourceforge.net Tue Mar 11 09:18:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 01:18:31 -0800 Subject: [Patches] [ python-Patches-701395 ] Wrong prototype for PyUnicode_Splitlines on documentation Message-ID: Patches item #701395, was opened at 2003-03-11 09:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) >Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Wrong prototype for PyUnicode_Splitlines on documentation Initial Comment: A mismatch of prototype and description between documentation and implementation. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-11 10:18 Message: Logged In: YES user_id=38388 Looks good. Assigned to Fred. Thanks. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 From noreply@sourceforge.net Tue Mar 11 12:32:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 04:32:59 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 14:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Nobody/Anonymous (nobody) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. 
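For readers unfamiliar with the rewrite this patch performs, here is a minimal sketch of the equivalence between apply() and the extended call syntax. The builtin apply() no longer exists in Python 3, so the apply function below is a stand-in written for illustration, and describe() is a hypothetical callee; neither comes from the patched files.

```python
def apply(func, args=(), kwargs=None):
    # Stand-in for the old builtin apply(), for illustration only.
    return func(*args, **(kwargs or {}))


def describe(widget, color="black", width=1):
    # Hypothetical callee; the patch touches real library call sites instead.
    return "%s:%s:%d" % (widget, color, width)


args = ("line",)
kwargs = {"color": "red", "width": 3}

old_style = apply(describe, args, kwargs)   # form being removed
new_style = describe(*args, **kwargs)       # replacement form

print(old_style, new_style)
```

The replacement form relies on the extended call syntax introduced in Python 2.0, so it is not available to code that must still run on older interpreters.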
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Tue Mar 11 18:34:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 10:34:53 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 13:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Nobody/Anonymous (nobody) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. 
---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 19:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also, according to PEP 291 (http://www.python.org/peps/pep-0291.html), both distutils and logging should remain 1.5.2 compatible. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Tue Mar 11 18:59:27 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 10:59:27 -0800 Subject: [Patches] [ python-Patches-701743 ] Reloading pseudo modules Message-ID: Patches item #701743, was opened at 2003-03-11 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Reloading pseudo modules Initial Comment: Python allows you to put something that is not a module into sys.modules. Unfortunately, reload() does not work with such a pseudo module ("TypeError: reload() argument must be module" is raised). This patch changes Python/import.c::PyImport_ReloadModule() so that it works with anything that has a __name__ attribute that can be found in sys.modules.keys().
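A minimal sketch of what a "pseudo module" looks like in practice. It is written against Python 3's importlib rather than the 2.3 interpreter the patch targets, and the PseudoModule class and the name "pseudo_demo" are made up for illustration. It shows only the part that does not depend on the patch: any object stored in sys.modules is handed back by the import machinery. It does not call reload(), whose handling of non-module objects is what the patch is about.

```python
import importlib
import sys


class PseudoModule:
    # Not a module: just an object carrying a __name__ (hypothetical example).

    def __init__(self, name):
        self.__name__ = name
        self.answer = 42


pseudo = PseudoModule("pseudo_demo")  # "pseudo_demo" is a made-up name
sys.modules[pseudo.__name__] = pseudo

# The import system returns whatever object is stored in sys.modules.
imported = importlib.import_module("pseudo_demo")
print(imported is pseudo, imported.answer)

# The lookup the patch description relies on: __name__ is a key in sys.modules.
found_by_name = pseudo.__name__ in sys.modules
print(found_by_name)

del sys.modules["pseudo_demo"]  # clean up the demo entry
```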
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 From noreply@sourceforge.net Tue Mar 11 19:15:24 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 11:15:24 -0800 Subject: [Patches] [ python-Patches-662807 ] Port tests to unittest Message-ID: Patches item #662807, was opened at 2003-01-05 21:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662807&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Raymond Hettinger (rhettinger) Summary: Port tests to unittest Initial Comment: This patch ports the three tests test_pow.py, test_charmapcodec.py and test_userdict.py to unittest. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 20:15 Message: Logged In: YES user_id=89016 Here's the next one: test___all__.py ported to PyUnit and updated. A better solution might be to replace __builtin__.__import__ in regrtest.py and test for the __all__ attribute there. Additionally this might allow us to check which modules are imported by regrtest.py and which are not and require additional tests. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-26 16:08 Message: Logged In: YES user_id=89016 Checked in as: Lib/test/test_ucn.py 1.12 Lib/test/test_unicodedata.py 1.7 Lib/test/output/test_ucn delete Lib/test/output/test_unicodedata delete ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-26 14:42 Message: Logged In: YES user_id=38388 test_ucn and test_unicodedata look OK. 
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-25 18:53 Message: Logged In: YES user_id=89016 OK, here are the next few ports: test_ucn and test_unicodedata. I'm not actually sure, whether changing test_unicodedata (which uses the comparison of generated output with expected output) is a good thing, as now updates to the database require manual changes. I've added a few error checks which increase coverage in unicodedata.c from 87% to 95%. Marc-André can you check if this is OK? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-21 14:05 Message: Logged In: YES user_id=89016 Checked in as: Lib/test/string_tests.py 1.27 Lib/test/test_str.py 1.1 Lib/test/test_string.py 1.24 Lib/test/test_unicode.py 1.79 Lib/test/test_userstring.py 1.10 Lib/test/output/test_string delete I've removed the sets import and renamed the mixin tests to contain the relevant class/module names (e.g. MixinStrStringUserStringTest) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-02-21 04:39 Message: Logged In: YES user_id=80475 * test_string.py imports sets but does not use it. * the names of the mixin classes could possibly be made clearer so I won't have to search into the comments to find-out which mixins are appropriate for each class. Overall, it looks like a nice factoring job and ought to go a long ways toward keeping these guys in sync in the future. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-17 19:29 Message: Logged In: YES user_id=89016 Here is the next bunch of ports: the string tests have been ported to PyUnit and made as reusable as possible. Tests are now shared between str, unicode, UserString and the string module. 
As a result of reusing a part of the unicode tests for str, the coverage in stringobject.c goes from 83% to 86%. Furthermore, it should help keep the API consistent between str and unicode (Example: "%c" % 0xffffffff raises OverflowError, u"%c" % 0xffffffff raises ValueError). Raymond, can you look through the scripts and check that everything is OK? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-16 10:33 Message: Logged In: YES user_id=89016 I'm currently working on a PyUnit port of the string tests (i.e. str, unicode, UserString and the string module). Uploading the result to this patch would be easier, as it already has an established audience. But I can open a new patch for that if you want. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-14 21:11 Message: Logged In: YES user_id=33168 Walter, can this patch be closed now? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-14 12:30 Message: Logged In: YES user_id=89016 Checked in as: Lib/test/output/test_charmapcodec delete Lib/test/test_charmapcodec.py 1.6 ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-02-14 09:52 Message: Logged In: YES user_id=38388 test_charmapcodec looks OK. Just remove the DOS line ends before checking it in. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-13 19:16 Message: Logged In: YES user_id=89016 OK, checked in as test_userlist.py 1.7. Assigned back to MAL for the review of test_charmapcodec. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-13 19:08 Message: Logged In: YES user_id=6380 Walter, feel free to check in test_userlist.py!
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-13 19:02 Message: Logged In: YES user_id=89016 Here's another one: test_userlist has been ported to PyUnit and a few tests have been added to increase coverage. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-13 04:12 Message: Logged In: YES user_id=33168 MAL, could you look at the test_charmapcodec.py? I think that's the only file outstanding from this patch. It's a pretty straightforward test. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-04 00:13 Message: Logged In: YES user_id=89016 OK, test_sys.py is checked in. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-03 23:56 Message: Logged In: YES user_id=6380 I think you can check this in -- if it fails with Jython, Finn or Samuele will quickly patch it. :-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-03 23:44 Message: Logged In: YES user_id=89016 OK, here's a new test_sys.py > test_sys.py: > > - I agree that it's not worth testing the code > paths that will invoke a custom __displayhook__ or > __excepthook__, but I regret it nevertheless. :-) > maybe this deserves a comment? Testing a custom displayhook is now done (via compile(..., "single")/exec). Testing a custom excepthook seems to be trickier. This could probably be done by calling the interpreter recursively via os.system() or os.popen(). I've added a comment for now that this isn't tested. Unfortunately this leaves a large block in Python/pythonrun.c uncovered. > - sys.exit() should also be callable with a string OK, done. > - you could check that the value of the SystemExit exception > has the right exit code Done. 
> - Have you checked this with Jython? I don't know if it implements all of these; in particular I doubt it has getrefcount(). I haven't tested Jython yet, but I guess test_sys.py will have to make many exceptions for Jython. I'll try this tomorrow. > - I presume you've tested this on Windows? Linux & Windows ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-03 22:10 Message: Logged In: YES user_id=6380 test_sys.py: - I agree that it's not worth testing the code paths that will invoke a custom __displayhook__ or __excepthook__, but I regret it nevertheless. :-) maybe this deserves a comment? - sys.exit() should also be callable with a string - you could check that the value of the SystemExit exception has the right exit code - Have you checked this with Jython? I don't know if it implements all of these; in particular I doubt it has getrefcount(). - I presume you've tested this on Windows? Sorry, I can't help you with charmapcodec ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-02-03 21:36 Message: Logged In: YES user_id=89016 Here's a new one: test_sys.py tests Python/sysmodule.c. Coverage goes from 68% to 77%. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-19 15:46 Message: Logged In: YES user_id=80475 All are approved except test_charmapcodec.py -- someone else should look at that one. Be sure to follow GvR's advice and replace assertEquals with assertEqual.
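Two of the review points on test_sys.py (sys.exit() called with a string, and checking the value carried by SystemExit) can be sketched as follows. This is an illustration in present-day Python 3 unittest style, not the test that was checked in; the class and method names are hypothetical.

```python
import sys
import unittest


class SysExitTest(unittest.TestCase):

    def test_exit_with_int(self):
        # The integer passed to sys.exit() is carried by the exception.
        with self.assertRaises(SystemExit) as cm:
            sys.exit(42)
        self.assertEqual(cm.exception.code, 42)

    def test_exit_with_string(self):
        # sys.exit() also accepts a string, which is stored unchanged.
        with self.assertRaises(SystemExit) as cm:
            sys.exit("exit message")
        self.assertEqual(cm.exception.code, "exit message")


# Run the two tests without terminating the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SysExitTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```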
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-16 21:47 Message: Logged In: YES user_id=89016 test_unicode is ported and enhanced (coverage goes from 80.81% to 85.05%) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-10 18:17 Message: Logged In: YES user_id=89016 > In general, don't do tests that hardwire implementation details So should we remove self.assertEquals(reduce(42, "1"), "1") self.assertEquals(reduce(42, "", "1"), "1") from test_filter? BTW, you should look at test_builtin first, as the others are still simply ports to PyUnit. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-01-10 18:03 Message: Logged In: YES user_id=80475 Good to hear the news on increasing the coverage. In general, don't do tests that hardwire implementation details. Test it if it is a documented variable, exposed through __all__, is a key constant (like the magic numbers in random.py), or a variable that a module user is likely to be relying upon. Otherwise, no -- it should be possible to improve an implementation without crashing the suite. I'll try to review a few of these over the next few days. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-10 17:53 Message: Logged In: YES user_id=89016 test_builtin.py is now updated to test more error situations. This increases the coverage of bltinmodule.c from 75.13% to 92.20%, and it actually revealed one or two potential bugs: http://www.python.org/sf/665761 and http://www.python.org/sf/665835 I'm not 100% sure that test_intern() and test_execfile() do the right thing.
I'm not sure, whether the test script should check for undocumented implementation artefacts, like: a = 1 self.assert_(min(a, 1L) is a) but in this way at least we get notified if something is changed unintentionally. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-08 20:05 Message: Logged In: YES user_id=89016 test_b1 and test_b2 are combined into test_builtin now ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-01-08 15:03 Message: Logged In: YES user_id=6380 Two random suggestions: - a blank line before each method, even trivial ones, even the first one - use assertEqual, not assertEquals BTW, I see you've picked up on the convention that unit test methods should not have doc strings. Good! (But they may have comments.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-07 17:37 Message: Logged In: YES user_id=89016 test_b1.py has been ported too. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-01-05 21:56 Message: Logged In: YES user_id=89016 The patch is hard to read, so I'll upload all three test scripts. 
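The style conventions suggested in this thread (a blank line before every test method, assertEqual rather than assertEquals, comments instead of docstrings) can be sketched on a small pow() test. This is an illustration only, written for Python 3; it is not the ported test_pow.py, and the class and method names are assumptions.

```python
import unittest


class PowTest(unittest.TestCase):

    def test_two_arguments(self):
        # Plain exponentiation of integers.
        self.assertEqual(pow(2, 10), 1024)

    def test_three_arguments(self):
        # The third argument gives the modulus: 1024 % 1000 == 24.
        self.assertEqual(pow(2, 10, 1000), 24)

    def test_zero_to_negative_power(self):
        # An error case, in the spirit of testing more error situations.
        self.assertRaises(ZeroDivisionError, pow, 0, -1)


# Run the tests without terminating the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PowTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```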
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662807&group_id=5470 From noreply@sourceforge.net Wed Mar 12 01:38:00 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 17:38:00 -0800 Subject: [Patches] [ python-Patches-701907 ] More use of fast_next_opcode Message-ID: Patches item #701907, was opened at 2003-03-11 20:38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: More use of fast_next_opcode Initial Comment: Applies "goto fast_next_opcode" instead of continue in op codes that don't make intervening C calls. Makes the common tiny quick opcodes just a little quicker. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470 From noreply@sourceforge.net Wed Mar 12 01:41:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 17:41:48 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 07:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Nobody/Anonymous (nobody) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py 
./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 13:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Wed Mar 12 01:44:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 17:44:41 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 14:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) >Assigned to: Raymond Hettinger (rhettinger) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 17:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. 
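As a reminder of what "self-iterator" means here: calling iter() on the object returns the object itself, and next() then yields its lines one at a time. The sketch below uses Python 3's io.StringIO as a stand-in, since cStringIO does not exist in Python 3; it illustrates the behaviour the patch aims for and is not the patched C code.

```python
import io

buf = io.StringIO("first\nsecond\nthird\n")

# A self-iterator hands back itself from iter().
it = iter(buf)
same_object = it is buf

# Iteration consumes the buffer line by line, sharing one file position.
first = next(buf)
rest = list(buf)

print(same_object, repr(first), rest)
```

Because the iterator and the file object are one and the same, reading through either advances the same position, which is why the second read above sees only the remaining two lines.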
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 12 02:35:08 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 11 Mar 2003 18:35:08 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 19:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Raymond Hettinger (rhettinger) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-12 02:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. 
(There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 01:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 22:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 12 08:46:27 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 12 Mar 2003 00:46:27 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 14:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Nobody/Anonymous (nobody) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py 
./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. ---------------------------------------------------------------------- >Comment By: Christos Georgiou (tzot) Date: 2003-03-12 10:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 03:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. 
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 20:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Wed Mar 12 20:14:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 12 Mar 2003 12:14:47 -0800 Subject: [Patches] [ python-Patches-702463 ] AE Enum and Attribute support fixes Message-ID: Patches item #702463, was opened at 2003-03-12 11:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702463&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Enum and Attribute support fixes Initial Comment: This patch contains two somewhat unrelated minor patches to python's AppleEvent infrastructure. Details: 1) Currently, Enum parameters are encoded as four character strings in the actual AppleEvent. The vast majority of applications and events seem to be able to handle this, but to be correct and support those applications that don't, we should simply wrap the four-character-code string with an Enum before encoding the event. This is the fix at the bottom of the attached patch. 2) Currently, AppleEvent attributes which may be passed to any of the methods generated by gensuitemodule are encoded using AEPutParamDesc. 
This has probably not been an issue because there are almost no cases where Python code wants to attach attributes to an AppleEvent. However, to be correct, attributes should be attached to an AppleEvent using AEPutAttributeDesc. Donovan ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702463&group_id=5470 From noreply@sourceforge.net Wed Mar 12 20:44:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 12 Mar 2003 12:44:26 -0800 Subject: [Patches] [ python-Patches-702463 ] AE Enum and Attribute support fixes Message-ID: Patches item #702463, was opened at 2003-03-12 11:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702463&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Enum and Attribute support fixes Initial Comment: This patch contains two somewhat unrelated minor patches to python's AppleEvent infrastructure. Details: 1) Currently, Enum parameters are encoded as four character strings in the actual AppleEvent. The vast majority of applications and events seem to be able to handle this, but to be correct and support those applications that don't, we should simply wrap the four-character-code string with an Enum before encoding the event. This is the fix at the bottom of the attached patch. 2) Currently, AppleEvent attributes which may be passed to any of the methods generated by gensuitemodule are encoded using AEPutParamDesc. This has probably not been an issue because there are almost no cases where Python code wants to attach attributes to an AppleEvent. However, to be correct, attributes should be attached to an AppleEvent using AEPutAttributeDesc. 
Donovan ---------------------------------------------------------------------- >Comment By: Donovan Preston (dsposx) Date: 2003-03-12 11:44 Message: Logged In: YES user_id=111050 I note that this has already been patched, sorry :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702463&group_id=5470 From noreply@sourceforge.net Thu Mar 13 00:07:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 12 Mar 2003 16:07:04 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-12 15:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment: A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :) 1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition. 2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. 
Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find them. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

## v should be a class object.
## Why did I name it 'v'? :(
def getbaseclasses(v):
    ## Have we already set up the _propdict and _elemdict
    ## for this class object? If so, don't do it again.
    if not v._propdict:
        ## This step is required so we get a fresh dictionary on
        ## this class object, and don't mutate the one on
        ## ComponentItem or one of our superclasses
        v._propdict = {}
        v._elemdict = {}
        ## Run through all of the strings in _superclassnames
        ## evaluating them to get a class object.
        for superclassname in getattr(v, '_superclassnames', []):
            superclass = eval(superclassname)
            ## Immediately recurse into getbaseclasses, so that
            ## the base class _propdict and _elemdict are set up
            ## properly before we copy its entries into ours.
            getbaseclasses(superclass)
            ## Copy all of the entries from this base class into
            ## our _propdict and _elemdict so that we get a flat
            ## dictionary of all of the elements and properties
            ## that should be available to instances of this class.
            v._propdict.update(getattr(superclass, '_propdict', {}))
            v._elemdict.update(getattr(superclass, '_elemdict', {}))
        ## Finally, copy those properties and elements that
        ## are defined directly on this class object in
        ## _privpropdict and _privelemdict into the
        ## _propdict and _elemdict that
        ## ComponentItem.__getattr__ looks in.
        ## Note that if we entered getbaseclasses through the
        ## recursion above, our subclass will then copy our
        ## _propdict and _elemdict into its own after we exit
        ## the recursion, giving it a copy of all the properties
        ## and elements defined on the superclass object.
        v._propdict.update(v._privpropdict)
        v._elemdict.update(v._privelemdict)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470

From noreply@sourceforge.net Thu Mar 13 00:08:08 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Wed, 12 Mar 2003 16:08:08 -0800
Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes
Message-ID:

Patches item #702620, was opened at 2003-03-12 15:07
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470

Category: Macintosh
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Donovan Preston (dsposx)
Assigned to: Jack Jansen (jackjansen)
Summary: AE Inheritance fixes

Initial Comment:
A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :)

1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition.
2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find them. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

## v should be a class object.
## Why did I name it 'v'? :(
def getbaseclasses(v):
    ## Have we already set up the _propdict and _elemdict
    ## for this class object? If so, don't do it again.
    if not v._propdict:
        ## This step is required so we get a fresh dictionary on
        ## this class object, and don't mutate the one on
        ## ComponentItem or one of our superclasses
        v._propdict = {}
        v._elemdict = {}
        ## Run through all of the strings in _superclassnames
        ## evaluating them to get a class object.
        for superclassname in getattr(v, '_superclassnames', []):
            superclass = eval(superclassname)
            ## Immediately recurse into getbaseclasses, so that
            ## the base class _propdict and _elemdict are set up
            ## properly before we copy its entries into ours.
            getbaseclasses(superclass)
            ## Copy all of the entries from this base class into
            ## our _propdict and _elemdict so that we get a flat
            ## dictionary of all of the elements and properties
            ## that should be available to instances of this class.
            v._propdict.update(getattr(superclass, '_propdict', {}))
            v._elemdict.update(getattr(superclass, '_elemdict', {}))
        ## Finally, copy those properties and elements that
        ## are defined directly on this class object in
        ## _privpropdict and _privelemdict into the
        ## _propdict and _elemdict that
        ## ComponentItem.__getattr__ looks in.
        ## Note that if we entered getbaseclasses through the
        ## recursion above, our subclass will then copy our
        ## _propdict and _elemdict into its own after we exit
        ## the recursion, giving it a copy of all the properties
        ## and elements defined on the superclass object.
        v._propdict.update(v._privpropdict)
        v._elemdict.update(v._privelemdict)

----------------------------------------------------------------------

>Comment By: Donovan Preston (dsposx)
Date: 2003-03-12 15:08

Message:
Logged In: YES
user_id=111050

Attaching diff.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470

From noreply@sourceforge.net Thu Mar 13 00:08:55 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Wed, 12 Mar 2003 16:08:55 -0800
Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes
Message-ID:

Patches item #702620, was opened at 2003-03-12 15:07
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470

Category: Macintosh
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Donovan Preston (dsposx)
Assigned to: Jack Jansen (jackjansen)
Summary: AE Inheritance fixes

Initial Comment:
A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :)

1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries.
Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition.

2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find them. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

## v should be a class object.
## Why did I name it 'v'? :(
def getbaseclasses(v):
    ## Have we already set up the _propdict and _elemdict
    ## for this class object? If so, don't do it again.
    if not v._propdict:
        ## This step is required so we get a fresh dictionary on
        ## this class object, and don't mutate the one on
        ## ComponentItem or one of our superclasses
        v._propdict = {}
        v._elemdict = {}
        ## Run through all of the strings in _superclassnames
        ## evaluating them to get a class object.
        for superclassname in getattr(v, '_superclassnames', []):
            superclass = eval(superclassname)
            ## Immediately recurse into getbaseclasses, so that
            ## the base class _propdict and _elemdict are set up
            ## properly before we copy its entries into ours.
            getbaseclasses(superclass)
            ## Copy all of the entries from this base class into
            ## our _propdict and _elemdict so that we get a flat
            ## dictionary of all of the elements and properties
            ## that should be available to instances of this class.
            v._propdict.update(getattr(superclass, '_propdict', {}))
            v._elemdict.update(getattr(superclass, '_elemdict', {}))
        ## Finally, copy those properties and elements that
        ## are defined directly on this class object in
        ## _privpropdict and _privelemdict into the
        ## _propdict and _elemdict that
        ## ComponentItem.__getattr__ looks in.
        ## Note that if we entered getbaseclasses through the
        ## recursion above, our subclass will then copy our
        ## _propdict and _elemdict into its own after we exit
        ## the recursion, giving it a copy of all the properties
        ## and elements defined on the superclass object.
        v._propdict.update(v._privpropdict)
        v._elemdict.update(v._privelemdict)

----------------------------------------------------------------------

>Comment By: Donovan Preston (dsposx)
Date: 2003-03-12 15:08

Message:
Logged In: YES
user_id=111050

Whoops. Have to click the checkbox.

----------------------------------------------------------------------

Comment By: Donovan Preston (dsposx)
Date: 2003-03-12 15:08

Message:
Logged In: YES
user_id=111050

Attaching diff.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470

From noreply@sourceforge.net Thu Mar 13 11:19:17 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 03:19:17 -0800
Subject: [Patches] [ python-Patches-697939 ] optparse unit tests + fixes
Message-ID:

Patches item #697939, was opened at 2003-03-05 12:43
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697939&group_id=5470

Category: Tests
Group: Python 2.3
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Johannes Gijsbers (jlgijsbers)
Assigned to: Nobody/Anonymous (nobody)
Summary: optparse unit tests + fixes

Initial Comment:
Here's a patch that mostly converts the tests from Optik 1.4 to the unittest format and makes them usable in the Python library.
I've also added some tests, of which five fail with current CVS:

test_opt_string_empty
test_opt_string_too_short
test_opt_string_long_invalid
test_opt_string_short_invalid
test_help_long_opts_first

I changed the following to fix the tests:

* format_option_strings_short_first and format_option_strings_long_first have been merged into one function, format_options, to eliminate the almost complete duplication. To make this possible, short_first is now an attribute, which conveniently also eases changing short_first after instantiation.

* _short_opts and _long_opts are set in the Option constructor, instead of in _check_option_strings, to prevent an AttributeError which would occur when no option strings were passed, making the "at least one option string must be supplied" OptionError useless.

* Removed the check that would raise a RuntimeError in Option.__str__ when no option strings existed in _short_opts or _long_opts. A RuntimeError would be raised when an OptionError was raised in _set_opt_strings, because, quite logically, no option strings were set at that point. I'm not sure why the check was there, because _short_opts and _long_opts are only empty when instantiation fails, or when somebody set those *internal* attributes to false. And the moment you start mucking with internal attributes, you're on your own. :)

----------------------------------------------------------------------

>Comment By: Johannes Gijsbers (jlgijsbers)
Date: 2003-03-13 12:19

Message:
Logged In: YES
user_id=469548

According to Greg, I should have submitted a patch to the Optik code. I'll close this one and resubmit to him.
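
A minimal sketch, not taken from the patch, of the malformed option-string cases that the failing tests above are named after. It assumes the optparse module shipped with current Python 3, which may differ from the 2003 code under discussion; there, for instance, an empty argument list raises TypeError rather than the OptionError mentioned above. The helper name classify_option_strings is hypothetical.

```python
from optparse import Option, OptionError


def classify_option_strings(*opt_strings):
    """Return 'ok', or the name of the exception that Option() raises."""
    try:
        Option(*opt_strings)
    except (OptionError, TypeError) as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    cases = {
        "empty": (),                # no option strings at all
        "too short": ("b",),        # fewer than two characters
        "short invalid": ("b=",),   # two characters, but not of the form -x
        "long invalid": ("---",),   # starts with --, followed by a dash
        "valid": ("-a", "--alpha"),
    }
    for label, opt_strings in cases.items():
        print(label, "->", classify_option_strings(*opt_strings))
```

Constructing the Option directly, rather than going through OptionParser.add_option, keeps the example focused on the constructor checks that the patch description talks about.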
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697939&group_id=5470

From noreply@sourceforge.net Thu Mar 13 11:19:39 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 03:19:39 -0800
Subject: [Patches] [ python-Patches-697941 ] optparse OptionGroup docs
Message-ID:

Patches item #697941, was opened at 2003-03-05 12:48
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697941&group_id=5470

Category: Documentation
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Johannes Gijsbers (jlgijsbers)
>Assigned to: Greg Ward (gward)
Summary: optparse OptionGroup docs

Initial Comment:
A small patch to add a bit about the new OptionGroup, added in Optik 1.4 and Python CVS but currently undocumented.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=697941&group_id=5470

From noreply@sourceforge.net Thu Mar 13 13:10:31 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 05:10:31 -0800
Subject: [Patches] [ python-Patches-702933 ] Kill off docs for unsafe macros
Message-ID:

Patches item #702933, was opened at 2003-03-13 08:10
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702933&group_id=5470

Category: Documentation
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: David Abrahams (david_abrahams)
Assigned to: Nobody/Anonymous (nobody)
Summary: Kill off docs for unsafe macros

Initial Comment:
I'll also attach the patch, but a message body is required:

===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/api/memory.tex,v
retrieving revision 1.2
diff -w -u -r1.2 memory.tex
--- memory.tex 6 Apr 2002 09:14:33 -0000 1.2
+++ memory.tex 13 Mar 2003 12:56:26 -0000
@@ -195,9 +195,7 @@
 In addition to the functions aimed at handling raw memory blocks from
 the Python heap, objects in Python are allocated and released with
 \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()} and
-\cfunction{PyObject_Del()}, or with their corresponding macros
-\cfunction{PyObject_NEW()}, \cfunction{PyObject_NEW_VAR()} and
-\cfunction{PyObject_DEL()}.
+\cfunction{PyObject_Del()}.

 These will be explained in the next chapter on defining and
 implementing new object types in C.
Index: newtypes.tex
===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/api/newtypes.tex,v
retrieving revision 1.21
diff -w -u -r1.21 newtypes.tex
--- newtypes.tex 10 Feb 2003 19:18:21 -0000 1.21
+++ newtypes.tex 13 Mar 2003 12:56:27 -0000
@@ -62,23 +62,6 @@
   after this call as the memory is no longer a valid Python object.
 \end{cfuncdesc}

-\begin{cfuncdesc}{\var{TYPE}*}{PyObject_NEW}{TYPE, PyTypeObject *type}
-  Macro version of \cfunction{PyObject_New()}, to gain performance at
-  the expense of safety. This does not check \var{type} for a \NULL{}
-  value.
-\end{cfuncdesc}
-
-\begin{cfuncdesc}{\var{TYPE}*}{PyObject_NEW_VAR}{TYPE, PyTypeObject *type,
-                                                  int size}
-  Macro version of \cfunction{PyObject_NewVar()}, to gain performance
-  at the expense of safety. This does not check \var{type} for a
-  \NULL{} value.
-\end{cfuncdesc}
-
-\begin{cfuncdesc}{void}{PyObject_DEL}{PyObject *op}
-  Macro version of \cfunction{PyObject_Del()}.
-\end{cfuncdesc}
-
 \begin{cfuncdesc}{PyObject*}{Py_InitModule}{char *name,
                                             PyMethodDef *methods}
   Create a new module object based on a name and table of functions,

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702933&group_id=5470

From noreply@sourceforge.net Thu Mar 13 13:11:31 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 05:11:31 -0800
Subject: [Patches] [ python-Patches-702933 ] Kill off docs for unsafe macros
Message-ID:

Patches item #702933, was opened at 2003-03-13 08:10
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702933&group_id=5470

Category: Documentation
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: David Abrahams (david_abrahams)
>Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Kill off docs for unsafe macros

Initial Comment:
I'll also attach the patch, but a message body is required:

===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/api/memory.tex,v
retrieving revision 1.2
diff -w -u -r1.2 memory.tex
--- memory.tex 6 Apr 2002 09:14:33 -0000 1.2
+++ memory.tex 13 Mar 2003 12:56:26 -0000
@@ -195,9 +195,7 @@
 In addition to the functions aimed at handling raw memory blocks from
 the Python heap, objects in Python are allocated and released with
 \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()} and
-\cfunction{PyObject_Del()}, or with their corresponding macros
-\cfunction{PyObject_NEW()}, \cfunction{PyObject_NEW_VAR()} and
-\cfunction{PyObject_DEL()}.
+\cfunction{PyObject_Del()}.

 These will be explained in the next chapter on defining and
 implementing new object types in C.
Index: newtypes.tex
===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/api/newtypes.tex,v
retrieving revision 1.21
diff -w -u -r1.21 newtypes.tex
--- newtypes.tex 10 Feb 2003 19:18:21 -0000 1.21
+++ newtypes.tex 13 Mar 2003 12:56:27 -0000
@@ -62,23 +62,6 @@
   after this call as the memory is no longer a valid Python object.
 \end{cfuncdesc}

-\begin{cfuncdesc}{\var{TYPE}*}{PyObject_NEW}{TYPE, PyTypeObject *type}
-  Macro version of \cfunction{PyObject_New()}, to gain performance at
-  the expense of safety. This does not check \var{type} for a \NULL{}
-  value.
-\end{cfuncdesc}
-
-\begin{cfuncdesc}{\var{TYPE}*}{PyObject_NEW_VAR}{TYPE, PyTypeObject *type,
-                                                  int size}
-  Macro version of \cfunction{PyObject_NewVar()}, to gain performance
-  at the expense of safety. This does not check \var{type} for a
-  \NULL{} value.
-\end{cfuncdesc}
-
-\begin{cfuncdesc}{void}{PyObject_DEL}{PyObject *op}
-  Macro version of \cfunction{PyObject_Del()}.
-\end{cfuncdesc}
-
 \begin{cfuncdesc}{PyObject*}{Py_InitModule}{char *name,
                                             PyMethodDef *methods}
   Create a new module object based on a name and table of functions,

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702933&group_id=5470

From noreply@sourceforge.net Thu Mar 13 21:40:57 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 13:40:57 -0800
Subject: [Patches] [ python-Patches-701907 ] More use of fast_next_opcode
Message-ID:

Patches item #701907, was opened at 2003-03-12 02:38
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Neal Norwitz (nnorwitz)
Summary: More use of fast_next_opcode

Initial Comment:
Applies "goto fast_next_opcode" instead of continue in op codes that don't make intervening C calls. Makes the common tiny quick opcodes just a little quicker.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2003-03-13 22:40

Message:
Logged In: YES
user_id=38388

Sorry, not much time to look at this. From a quick scan, it looks OK.
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470

From noreply@sourceforge.net Fri Mar 14 01:51:32 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Thu, 13 Mar 2003 17:51:32 -0800
Subject: [Patches] [ python-Patches-701907 ] More use of fast_next_opcode
Message-ID:

Patches item #701907, was opened at 2003-03-11 20:38
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Neal Norwitz (nnorwitz)
Summary: More use of fast_next_opcode

Initial Comment:
Applies "goto fast_next_opcode" instead of continue in op codes that don't make intervening C calls. Makes the common tiny quick opcodes just a little quicker.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-13 20:51

Message:
Logged In: YES
user_id=80475

Thanks for the second look. It's a low risk patch, so I'll go ahead and load it. See ceval.c 2.354.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-03-13 16:40

Message:
Logged In: YES
user_id=38388

Sorry, not much time to look at this. From a quick scan, it looks OK.
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701907&group_id=5470

From noreply@sourceforge.net Fri Mar 14 08:18:44 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 14 Mar 2003 00:18:44 -0800
Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value
Message-ID:

Patches item #703471, was opened at 2003-03-14 17:18
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Hye-Shik Chang (perky)
Assigned to: Nobody/Anonymous (nobody)
Summary: (Security Problem) base64.decodestring exposes garbage value

Initial Comment:
>>> import base64
>>> base64.decodestring("###################")
'\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".....")
'ps2\x00\x00t'
>>> base64.decodestring("........................")
'\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".................................................")
'.............................."\x00\x00\x00\x00\x00\x00\x00\x00'

This exposes unexpected values that were recently deallocated.
(One of my CGI scripts showed garbage that contained a database password in response to a malicious query.)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470

From noreply@sourceforge.net Fri Mar 14 15:19:21 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 14 Mar 2003 07:19:21 -0800
Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value
Message-ID:

Patches item #703471, was opened at 2003-03-14 03:18
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
>Priority: 8
Submitted By: Hye-Shik Chang (perky)
Assigned to: Nobody/Anonymous (nobody)
Summary: (Security Problem) base64.decodestring exposes garbage value

Initial Comment:
>>> import base64
>>> base64.decodestring("###################")
'\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".....")
'ps2\x00\x00t'
>>> base64.decodestring("........................")
'\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".................................................")
'.............................."\x00\x00\x00\x00\x00\x00\x00\x00'

This exposes unexpected values that were recently deallocated. (One of my CGI scripts showed garbage that contained a database password in response to a malicious query.)

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-03-14 10:19

Message:
Logged In: YES
user_id=31435

Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows.
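
As a hedged aside, and not the fix proposed in this patch: in current Python 3, base64.b64decode discards characters outside the base64 alphabet unless validate=True is passed. A caller that handles untrusted input could reject strings like the ones in the report before decoding them. The helper name strict_b64decode is hypothetical.

```python
import base64
import binascii


def strict_b64decode(data):
    """Decode base64, raising ValueError on any non-alphabet character."""
    try:
        # validate=True makes b64decode refuse input such as "#####"
        # instead of silently dropping the offending characters.
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("not valid base64: %r" % (data,)) from exc
```

Rejecting malformed input at the call site does not replace a fix in the decoder itself; it only keeps a script from passing obviously bad data through.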
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 16:35:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 08:35:21 -0800 Subject: [Patches] [ python-Patches-669683 ] HTMLParser -- allow comma in unquoted attribute values Message-ID: Patches item #669683, was opened at 2003-01-17 06:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=669683&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: j paulson (fantoozler) Assigned to: Fred L. Drake, Jr. (fdrake) >Summary: HTMLParser -- allow comma in unquoted attribute values Initial Comment: An HTML document in the wild had the tag: and HTMLParser was choking on the "," after the "175". By adding "," to the list of allowed characters in attribute values, HTMLParser accepts the document. ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2003-03-14 11:35 Message: Logged In: YES user_id=3066 The regression test does not appear to have been attached; I see four copies of essentially the same patch. That's OK though; I have enough to go on. I've incorporated the patch in both Lib/HTMLParser.py 1.12 and Lib/sgmllib.py 1.42 (so it also fixes htmllib). Regression tests have been added to Lib/test/test_htmlparser.py 1.10 and Lib/test/test_sgmllib.py 1.5. 
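
A small sketch of the behaviour this patch is after, assuming the html.parser module in current Python 3 (the successor to the HTMLParser module discussed here). The tag used in the usage note is a hypothetical stand-in, since the original tag did not survive in the report; AttrCollector and collect_attrs are likewise illustrative names.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def collect_attrs(markup):
    """Parse markup and return a list of (tag, attrs) pairs."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

Calling collect_attrs('<area shape=rect coords=10,175,20>') is one way to check that a comma inside an unquoted attribute value is kept as part of the value.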
---------------------------------------------------------------------- Comment By: j paulson (fantoozler) Date: 2003-01-24 19:18 Message: Logged In: YES user_id=690612 Added test case to Lib/test/test_htmlparser.py in addition to the HTMLParser.py patch ---------------------------------------------------------------------- Comment By: j paulson (fantoozler) Date: 2003-01-17 13:46 Message: Logged In: YES user_id=690612 I'll attach the patch file again. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-01-17 09:17 Message: Logged In: YES user_id=33168 Was this supposed to be a patch or a bug report? There is no patch attached. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=669683&group_id=5470 From noreply@sourceforge.net Fri Mar 14 16:36:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 08:36:47 -0800 Subject: [Patches] [ python-Patches-674448 ] test_htmlparser.py -- "," in attributes Message-ID: Patches item #674448, was opened at 2003-01-24 23:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=674448&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: j paulson (fantoozler) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: test_htmlparser.py -- "," in attributes Initial Comment: Added a test verifying patch #669683 works. ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2003-03-14 11:36 Message: Logged In: YES user_id=3066 Ok, I've already added a nearly identical test; I didn't see this patch while I was looking at 669683. Closing as accepted, since my patch was almost identical. 
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=674448&group_id=5470

From noreply@sourceforge.net Fri Mar 14 16:48:12 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Fri, 14 Mar 2003 08:48:12 -0800
Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value
Message-ID:

Patches item #703471, was opened at 2003-03-14 09:18
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 8
Submitted By: Hye-Shik Chang (perky)
Assigned to: Nobody/Anonymous (nobody)
Summary: (Security Problem) base64.decodestring exposes garbage value

Initial Comment:
>>> import base64
>>> base64.decodestring("###################")
'\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".....")
'ps2\x00\x00t'
>>> base64.decodestring("........................")
'\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> base64.decodestring(".................................................")
'.............................."\x00\x00\x00\x00\x00\x00\x00\x00'

This exposes unexpected values that were recently deallocated. (One of my CGI scripts showed garbage that contained a database password in response to a malicious query.)

----------------------------------------------------------------------

>Comment By: Thomas Wouters (twouters)
Date: 2003-03-14 17:48

Message:
Logged In: YES
user_id=34209

The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-03-14 16:19

Message:
Logged In: YES
user_id=31435

Yikes! Boosted priority way up.
A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 17:05:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 09:05:59 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 17:07:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 09:07:41 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. 
(some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 17:14:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 09:14:34 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 17:23:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 09:23:26 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 18:05:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 10:05:59 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. 
A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 18:17:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 10:17:13 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. 
It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 18:19:45 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 10:19:45 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-14 13:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. 
It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicated in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 19:25:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 11:25:15 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-14 14:25 Message: Logged In: YES user_id=12800 why wouldn't calling it on garbage data raise binascii.Error? i think i'd feel more comfortable about the patch if it did that instead (to be consistent with incomplete padding errors and such). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 13:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. 
It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicated in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
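The behaviour discussed in this thread can be checked from Python. The following is only a minimal sketch, assuming a current Python 3 interpreter, where base64.decodestring from the report no longer exists and base64.b64decode stands in for it. The helper names decode_lenient and decode_strict are made up for illustration and are not part of the patch. The sketch feeds the garbage inputs from the initial comment to the decoder and checks that nothing but an empty result comes back. It also shows the stricter alternative raised in the review, where invalid input produces binascii.Error.

```python
import base64
import binascii


def decode_lenient(data: bytes) -> bytes:
    """Decode base64, silently skipping bytes outside the base64 alphabet."""
    return base64.b64decode(data)


def decode_strict(data: bytes) -> bytes:
    """Decode base64, raising binascii.Error on bytes outside the alphabet."""
    return base64.b64decode(data, validate=True)


# The four garbage inputs from the initial comment (19, 5, 24 and 49 bytes).
garbage_inputs = [b"#" * 19, b"." * 5, b"." * 24, b"." * 49]

for raw in garbage_inputs:
    # Every byte is skipped, so the only safe result is an empty string;
    # anything else would mean uninitialised memory leaked into the output.
    assert decode_lenient(raw) == b""

    # With validation switched on, the same input is rejected outright.
    try:
        decode_strict(raw)
    except binascii.Error:
        pass
    else:
        raise AssertionError("strict decoding accepted garbage input")
```

The lenient helper mirrors the long-standing behaviour that the thread decides not to break; the strict one is an opt-in for callers who would rather see an error than an empty result.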
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Fri Mar 14 22:03:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 14 Mar 2003 14:03:58 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-27 08:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required. In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc). This lets you do timezone conversions amongst other things. Also includes changes to configure.in to only build new code if the tzset method correctly switches timezones on your platform. This should be for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py. Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (eg. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time. Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on. 
eg:: >>> now = time.time() >>> os.environ['TZ'] = 'Europe/Amsterdam' >>> time.tzset() >>> time.ctime(now) 'Mon Jan 27 14:35:17 2003' >>> time.tzname ('CET', 'CEST') >>> os.environ['TZ'] = 'US/Eastern' >>> time.tzset() >>> time.ctime(now) 'Mon Jan 27 08:35:17 2003' >>> time.tzname ('EST', 'EDT') ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-14 17:03 Message: Logged In: YES user_id=6380 OK, checked in with that line removed. Thanks! ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-07 23:42 Message: Logged In: YES user_id=46639 Leave it commented out or remove that line. It is testing unimportant behaviour that looks more platform dependent than I suspected (and now I look at it again, what tzname should be set to if the timezone is unknown is unspecified by the tzset(3) docs). The important behaviour is that: a) the system silently falls back to UTC if the timezone is unknown, and this is tested elsewhere b) calling tzset resets tzname, which is also tested elsewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 09:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 16:49 Message: Logged In: YES user_id=6380 Sorry, not a chance. 
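The usage pattern from the proposed tzset() docs can be wrapped in a small helper. This is only a sketch, assuming a Unix platform where time.tzset() is available. The function name ctime_in_zone is hypothetical and is not part of the patch. It sets TZ, calls tzset(), formats the timestamp, and then restores the previous setting so the rest of the process is unaffected.

```python
import os
import time


def ctime_in_zone(timestamp: float, tz: str) -> str:
    """Format timestamp as local time in zone tz, then restore the old TZ."""
    if not hasattr(time, "tzset"):
        # tzset() is only built on platforms where it works (Unix).
        raise NotImplementedError("time.tzset() is not available here")

    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # re-read TZ and update time.timezone, time.tzname, etc.
    try:
        return time.ctime(timestamp)
    finally:
        # Put the environment back and re-initialise the local timezone.
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()


if __name__ == "__main__":
    now = time.time()
    for zone in ("Europe/Amsterdam", "US/Eastern"):
        print(zone, ctime_in_zone(now, zone))
```

Named zones such as 'Europe/Amsterdam' depend on the system's timezone database being installed; as the docs above note, an unknown name silently falls back to UTC rather than raising an error.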
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 16:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 07:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3? ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-20 23:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he thought this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Sat Mar 15 13:51:44 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 15 Mar 2003 05:51:44 -0800 Subject: [Patches] [ python-Patches-701743 ] Reloading pseudo modules Message-ID: Patches item #701743, was opened at 2003-03-11 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Reloading pseudo modules Initial Comment: Python allows putting something that is not a module in sys.modules. Unfortunately reload() does not work with such a pseudo module ("TypeError: reload() argument must be module" is raised). This patch changes Python/import.c::PyImport_ReloadModule() so that it works with anything that has a __name__ attribute that can be found in sys.modules.keys().
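
As a rough illustration of the "pseudo module" situation described in the initial comment, the sketch below registers a plain object in sys.modules and imports it. The class name PseudoModule and the module name pseudo_demo are hypothetical, chosen only for this example. It deliberately does not call reload(), since how reload treats such objects is exactly what the patch proposes to change and it differs between Python versions.

```python
import sys
import types


class PseudoModule(object):
    """Hypothetical stand-in: a plain object, not a module."""

    def __init__(self, name):
        self.__name__ = name  # the name it is registered under
        self.answer = 42


# Register the instance directly in the module cache.
sys.modules["pseudo_demo"] = PseudoModule("pseudo_demo")

# The import statement finds the cached entry and binds it as-is.
import pseudo_demo

# Record whether the imported object is a genuine module object.
is_real_module = isinstance(pseudo_demo, types.ModuleType)
```

Here pseudo_demo.answer is reachable like any module attribute, while is_real_module is False, which is the case a type check inside reload() would reject.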
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-15 14:51 Message: Logged In: YES user_id=21627 I think the exceptions need to be reworked: "must be a module" now only occurs if m is NULL. Under what circumstances could that happen? Failure to provide __name__ is passed through; shouldn't this get diagnosed in a better way? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 From noreply@sourceforge.net Sun Mar 16 22:02:02 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 16 Mar 2003 14:02:02 -0800 Subject: [Patches] [ python-Patches-704676 ] add direct access to MD5 compression function to md5 module Message-ID: Patches item #704676, was opened at 2003-03-17 00:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=704676&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Reuben Sumner (rasumner) Assigned to: Nobody/Anonymous (nobody) Summary: add direct access to MD5 compression function to md5 module Initial Comment: Access to the MD5 compression function allows doing things like calculating NMAC (see http://www.cs.ucsd.edu/users/mihir/papers/hmac.html). This patch gives such access. If accepted I am happy to do the same for SHA-1. I didn't update the doc strings very well, and I didn't update the documentation source at all. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=704676&group_id=5470 From noreply@sourceforge.net Sun Mar 16 22:24:45 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 16 Mar 2003 14:24:45 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-16 23:24 Message: Logged In: YES user_id=34209 Well, the patch restores the behaviour of Python 2.1 and earlier (at least as far back as 1.5.2.) Also, binascii ignores any errors *except* padding errors, and 'padding errors' mean 'valid base64-characters left over'. Invalid characters inside the base64 stream are silently ignored, and in fact the base64 test explicitly tests this behaviour... 
I think ignoring anything but whitespace in the first place is the problem here, but that's not the problem the patch tries to solve, and I don't know enough about the official specs to say whether this is mandatory or not. Added the changes Tim wanted to the patch. ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-14 20:25 Message: Logged In: YES user_id=12800 why wouldn't calling it on garbage data raise binascii.Error? i think i'd feel more comfortable about the patch if it did that instead (to be consistent with incomplete padding errors and such). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 19:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicated in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that.
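
The decoding behaviour discussed in the comments above (garbage characters skipped silently, padding problems reported) can be observed with a small sketch. This is an assumption-laden illustration, not the patch itself: it runs on a current Python 3 interpreter, calls binascii.a2b_base64 directly with bytes input in its default (non-strict) mode, and the sample inputs other than the row of '#' characters are made up for the example.

```python
import binascii

# Input made only of characters outside the base64 alphabet: nothing to decode.
only_garbage = binascii.a2b_base64(b"###################")

# Invalid characters mixed into valid data are skipped; "YWJj" encodes b"abc".
mixed = binascii.a2b_base64(b"Y#W#J#j")

# Valid characters left over without padding are reported as an error.
try:
    binascii.a2b_base64(b"YQ")
    padding_error = False
except binascii.Error:
    padding_error = True
```

With the fix in place, the all-garbage input yields an empty result rather than recycled memory, while the padding case still raises binascii.Error, which is the inconsistency Barry asks about.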
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. 
A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Sun Mar 16 22:42:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 16 Mar 2003 14:42:28 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-13 01:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment:

A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :)

1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition.

2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find it.

Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

## v should be a class object.
## Why did I name it 'v'? :(
def getbaseclasses(v):
    ## Have we already set up the _propdict and _elemdict
    ## for this class object? If so, don't do it again.
    if not v._propdict:
        ## This step is required so we get a fresh dictionary on
        ## this class object, and don't mutate the one on
        ## ComponentItem or one of our superclasses
        v._propdict = {}
        v._elemdict = {}
        ## Run through all of the strings in _superclassnames
        ## evaluating them to get a class object.
        for superclassname in getattr(v, '_superclassnames', []):
            superclass = eval(superclassname)
            ## Immediately recurse into getbaseclasses, so that
            ## the base class _propdict and _elemdict is set up
            ## properly before we copy its entries into ours.
            getbaseclasses(superclass)
            ## Copy all of the entries from this base class into
            ## our _propdict and _elemdict so that we get a flat
            ## dictionary of all of the elements and properties
            ## that should be available to instances of this class.
            v._propdict.update(getattr(superclass, '_propdict', {}))
            v._elemdict.update(getattr(superclass, '_elemdict', {}))
        ## Finally, copy those properties and elements that
        ## are defined directly on this class object in
        ## _privpropdict and _privelemdict into the
        ## _propdict and _elemdict that
        ## ComponentItem.__getattr__ looks in.
        ## Note that if we entered getbaseclasses through the
        ## recursion above, our subclass will then copy our
        ## _propdict and _elemdict into its own after we exit
        ## the recursion, giving it a copy of all the properties
        ## and elements defined on the superclass object.
        v._propdict.update(v._privpropdict)
        v._elemdict.update(v._privelemdict)

---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-16 23:42 Message: Logged In: YES user_id=45365 Donovan, in as far as I understand the matter (in which area you are clearly my superior:-) I think the idea of the fix is correct, but I have one misgiving: if a class has no properties then v._propdict will still be empty after getbaseclasses(). This will result in the next call of getbaseclasses (if this class is the base class of another) going through the motions again. Is this a problem? Also, do we really need _superclassnames, can't we do this with __bases__? I vaguely remember we went through this issue before, but I can't remember fully... ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Whoops. Have to click the checkbox. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Attaching diff. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 From noreply@sourceforge.net Sun Mar 16 22:49:35 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 16 Mar 2003 14:49:35 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Barry A.
Warsaw (bwarsaw) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-16 17:49 Message: Logged In: YES user_id=31435 The bug here is exposing recycled memory, so let's fix that first. Changing under which conditions the function raises exceptions is a different can of worms, and is probably off the table for 2.2 backporting regardless. Barry, if you're happy with the way the patch fixes the reported bug, please accept it and assign it back to Thomas; if you want to go on to change error-raising behavior for 2.3, better to open another report. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-16 17:24 Message: Logged In: YES user_id=34209 Well, the patch restores the behaviour of Python 2.1 and earlier (at least as far back as 1.5.2.) Also, binascii ignores any errors *except* padding errors, and 'padding errors' mean 'valid base64-characters left over'. Invalid characters inside the base64 stream are silently ignored, and in fact the base64 test explicitly tests this behaviour... 
I think ignoring anything but whitespace in the first place is the problem here, but that's not the problem the patch tries to solve, and I don't know enough about the official specs to say whether this is mandatory or not. Added the changes Tim wanted to the patch. ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-14 14:25 Message: Logged In: YES user_id=12800 why wouldn't calling it on garbage data raise binascii.Error? i think i'd feel more comfortable about the patch if it did that instead (to be consistent with incomplete padding errors and such). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 13:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicating in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. 
A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Mon Mar 17 05:26:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 16 Mar 2003 21:26:18 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 03:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Tim Peters (tim_one) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-17 00:26 Message: Logged In: YES user_id=12800 I'm happy with the patch for the reported error. I guess the inconsistency in behavior is just the price to pay for the age of the api. Someday we'll design a better interface. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-16 17:49 Message: Logged In: YES user_id=31435 The bug here is exposing recycled memory, so let's fix that first. Changing under which conditions the function raises exceptions is a different can of worms, and is probably off the table for 2.2 backporting regardless. Barry, if you're happy with the way the patch fixes the reported bug, please accept it and assign it back to Thomas; if you want to go on to change error-raising behavior for 2.3, better to open another report. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-16 17:24 Message: Logged In: YES user_id=34209 Well, the patch restores the behaviour of Python 2.1 and earlier (at least as far back as 1.5.2.) Also, binascii ignores any errors *except* padding errors, and 'padding errors' mean 'valid base64-characters left over'. Invalid characters inside the base64 stream are silently ignored, and in fact the base64 test explicitly tests this behaviour... I think ignoring anything but whitespace in the first place is the problem here, but that's not the problem the patch tries to solve, and I don't know enough about the official specs to say whether this is mandatory or not. Added the changes Tim wanted to the patch. ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-14 14:25 Message: Logged In: YES user_id=12800 why wouldn't calling it on garbage data raise binascii.Error? i think i'd feel more comfortable about the patch if it did that instead (to be consistent with incomplete padding errors and such). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 13:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. 
It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicating in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 13:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. 
---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 12:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 12:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 11:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 10:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Mon Mar 17 11:48:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 17 Mar 2003 03:48:11 -0800 Subject: [Patches] [ python-Patches-703471 ] (Security Problem) base64.decodestring exposes garbage value Message-ID: Patches item #703471, was opened at 2003-03-14 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 8 Submitted By: Hye-Shik Chang (perky) >Assigned to: Thomas Wouters (twouters) Summary: (Security Problem) base64.decodestring exposes garbage value Initial Comment: >>> import base64 >>> base64.decodestring("###################") '\x0cD\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".....") 'ps2\x00\x00t' >>> base64.decodestring("........................") '\x0cF\x1a\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >>> base64.decodestring(".................................................") '.............................."\x00\x00\x00\x00\x00\x00\x00\x00' This exposes unexpected values that deallocated recently. (some my cgi script showed garbage that contains a database password in offensive query) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-17 12:48 Message: Logged In: YES user_id=34209 Checked into HEAD and 2.2-maint as: Modules/binascii.c: 2.39 and 2.33.4.4 Lib/test/test_binascii.py: 1.16 and 1.11.10.1 Also added Hye-Shik Chang to ACKS. Thanks :-) ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-17 06:26 Message: Logged In: YES user_id=12800 I'm happy with the patch for the reported error. 
I guess the inconsistency in behavior is just the price to pay for the age of the api. Someday we'll design a better interface. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-16 23:49 Message: Logged In: YES user_id=31435 The bug here is exposing recycled memory, so let's fix that first. Changing under which conditions the function raises exceptions is a different can of worms, and is probably off the table for 2.2 backporting regardless. Barry, if you're happy with the way the patch fixes the reported bug, please accept it and assign it back to Thomas; if you want to go on to change error-raising behavior for 2.3, better to open another report. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-16 23:24 Message: Logged In: YES user_id=34209 Well, the patch restores the behaviour of Python 2.1 and earlier (at least as far back as 1.5.2.) Also, binascii ignores any errors *except* padding errors, and 'padding errors' mean 'valid base64-characters left over'. Invalid characters inside the base64 stream are silently ignored, and in fact the base64 test explicitly tests this behaviour... I think ignoring anything but whitespace in the first place is the problem here, but that's not the problem the patch tries to solve, and I don't know enough about the official specs to say whether this is mandatory or not. Added the changes Tim wanted to the patch. ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2003-03-14 20:25 Message: Logged In: YES user_id=12800 why wouldn't calling it on garbage data raise binascii.Error? i think i'd feel more comfortable about the patch if it did that instead (to be consistent with incomplete padding errors and such). 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 19:19 Message: Logged In: YES user_id=31435 I'd like it fine if 1. It did rv = PyString_FromString("") and then fell thru to the existing "return rv;", instead of creating another return point. 2. Add a comment about why this convolution is needed: this part of the function has been implicating in two bugs so far. The base64 stuff silently skips over garbage characters because everything would break now if it didn't . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:17 Message: Logged In: YES user_id=34209 Assigning to Barry for review, on the 'last urinated' principle . ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 19:05 Message: Logged In: YES user_id=34209 Version of the patch with a test attached. It looks sane to me, and it seems to work. I'm not sure why binascii isn't raising an exception when receiving invalid data, but this is python2.1-and-earlier behaviour, and I'm not about to break that. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:23 Message: Logged In: YES user_id=34209 Hm, I see. I figured it was PyString_FromStringAndSize()'s fault for not honoring the NULL source-string in the case of a zero-length request, but I see how that might be intended. How about this patch instead ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:14 Message: Logged In: YES user_id=31435 The thing I'm worried about is that _PyString_Resize must not be called on a string that's empty to begin with (resizing will fail because the empty string is shared, and the resize routine checks for that). 
The *last* patch to this function inserted the bin_len > 0 test for what appears to be that very reason -- but that also created the problem we're seeing now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 18:07 Message: Logged In: YES user_id=34209 Ah, it is not. I'll see about fixing it (and writing the testcase etc etc yahdah yahdah.) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 18:05 Message: Logged In: YES user_id=31435 I raised the priority so someone would look at it. That part worked . I'm unsure about the patch, but don't have time to explain that now. ---------------------------------------------------------------------- Comment By: Thomas Wouters (twouters) Date: 2003-03-14 17:48 Message: Logged In: YES user_id=34209 The patch seems to me to be the correct fix. Did you have a reason to raise the priority but not check it in, Tim ? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-14 16:19 Message: Logged In: YES user_id=31435 Yikes! Boosted priority way up. A quick check shows that my Python 2.2.2 also appears to "decode" free'd RAM here on Windows. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=703471&group_id=5470 From noreply@sourceforge.net Mon Mar 17 14:25:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 17 Mar 2003 06:25:14 -0800 Subject: [Patches] [ python-Patches-701743 ] Reloading pseudo modules Message-ID: Patches item #701743, was opened at 2003-03-11 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Reloading pseudo modules Initial Comment: Python allows putting something that is not a module into sys.modules. Unfortunately reload() does not work with such a pseudo module ("TypeError: reload() argument must be module" is raised). This patch changes Python/import.c::PyImport_ReloadModule() so that it works with anything that has a __name__ attribute that can be found in sys.modules.keys(). ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-17 15:25 Message: Logged In: YES user_id=89016 PyImport_ReloadModule() is only called by the implementation of the reload builtin, so it seems that m==NULL can only happen with broken extension modules. I've updated the patch accordingly (raising a SystemError) and changed the error case for a missing __name__ attribute to raise a TypeError when an AttributeError is detected. Unfortunately this might mask exceptions (e.g. when __name__ is implemented as a property.) Another problem is that reload() seems to repopulate the existing module object when reloading real modules.
Example: Write a simple foo.py which contains "x = 1" and then:

    >>> import foo
    >>> foo.x
    1
    [ Now open your editor and change foo.py to "x = 2" ]
    >>> foo2 = reload(foo)
    >>> foo.x
    2
    >>> foo2.x
    2
    >>> print id(foo), id(foo2)
    1077466884 1077466884
    >>>

Of course this can't work with pseudo modules. I wonder why reload() has a return value at all, as it always modifies its parameter for real modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-15 14:51 Message: Logged In: YES user_id=21627 I think the exceptions need to be reworked: "must be a module" now only occurs if m is NULL. Under what circumstances could that happen? Failure to provide __name__ is passed through; shouldn't this get diagnosed in a better way? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 From noreply@sourceforge.net Tue Mar 18 00:55:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 17 Mar 2003 16:55:25 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 14:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) >Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO.
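Editor's sketch of what "self-iterator" means in the cStringIO report above: the object returns itself from `__iter__`, so iteration state lives in the object and a partly consumed loop can be resumed. `LineBuffer` is a hypothetical class written for this note in present-day Python 3, not the cStringIO implementation; `io.StringIO` is used only to show the same property on a standard in-memory file.

```python
import io


class LineBuffer:
    """Hypothetical in-memory line source that is its own iterator."""

    def __init__(self, text):
        self._lines = text.splitlines(keepends=True)
        self._pos = 0

    def __iter__(self):
        # A self-iterator hands back the object itself.
        return self

    def __next__(self):
        if self._pos >= len(self._lines):
            raise StopIteration
        line = self._lines[self._pos]
        self._pos += 1
        return line


buf = LineBuffer("a\nb\n")
first = next(buf)   # consume one line by hand
rest = list(buf)    # iteration continues where next() left off

stream = io.StringIO("a\nb\n")
stream_is_self_iterator = iter(stream) is stream
```

Because the position is shared, mixing `next()` with a `for` loop never re-reads a line, which is the file-object analogy the patch aims for.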
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-17 19:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it every time or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled-out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-11 21:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch.
(There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead of on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 17:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Tue Mar 18 14:26:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 18 Mar 2003 06:26:17 -0800 Subject: [Patches] [ python-Patches-696392 ] allow proxy server authentication with pimp Message-ID: Patches item #696392, was opened at 2003-03-03 08:38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696392&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Andrew Straw (astraw) Assigned to: Jack Jansen (jackjansen) Summary: allow proxy server authentication with pimp Initial Comment: The urllib module does not support http proxy authentication with passwords. The urllib2 module does, so I changed pimp.py to use urllib2. I have tested the patch below after setting my http_proxy environment variable to the form "http://user:pass@proxy.com:1234". It may be possible to remove the dependency on urllib entirely by substituting a urllib2 work-alike for a call to urllib.url2pathname().
This may affect the exception(s) raised when unable to connect. For example, PackageManager.py catches an IOError, but I believe urllib2 raises a socket.gaierror when unable to resolve the name of the URL. I have not resolved this issue. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-18 15:26 Message: Logged In: YES user_id=45365 Checked into CVS. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=696392&group_id=5470 From noreply@sourceforge.net Tue Mar 18 14:38:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 18 Mar 2003 06:38:18 -0800 Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library Message-ID: Patches item #578667, was opened at 2002-07-08 15:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 Category: Macintosh Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Just van Rossum (jvr) Summary: Put IDE scripts in ~/Library Initial Comment: Just, here's a patch that was part of a larger set and this one was unrelated to the rest(unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX. This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice) You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library. 
---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-18 15:38 Message: Logged In: YES user_id=45365 Just, shouldn't this be closed? ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2002-07-08 16:47 Message: Logged In: YES user_id=92689 It was Tony Lownds. I'm all for the intentions of the patch, but I see it will fail on MacPython, which doesn't support os.environ["HOME"]. But I guess that statement could simply be replaced by the appropriate FindFolder() call. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 From noreply@sourceforge.net Tue Mar 18 18:00:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 18 Mar 2003 10:00:55 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-12 15:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment: A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :) 1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. 
They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition. 2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find them. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time. Here's a detailed description of what getbaseclasses does:

    ## v should be a class object.
    ## Why did I name it 'v'? :(
    def getbaseclasses(v):
        ## Have we already set up the _propdict and _elemdict
        ## for this class object? If so, don't do it again.
        if not v._propdict:
            ## This step is required so we get a fresh dictionary on
            ## this class object, and don't mutate the one on
            ## ComponentItem or one of our superclasses
            v._propdict = {}
            v._elemdict = {}
            ## Run through all of the strings in _superclassnames
            ## evaluating them to get a class object.
            for superclassname in getattr(v, '_superclassnames', []):
                superclass = eval(superclassname)
                ## Immediately recurse into getbaseclasses, so that
                ## the base class _propdict and _elemdict is set up
                ## properly before we copy its entries into ours.
                getbaseclasses(superclass)
                ## Copy all of the entries from this base class into
                ## our _propdict and _elemdict so that we get a flat
                ## dictionary of all of the elements and properties
                ## that should be available to instances of this class.
                v._propdict.update(getattr(superclass, '_propdict', {}))
                v._elemdict.update(getattr(superclass, '_elemdict', {}))
            ## Finally, copy those properties and elements that
            ## are defined directly on this class object in
            ## _privpropdict and _privelemdict into the
            ## _propdict and _elemdict that
            ## ComponentItem.__getattr__ looks in.
            ## Note that if we entered getbaseclasses through the
            ## recursion above, our subclass will then copy our
            ## _propdict and _elemdict into its own after we exit
            ## the recursion, giving it a copy of all the properties
            ## and elements defined on the superclass object.
            v._propdict.update(v._privpropdict)
            v._elemdict.update(v._privelemdict)

---------------------------------------------------------------------- >Comment By: Donovan Preston (dsposx) Date: 2003-03-18 09:00 Message: Logged In: YES user_id=111050 Jack, Thanks for taking a look at this. You are correct, if a class has no properties then v._propdict will still be empty, and we will do unnecessary work the next time getbaseclasses is called. I suppose it could be "if not v._propdict and not v._elemdict:" which would reduce the unnecessary work down to when a base class has neither properties nor elements; frankly the if is not really required at all; it was just an attempt to prevent work that has already been performed from being performed again unnecessarily. Suggestions welcome. Re _superclassnames, like everything else done with gensuitemodule, we need to be really careful about circular references, references to things that haven't been defined yet, etc. Everything generated by gensuitemodule is either a ComponentItem or an NProperty, and they don't actually inherit from each other in Python because doing so would be too hairy. So we can't use __bases__ because there is none :-) The thing about _superclassnames is that it's just what it sounds like; a list of strings that indicate superclasses of the current class.
By deferring getbaseclasses to import time, we ensure all of the base classes are defined by then. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-16 13:42 Message: Logged In: YES user_id=45365 Donovan, in as far as I understand the matter (in which area you are clearly my superior:-) I think the idea of the fix is correct, but I have one misgiving: if a class has no properties then v._propdict will still be empty after getbaseclasses(). This will result in the next call of getbaseclasses (if this class is the base class of another) going through the motions again. Is this a problem? Also, do we really need _superclassnames, can't we do this with __bases__? I vaguely remember we went through this issue before, but I can't remember fully... ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-12 15:08 Message: Logged In: YES user_id=111050 Whoops. Have to click the checkbox. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-12 15:08 Message: Logged In: YES user_id=111050 Attaching diff. 
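Editor's sketch of the flattening scheme discussed in this thread, including one possible answer to Jack's concern about classes with no properties being processed twice. It is not the patch itself: `flatten_class_dicts`, the `_flattened` marker, the `registry` mapping (used here in place of eval) and the sample classes with their placeholder values are all hypothetical, and the code is present-day Python 3.

```python
def flatten_class_dicts(cls, registry):
    """Merge inherited property/element tables into flat per-class dicts."""
    # Look only in the class's own namespace, so a marker set on some
    # other class is never mistaken for ours. An explicit marker also
    # covers classes whose tables are legitimately empty.
    if cls.__dict__.get("_flattened", False):
        return
    propdict, elemdict = {}, {}
    for name in getattr(cls, "_superclassnames", []):
        base = registry[name]  # name lookup instead of eval()
        flatten_class_dicts(base, registry)
        propdict.update(base._propdict)
        elemdict.update(base._elemdict)
    # Entries defined directly on this class are applied last.
    propdict.update(getattr(cls, "_privpropdict", {}))
    elemdict.update(getattr(cls, "_privelemdict", {}))
    cls._propdict, cls._elemdict = propdict, elemdict
    cls._flattened = True


class Item:  # stands in for ComponentItem
    _propdict = {}
    _elemdict = {}


class Window(Item):
    _superclassnames = []
    _privpropdict = {"name": "placeholder-1"}
    _privelemdict = {}


class DocumentWindow(Item):
    _superclassnames = ["Window"]
    _privpropdict = {"modified": "placeholder-2"}
    _privelemdict = {"paragraph": "placeholder-3"}


registry = {"Window": Window, "DocumentWindow": DocumentWindow}
flatten_class_dicts(DocumentWindow, registry)
```

Building fresh dictionaries and assigning them at the end keeps the shared tables on the stand-in base class untouched, which is the mutation hazard the original comments warn about.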
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 From noreply@sourceforge.net Tue Mar 18 18:14:12 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 18 Mar 2003 10:14:12 -0800 Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library Message-ID: Patches item #578667, was opened at 2002-07-08 15:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 Category: Macintosh Group: None >Status: Closed >Resolution: Wont Fix Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Just van Rossum (jvr) Summary: Put IDE scripts in ~/Library Initial Comment: Just, here's a patch that was part of a larger set and this one was unrelated to the rest(unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX. This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice) You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-18 19:14 Message: Logged In: YES user_id=92689 I guess -- it's not realistic that I'll look into this anytime soon. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-18 15:38 Message: Logged In: YES user_id=45365 Just, shouldn't this be closed? 
---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2002-07-08 16:47 Message: Logged In: YES user_id=92689 It was Tony Lownds. I'm all for the intentions of the patch, but I see it will fail on MacPython, which doesn't support os.environ["HOME"]. But I guess that statement could simply be replaced by the appropriate FindFolder() call. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 From noreply@sourceforge.net Tue Mar 18 18:22:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 18 Mar 2003 10:22:38 -0800 Subject: [Patches] [ python-Patches-681927 ] bundlebuilder: Add dylibs, frameworks to the bundle Message-ID: Patches item #681927, was opened at 2003-02-06 22:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Robin Dunn (robind) Assigned to: Just van Rossum (jvr) Summary: bundlebuilder: Add dylibs, frameworks to the bundle Initial Comment: This patch adds the ability to add shared libraries and Frameworks (the latter is untested as of yet) to the bundle. It is mostly by Kevin Ollivier with some suggestions by me. In addition to copying the files into the bundle the launcher script in the bundle is modified to set the DYLD_LIBRARY_PATH to the right place. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-18 19:22 Message: Logged In: YES user_id=92689 Having a manual option is a fine start. But can any of you rework the patch so it doesn't mess with whitespace, and update it for current CVS?
---------------------------------------------------------------------- Comment By: Kevin Ollivier (kollivier) Date: 2003-02-07 21:52 Message: Logged In: YES user_id=248468 I'll take a look at otool and see if it does what we need. As Robin mentioned, I think giving both the manual and auto options is the best approach. I'll also check into the dependency on Apple's Dev Tools, but even if it is dependent we could just switch off auto-detection if users don't have it and spit out a warning. Another possible way to alleviate this problem may be to integrate with distutils. (i.e. make a 'buildbundle' option) That should at least allow us to find and include any libraries the developer linked against. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-07 09:59 Message: Logged In: YES user_id=92689 I use tabs for indentation and use spaces for alignment... So things look nice _and_ won't screw up with different tab settings. But I admit that using such a non-standard way is asking for trouble. I'll convert to spaces after this patch has been done (unless you prefer I do it _before_ ;-). (Btw. it might be that otool is only available with the apple dev tools, which would be a shame since we otherwise don't depend on dev tools being available. Hm.) ---------------------------------------------------------------------- Comment By: Robin Dunn (robind) Date: 2003-02-07 00:20 Message: Logged In: YES user_id=53955 Oops, sorry for the whitespace patches. I noticed that my lines used spaces but the lines around them were using tabs so I just ran a tabify on the whole file without taking another look at the resulting patch file after that. Looks like some of the other lines that were added since 2.3a1 have spaces too and that is where the problem comes from. I'll redo the patch but the whole file should probably be either tabified or untabified after you are done applying it. I didn't know about otool.
I'll pass that on to Kevin. We discussed doing automatic finding of libs but didn't know how to go about it so thought that this would be a good start. Also we figured that even if there was a way to do it that you would probably want a way to include other files that may not get automatically found, or to exclude some that were, so there should be command line options for it anyway. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-06 23:35 Message: Logged In: YES user_id=92689 Cool. There's a problem with the patch, though: although I apologize for using tabs to begin with, please keep the tab usage consistent. There are quite a few hunks in the patch that only touch whitespace and that's both undesirable as well as blurring the intent of the patch... Could you upload a cleaner one? Btw. for the --standalone build mode it would be possible to calculate all framework/dylib dependencies with the otool tool. If this were implemented perhaps the --lib option wouldn't even be needed? Another question remains: if we include a framework, is there a way to strip it from redundant files, eg. headers? If we would use this mechanism to include Python.framework we would definitely need a way to trim it down, eg. all of lib is taken care of by modulefinder anyway. If you (or Kevin) have any ideas about that, pls contact me off line.
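Editor's sketch of the otool idea raised above: collect the library paths a binary links against and keep the ones that would have to be copied into the bundle. It only parses text laid out the way `otool -L` is commonly seen to print it (a first line naming the file, then one indented entry per library); the sample text is hand-written, and both function names and the list of system prefixes are assumptions for illustration, not part of bundlebuilder.

```python
def parse_otool_libraries(output):
    """Return the library paths listed in `otool -L` style output."""
    paths = []
    # The first line names the inspected file, so skip it.
    for line in output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Assumed entry layout: "<path> (compatibility version ..., current version ...)"
        paths.append(line.split(" (compatibility", 1)[0])
    return paths


def bundle_candidates(paths, system_prefixes=("/usr/lib/", "/System/")):
    """Drop libraries assumed to ship with the OS; the rest need bundling."""
    return [p for p in paths if not p.startswith(system_prefixes)]


# Hand-written sample; real output should be checked on the target system.
sample = (
    "/tmp/demo/app:\n"
    "\t/usr/local/lib/libexample.1.dylib (compatibility version 1.0.0, current version 1.2.0)\n"
    "\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 71.0.0)\n"
)
linked = parse_otool_libraries(sample)
to_bundle = bundle_candidates(linked)
```

Keeping the parser separate from the call to the tool would let auto-detection be switched off with a warning when otool is missing, as Kevin suggests, while the manual options keep working.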
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 From noreply@sourceforge.net Wed Mar 19 15:55:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 07:55:41 -0800 Subject: [Patches] [ python-Patches-706338 ] Fix a few broken links in pydoc Message-ID: Patches item #706338, was opened at 2003-03-19 06:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706338&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix a few broken links in pydoc Initial Comment: Patch to fix a few of the help files references in pydoc.Helper. I'm not sure what was originally in 'ref/execframe' (which does not exist in the 2.3 documentation set), but, since 'ref/naming' seems the best file for NAMESPACES, I converted both references to 'ref/execframe' to 'ref/naming'. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706338&group_id=5470 From noreply@sourceforge.net Wed Mar 19 17:46:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 09:46:15 -0800 Subject: [Patches] [ python-Patches-706406 ] fix bug #685846: raw_input defers signals Message-ID: Patches item #706406, was opened at 2003-03-19 17:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706406&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #685846: raw_input defers signals Initial Comment: This patch attempts to fix raw_input so it can be interrupted by signals. 
In the process it allows SIGINT handling to be honored by raw_input. (right now SIGINT always interrupts regardless of any installed handlers) Effects: Signals are handled with their installed handlers and when those handlers raise exceptions those exceptions are raised by raw_input. If an exception is not raised, raw_input continues collecting input as if nothing had happened. This can be problematic if the signal causes output to appear on the screen, messing up the input line, or if someone using the readline module was in the middle of a complex operation, like a reverse search, in which case that operation will be cancelled. It would be easy to instead print a message ("Signal Interruption") and continue input on a new line for the readline library, but this couldn't happen in myreadline.c as we can't retrieve the partially entered input. Backwards compatibility: This patch requires the readline handler (either call_readline or PyOS_StdioReadline generally) to be called while holding the global interpreter lock. It is then responsible for releasing the GIL before doing blocking input. This will cause problems for anyone who has written an extension that installs a custom readline handler. In python code, anyone using signals and expecting raw_input not to be interrupted by them will have problems (but this seems unlikely). 
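Editor's sketch of the behaviour the patch description aims for, seen from Python code: a handler installed for SIGINT raises its own exception, and the code waiting for input catches that exception. It is written for present-day Python 3, where `input()` replaces `raw_input`; `Interrupted`, `record_and_raise` and `prompt_once` are hypothetical names, and whether a blocking read is actually interrupted depends on platform and interpreter version, which this note does not verify.

```python
import signal


class Interrupted(Exception):
    """Hypothetical exception raised by the handler below."""


received = []


def record_and_raise(signum, frame):
    # Record the signal, then raise so the pending read is abandoned.
    received.append(signum)
    raise Interrupted(signum)


def prompt_once(prompt="> "):
    """Read one line; return None if the handler interrupts the read."""
    previous = signal.signal(signal.SIGINT, record_and_raise)
    try:
        return input(prompt)
    except Interrupted:
        return None
    finally:
        # Always restore whatever handler was installed before.
        signal.signal(signal.SIGINT, previous)
```

A handler that returns without raising corresponds to the other case in the description, where input simply continues.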
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706406&group_id=5470 From noreply@sourceforge.net Wed Mar 19 17:47:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 09:47:17 -0800 Subject: [Patches] [ python-Patches-706406 ] fix bug #685846: raw_input defers signals Message-ID: Patches item #706406, was opened at 2003-03-19 17:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706406&group_id=5470 >Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #685846: raw_input defers signals Initial Comment: This patch attempts to fix raw_input so it can be interrupted by signals. In the process it allows SIGINT handling to be honored by raw_input. (right now SIGINT always interrupts regardless of any installed handlers) Effects: Signals are handled with their installed handlers and when those handlers raise exceptions those exceptions are raised by raw_input. If an exception is not raised, raw_input continues collecting input as if nothing had happened. This can be problematic if the signal causes output to appear on the screen, messing up the input line, or if someone using the readline module was in the middle of a complex operation, like a reverse search, in which case that operation will be cancelled. It would be easy to instead print a message ("Signal Interruption") and continue input on a new line for the readline library, but this couldn't happen in myreadline.c as we can't retrieve the partially entered input. Backwards compatibility: This patch requires the readline handler (either call_readline or PyOS_StdioReadline generally) to be called while holding the global interpreter lock. It is then responsible for releasing the GIL before doing blocking input. 
This will cause problems for anyone who has written an extension that installs a custom readline handler. In python code, anyone using signals and expecting raw_input not to be interrupted by them will have problems (but this seems unlikely). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706406&group_id=5470 From noreply@sourceforge.net Wed Mar 19 18:39:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 10:39:01 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 19:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. ---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-19 18:39 Message: Logged In: YES user_id=670441 I'm not sure I understand your concern with the new tp_iter slot, it just makes cStringIO a self iterator as requested on python-dev, going for the analogy with file objects, right? Actually it should probably use the still-being-debated GenericGetIter or whatever it will be called, but not until the debate is over. I think the get/setattrs are okay. Everything they did is done by the default get/set attrs, once we set up the appropriate methods and members (there's just the one member, softspace). I thought replacing them by the defaults would be clearer and easier to maintain. 
Also, it is in analogy with fileobject.c, so I thought making the cStringIO implementation more like file's would be good. As for the creating a new tuple every time and the 0,0,0,0 style, you're absolutely right, I've attached a new patch that fixes those up per your suggestions. I was creating a new tuple every time in analogy with iterobject.c's calliter_iternext. Perhaps that should be changed as well? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-18 00:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it every time or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled-out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-12 02:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge.
When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. (There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 01:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 22:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 19 19:54:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 11:54:56 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 14:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. 
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-19 14:54 Message: Logged In: YES user_id=80475 It looks good to me, compiles okay, passes tests, etc. I do prefer that you get one more reviewer to look at it. Neal or MvL might be a good choice. GvR picked PyObject_SelfIter to be the name of the iterator's tp_iter slot filler. So you can go ahead and use it to eliminate IO_getiter. One nit, when you load the next patch, copy in the unchanged lines from the original. There are many lines marked as having a change but the content is the same. This means that something changed in the whitespace. It's not big deal but it makes the patch harder to review. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 13:39 Message: Logged In: YES user_id=670441 I'm not sure I understand your concern with the new tp_iter slot, it just makes cStringIO a self iterator as requested on python-dev, going for the analogy with file objects, right? Actually it should probably use the still-being-debated GenericGetIter or whatever it will be called, but not until the debate is over. I think the get/setattrs are okay. Everything they did is done by the default get/set attrs, once we set up the appropriate methods and members (there's just the one member, softspace). I thought replacing them by the defaults would be clearer and easier to maintain. Also, it is in analogy with fileobject.c, so I thought making the cStringIO implementation more like file's would be good. As for the creating a new tuple every time and the 0,0,0,0 style, you're absolutely right, I've attached a new patch that fixes those up per your suggestions. I was creating a new tuple every time in analogy with iterobject.c's calliter_iternext. Perhaps that should be changed as well? 
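For readers following the tp_iter discussion, the "self-iterator" behaviour can be sketched in pure Python. This is only an illustration, not the cStringIO patch: the class name LineReader is hypothetical, and the modern io.StringIO is used as an assumed stand-in for cStringIO.StringIO, which only exists in Python 2.

```python
import io


class LineReader:
    """Hypothetical stand-in for a file-like object; not cStringIO itself."""

    def __init__(self, text):
        self._lines = text.splitlines(True)  # keep the line endings
        self._pos = 0

    def readline(self):
        """Return the next line, or '' at end of input (file-style)."""
        if self._pos >= len(self._lines):
            return ''
        line = self._lines[self._pos]
        self._pos += 1
        return line

    def __iter__(self):
        # Self-iterator: iter(obj) hands back obj itself, which is what a
        # tp_iter slot filled with a "return self" function does in C.
        return self

    def __next__(self):
        # Analogue of tp_iternext: call readline and stop on the empty string.
        line = self.readline()
        if not line:
            raise StopIteration
        return line


reader = LineReader("one\ntwo\n")
builtin = io.StringIO("one\ntwo\n")  # assumed modern analogue of cStringIO

reader_is_self = iter(reader) is reader
builtin_is_self = iter(builtin) is builtin
reader_lines = list(reader)
builtin_lines = list(builtin)
```

Because the object is its own iterator, a for loop and explicit readline() calls share one read position, which is the analogy with file objects mentioned in the comment above.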
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-17 19:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it everytime or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled-out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-11 21:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. 
(There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 17:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 19 21:17:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 13:17:18 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 19:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. 
---------------------------------------------------------------------- >Comment By: Michael Stone (mbrierst) Date: 2003-03-19 21:17 Message: Logged In: YES user_id=670441 Okay, patchstrio4 uses PyObject_SelfIter and doesn't have as much of my prettification, so there aren't any whitespace-only diff lines. (I think) Should I assign this patch to either Neal or MvL for further review, or would that be impolite? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-19 19:54 Message: Logged In: YES user_id=80475 It looks good to me, compiles okay, passes tests, etc. I do prefer that you get one more reviewer to look at it. Neal or MvL might be a good choice. GvR picked PyObject_SelfIter to be the name of the iterator's tp_iter slot filler. So you can go ahead and use it to eliminate IO_getiter. One nit, when you load the next patch, copy in the unchanged lines from the original. There are many lines marked as having a change but the content is the same. This means that something changed in the whitespace. It's not big deal but it makes the patch harder to review. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 18:39 Message: Logged In: YES user_id=670441 I'm not sure I understand your concern with the new tp_iter slot, it just makes cStringIO a self iterator as requested on python-dev, going for the analogy with file objects, right? Actually it should probably use the still-being-debated GenericGetIter or whatever it will be called, but not until the debate is over. I think the get/setattrs are okay. Everything they did is done by the default get/set attrs, once we set up the appropriate methods and members (there's just the one member, softspace). I thought replacing them by the defaults would be clearer and easier to maintain. 
Also, it is in analogy with fileobject.c, so I thought making the cStringIO implementation more like file's would be good. As for the creating a new tuple every time and the 0,0,0,0 style, you're absolutely right, I've attached a new patch that fixes those up per your suggestions. I was creating a new tuple every time in analogy with iterobject.c's calliter_iternext. Perhaps that should be changed as well? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-18 00:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it everytime or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled-out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-12 02:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. 
When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. (There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 01:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 22:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 19 21:27:24 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 13:27:24 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 14:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) >Assigned to: Neal Norwitz (nnorwitz) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. 
---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-19 16:27 Message: Logged In: YES user_id=33168 I don't think it's impolite. I'll try to take a look later, unless someone beats me to it. :-) ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 16:17 Message: Logged In: YES user_id=670441 Okay, patchstrio4 uses PyObject_SelfIter and doesn't have as much of my prettification, so there aren't any whitespace-only diff lines. (I think) Should I assign this patch to either Neal or MvL for further review, or would that be impolite? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-19 14:54 Message: Logged In: YES user_id=80475 It looks good to me, compiles okay, passes tests, etc. I do prefer that you get one more reviewer to look at it. Neal or MvL might be a good choice. GvR picked PyObject_SelfIter to be the name of the iterator's tp_iter slot filler. So you can go ahead and use it to eliminate IO_getiter. One nit, when you load the next patch, copy in the unchanged lines from the original. There are many lines marked as having a change but the content is the same. This means that something changed in the whitespace. It's not big deal but it makes the patch harder to review. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 13:39 Message: Logged In: YES user_id=670441 I'm not sure I understand your concern with the new tp_iter slot, it just makes cStringIO a self iterator as requested on python-dev, going for the analogy with file objects, right? Actually it should probably use the still-being-debated GenericGetIter or whatever it will be called, but not until the debate is over. I think the get/setattrs are okay. 
Everything they did is done by the default get/set attrs, once we set up the appropriate methods and members (there's just the one member, softspace). I thought replacing them by the defaults would be clearer and easier to maintain. Also, it is in analogy with fileobject.c, so I thought making the cStringIO implementation more like file's would be good. As for the creating a new tuple every time and the 0,0,0,0 style, you're absolutely right, I've attached a new patch that fixes those up per your suggestions. I was creating a new tuple every time in analogy with iterobject.c's calliter_iternext. Perhaps that should be changed as well? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-17 19:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it everytime or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled-out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-11 21:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) 
recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. (There seem to be several very old bugs which have solutions attached or discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 17:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Wed Mar 19 22:55:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 14:55:21 -0800 Subject: [Patches] [ python-Patches-706590 ] Adds Mock Objet support to unittest.TestCase Message-ID: Patches item #706590, was opened at 2003-03-19 22:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Matthew Russell (mattruss) Assigned to: Nobody/Anonymous (nobody) Summary: Adds Mock Objet support to unittest.TestCase Initial Comment: Mock objects can greatly improve unit tests (if used in the correct context), especially for code that relies on resource-hungry tests (connections to databases, socket servers, etc.). The module/patch (to unittest) which I am submitting helps to introspect calls to code whilst maintaining transparency and functionality with your code. I had previously written a similar module for my present employers, and my fellow XP partners and I agree that it has made the XP testing cycle considerably easier. Having searched the web for alternatives, I have not found a solution that provides the same level of flexibility. (Hope that doesn't sound arrogant.) The tests for this module should highlight usage, but I will supply dummy code if this idea is accepted.
If unfamiliar with XP/MockObject ideas, please see: http://www.xprogramming.com/xpmag/virtualMockObjects.htm#N78 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 From noreply@sourceforge.net Wed Mar 19 23:11:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 15:11:09 -0800 Subject: [Patches] [ python-Patches-706590 ] Adds Mock Object support to unittest.TestCase Message-ID: Patches item #706590, was opened at 2003-03-19 22:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Matthew Russell (mattruss) Assigned to: Nobody/Anonymous (nobody) >Summary: Adds Mock Object support to unittest.TestCase Initial Comment: Mock objects can greatly improve unit tests (if used in the correct context), especially for code that relies on resource-hungry tests (connections to databases, socket servers, etc.). The module/patch (to unittest) which I am submitting helps to introspect calls to code whilst maintaining transparency and functionality with your code. I had previously written a similar module for my present employers, and my fellow XP partners and I agree that it has made the XP testing cycle considerably easier. Having searched the web for alternatives, I have not found a solution that provides the same level of flexibility. (Hope that doesn't sound arrogant.) The tests for this module should highlight usage, but I will supply dummy code if this idea is accepted.
If unfamiliar with XP/MockObject ideas, please see: http://www.xprogramming.com/xpmag/virtualMockObjects.htm#N78 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 From noreply@sourceforge.net Thu Mar 20 04:18:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 20:18:50 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-27 08:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Open Resolution: Accepted Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required. In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc). This lets you do timezone conversions amongst other things. Also includes changes to configure.in to only build new code if the tzset method correctly switches timezones on your platform. This should be for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py. Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (e.g. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time.
Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on. eg::

    >>> now = time.time()
    >>> os.environ['TZ'] = 'Europe/Amsterdam'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 14:35:17 2003'
    >>> time.tzname
    ('CET', 'CEST')
    >>> os.environ['TZ'] = 'US/Eastern'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 08:35:17 2003'
    >>> time.tzname
    ('EST', 'EDT')

---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-19 23:18 Message: Logged In: YES user_id=33168 test_time is now failing on Solaris 8. altzone is -3600, but should be 0. Also, is there a reason to compare timezone to altzone, but then check that each is 0 (line 78)? Can you provide any suggestions for where to look for the problem? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-14 17:03 Message: Logged In: YES user_id=6380 OK, checked in with that line removed. Thanks! ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-07 23:42 Message: Logged In: YES user_id=46639 Leave it commented out or remove that line. It is testing unimportant behaviour that looks more platform dependent than I suspected (and now I look at it again, what tzname should be set to if the timezone is unknown is unspecified by the tzset(3) docs). The important behaviour is that: a) the system silently falls back to UTC if the timezone is unknown, and this is tested elsewhere b) calling tzset resets tzname, which is also tested elsewhere.
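The tzset behaviour discussed in this item can be sketched as follows. This is a hedged illustration rather than the patch's own test code: the helper name zone_info is hypothetical, the TZ strings use the POSIX format (an assumption chosen so that the result does not depend on an installed zoneinfo database), and time.tzset() is only available on Unix.

```python
import os
import time


def zone_info(tz):
    """Set TZ, call tzset(), and return (tzname, timezone, altzone).

    zone_info is a hypothetical helper written for this illustration.
    """
    saved = os.environ.get('TZ')
    os.environ['TZ'] = tz
    time.tzset()  # Unix only; re-reads TZ and updates the module attributes
    try:
        return time.tzname, time.timezone, time.altzone
    finally:
        # Restore the previous setting so the rest of the process is unaffected.
        if saved is None:
            del os.environ['TZ']
        else:
            os.environ['TZ'] = saved
        time.tzset()


# POSIX-format TZ: standard name, offset west of UTC, DST name, DST rules.
eastern = zone_info('EST+05EDT,M4.1.0,M10.5.0')
utc = zone_info('UTC0')
```

With the first string, tzname is ('EST', 'EDT'), timezone is 18000 and altzone is 14400 (seconds west of UTC); with 'UTC0', timezone is 0.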
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 09:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 16:49 Message: Logged In: YES user_id=6380 Sorry, not a chance. ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 16:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 07:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3?
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-20 23:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he thought this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Thu Mar 20 04:57:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 20:57:47 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-20 15:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Nobody/Anonymous (nobody) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to use only the TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Thu Mar 20 05:06:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 19 Mar 2003 21:06:54 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-28 00:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: Accepted Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van
Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required. In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc). This lets you do timezone conversions amongst other things. Also includes changes to configure.in to only build new code if the tzset method correctly switches timezones on your platform. This should be for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py. Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (e.g. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time. Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on. eg::

    >>> now = time.time()
    >>> os.environ['TZ'] = 'Europe/Amsterdam'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 14:35:17 2003'
    >>> time.tzname
    ('CET', 'CEST')
    >>> os.environ['TZ'] = 'US/Eastern'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 08:35:17 2003'
    >>> time.tzname
    ('EST', 'EDT')

---------------------------------------------------------------------- >Comment By: Stuart Bishop (zenzen) Date: 2003-03-20 16:06 Message: Logged In: YES user_id=46639 An update to this patch is now available: http://www.python.org/sf/706707 ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-20 15:18 Message: Logged In: YES user_id=33168 test_time is now failing on Solaris 8. altzone is -3600, but should be 0.
Also, is there a reason to compare timezone to altzone, but then check that each is 0 (line 78)? Can you provide any suggestions for where to look for the problem? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-15 09:03 Message: Logged In: YES user_id=6380 OK, checked in with that line removed. Thanks! ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-08 15:42 Message: Logged In: YES user_id=46639 Leave it commented out or remove that line. It is testing unimportant behaviour that looks more platform dependent than I suspected (and now I look at it again, what tzname should be set to if the timezone is unknown is unspecified by the tzset(3) docs). The important behaviour is that: a) the system silently falls back to UTC if the timezone is unknown, and this is tested elsewhere b) calling tzset resets tzname, which is also tested elsewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-08 01:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-22 08:49 Message: Logged In: YES user_id=6380 Sorry, not a chance.
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-22 08:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 23:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3? ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 15:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he thought this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Thu Mar 20 15:26:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 07:26:53 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 13:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) >Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py
./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (i.e. compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 16:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 09:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) 
I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 02:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 19:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
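For readers following the rewrite discussed in this thread, here is a minimal sketch of what an apply() removal and Walter's simplification look like. It is written for a current Python, where the apply() builtin no longer exists, so `apply_compat` is a hypothetical stand-in defined only for illustration; `f`, `args` and `kw` are likewise made-up names and do not come from the patch.

```python
def apply_compat(func, args=(), kwargs=None):
    """Hypothetical stand-in for the old apply() builtin (illustration only)."""
    return func(*args, **(kwargs or {}))


def f(*values, **options):
    """Toy callee that simply reports what it received."""
    return values, options


args = (4, 5)
kw = {"sep": "-"}

# Old spelling: apply(f, (1, 2, 3) + args, kw)
old_style = apply_compat(f, (1, 2, 3) + args, kw)

# Direct mechanical translation to extended call syntax.
star_concat = f(*(1, 2, 3) + args, **kw)

# The simplified spelling: fixed arguments first, then the unpacked rest.
simplified = f(1, 2, 3, *args, **kw)

assert old_style == star_concat == simplified
```

All three calls pass the same positional and keyword arguments; the last form just avoids building an intermediate tuple by hand. Note that putting positional arguments before `*args` in a call is the form Walter mentions; whether a given module may use extended call syntax at all still depends on the compatibility rules in PEP 291.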
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Thu Mar 20 21:50:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 13:50:13 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-19 23:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stuart Bishop (zenzen) >Assigned to: Guido van Rossum (gvanrossum) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-20 16:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it. Two notes: 1. Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here . 
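As a rough illustration of the TZ format this patch restricts the tests to, the sketch below sets a POSIX-style TZ string ("std offset dst[,start,end]") and reads back the resulting offsets. The helper name `offsets_for` and the sample zone string are assumptions made for this example; they are not taken from the patch or from test_time.py.

```python
import os
import time


def offsets_for(tz):
    """Return (timezone, altzone) in seconds west of UTC for a POSIX TZ string.

    TZ is set only temporarily and restored afterwards. Returns None on
    platforms where time.tzset() is not available (for example Windows).
    """
    if not hasattr(time, "tzset"):
        return None
    saved = os.environ.get("TZ")
    try:
        os.environ["TZ"] = tz
        time.tzset()
        return time.timezone, time.altzone
    finally:
        # Put the environment back the way we found it.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


# Standard name EST, five hours west of UTC, DST name EDT with explicit rules.
eastern = offsets_for("EST+05EDT,M4.1.0,M10.5.0")
```

Because the string spells out the offset and the DST rules itself, the result should not depend on which zoneinfo files happen to be installed, which is presumably the point of sticking to the documented format.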
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Thu Mar 20 22:13:45 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 14:13:45 -0800 Subject: [Patches] [ python-Patches-707167 ] fix bug #682813: dircache.listdir doesn't signal error Message-ID: Patches item #707167, was opened at 2003-03-20 22:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #682813: dircache.listdir doesn't signal error Initial Comment: Attached small patch makes dircache.listdir raise OSError when encountered in os.stat or os.listdir. This certainly seems like the right thing to do to be consistent with os.listdir, though there may have been a reason not to raise the exception I don't know about, as it is obviously being purposefully caught right now. If there is a reason, someone let me know and I'll submit a patch to change dircache's documentation to reflect its behavior. The test case is also updated by the patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 From noreply@sourceforge.net Thu Mar 20 22:14:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 14:14:38 -0800 Subject: [Patches] [ python-Patches-707167 ] fix bug #682813: dircache.listdir doesn't signal error Message-ID: Patches item #707167, was opened at 2003-03-20 22:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 >Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #682813: dircache.listdir doesn't signal error Initial Comment: Attached small patch makes dircache.listdir raise OSError when encountered in os.stat or os.listdir. This certainly seems like the right thing to do to be consistent with os.listdir, though there may have been a reason not to raise the exception I don't know about, as it is obviously being purposefully caught right now. If there is a reason, someone let me know and I'll submit a patch to change dircache's documentation to reflect its behavior. The test case is also updated by the patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 From noreply@sourceforge.net Fri Mar 21 01:03:02 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 17:03:02 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 01:11:42 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 17:11:42 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-19 23:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None >Priority: 7 Submitted By: Stuart Bishop (zenzen) >Assigned to: Nobody/Anonymous (nobody) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-20 20:11 Message: Logged In: YES user_id=6380 Unassigning, as I won't have time for this. But it is important - someone else should make sure this goes into 2.3b1! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-20 16:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it. Two notes: 1. 
Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here . ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Fri Mar 21 01:18:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 17:18:13 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-19 23:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Stuart Bishop (zenzen) >Assigned to: Neal Norwitz (nnorwitz) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-20 20:18 Message: Logged In: YES user_id=33168 I'll try to get to this soon. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-20 20:11 Message: Logged In: YES user_id=6380 Unassigning, as I won't have time for this. But it is important - someone else should make sure this goes into 2.3b1! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-20 16:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it. 
Two notes: 1. Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here . ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Fri Mar 21 01:56:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 17:56:55 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 17:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. 
timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 17:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 02:03:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 18:03:40 -0800 Subject: [Patches] [ python-Patches-707167 ] fix bug #682813: dircache.listdir doesn't signal error Message-ID: Patches item #707167, was opened at 2003-03-20 14:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix bug #682813: dircache.listdir doesn't signal error Initial Comment: Attached small patch makes dircache.listdir raise OSError when encountered in os.stat or os.listdir. This certainly seems like the right thing to do to be consistent with os.listdir, though there may have been a reason not to raise the exception I don't know about, as it is obviously being purposefully caught right now. If there is a reason, someone let me know and I'll submit a patch to change dircache's documentation to reflect its behavior. The test case is also updated by the patch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 18:03 Message: Logged In: YES user_id=357491 Patch looks good. Don't let the wording in the description mislead you, though. No exception is specifically raised; it just is not caught anymore. As for whether this patch should be applied or not I have no clue since I never use the module. 
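To make Brett's distinction concrete (nothing is raised explicitly; the error is simply no longer swallowed), here is a minimal sketch of an mtime-based directory cache that lets errors from os.stat() and os.listdir() propagate. The names `_cache` and `cached_listdir` are hypothetical and this is not the dircache source, only an illustration of the behaviour the patch aims for.

```python
import os

# path -> (mtime, sorted names); a hypothetical cache, not dircache's own.
_cache = {}


def cached_listdir(path):
    """List a directory, reusing the cached result while its mtime is unchanged.

    Errors from os.stat() or os.listdir() are deliberately not caught, so a
    missing or unreadable directory surfaces to the caller as OSError,
    matching what a plain os.listdir() call would do.
    """
    mtime = os.stat(path).st_mtime
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    names = sorted(os.listdir(path))
    _cache[path] = (mtime, names)
    return names
```

A caller that wants the old quiet behaviour can still wrap the call in its own try/except OSError and substitute an empty list.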
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707167&group_id=5470 From noreply@sourceforge.net Fri Mar 21 02:11:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 18:11:25 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 02:47:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 18:47:28 -0800 Subject: [Patches] [ python-Patches-681927 ] bundlebuilder: Add dylibs, frameworks to the bundle Message-ID: Patches item #681927, was opened at 2003-02-06 13:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Robin Dunn (robind) Assigned to: Just van Rossum (jvr) Summary: bundlebuilder: Add dylibs, frameworks to the bundle Initial Comment: This patch adds the ability to specify that shared libraries and Frameworks (the last is untested as of yet) to the bundle. It is mostly by Kevin Olliver with some suggestions by me. In addition to copying the files into the bundle the launcher script in the bundle is modified to set the DYLD_LIBRARY_PATH to the right place. ---------------------------------------------------------------------- >Comment By: Robin Dunn (robind) Date: 2003-03-20 18:47 Message: Logged In: YES user_id=53955 New patch attached ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-18 10:22 Message: Logged In: YES user_id=92689 Having a manual option is a fine start. But can any of you rework the patch so it doesn't mess with whitespace, and update it for current CVS? ---------------------------------------------------------------------- Comment By: Kevin Ollivier (kollivier) Date: 2003-02-07 12:52 Message: Logged In: YES user_id=248468 I'll take a look at otool and see if it does what we need. As Robin mentioned, I think giving both the manual and auto options is the best approach. 
I'll also check into the dependency on Apple's Dev Tools, but even if it is dependent we could just switch off auto-detection if users don't have it and spit out a warning. Another possible way to alleviate this problem may be to integrate with distutils. (i.e. make a 'buildbundle' option) That should at least allow us to find and include any libraries the developer linked against. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-07 00:59 Message: Logged In: YES user_id=92689 I use tabs for indentation and use spaces for alignment... So things look nice _and_ won't screw up with different tab settings. But I admit that using such a non-standard way is asking for trouble. I'll convert to spaces after this patch has been done (unless you prefer I do it _before_ ;-). (Btw. it might be that otool is only available with the apple dev tools, which would be a shame since we otherwise don't depend on dev tools being available. Hm.) ---------------------------------------------------------------------- Comment By: Robin Dunn (robind) Date: 2003-02-06 15:20 Message: Logged In: YES user_id=53955 Oops, sorry for the whitespace patches. I noticed that my lines used spaces but the lines around them were using tabs so I just ran a tabify on the whole file without taking another look at the resulting patch file after that. Looks like some of the other lines that were added since 2.3a1 have spaces too and that is where the problem comes from. I'll redo the patch but the whole file should probably be either tabified or untabified after you are done applying it. I didn't know about otool. I'll pass that on to Kevin. We discussed about doing automatic finding of libs but didn't know how to go about it so thought that this would be a good start. 
Also we figured that even if there was a way to do it that you would probably want a way to include other files that may not get automatically found, or to exclude some that were, so there should be command line options for it anyway. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-06 14:35 Message: Logged In: YES user_id=92689 Cool. There's a problem with the patch, though: although I apologize for using tabs to begin with, please keep the tab usage consistent. There are quite a few hunks in the patch that only touch whitespace and that's both undesirable as well as blurring the intent of the patch... Could you upload a cleaner one? Btw. for the --standalone build mode it would be possible to calculate all framework/dylib dependencies with the otool tool. If this were implemented perhaps the --lib option wouldn't even be needed? Another question remains: if we include a framework, is there a way to strip it from redundant files, eg. headers? If we would use this mechanism to include Python.framework we would definitely need a way to trim it down, eg. all of lib is taken care of by modulefinder anyway. If you (or Kevin) have any ideas about that, pls contact me off line. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 From noreply@sourceforge.net Fri Mar 21 02:49:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 18:49:26 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 04:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 18:49 Message: Logged In: YES user_id=357491 I went through Walter's diff by hand and found two places where more clean-up could be done and two show-stoppers. In case I don't get my version of the patch up fast enough for people, the files that have spots that could use some more minor clean-up are Lib/lib-tk/Tix.py and Lib/lib-tk/Tkinter.py . The showstoppers are in Lib/lib-tk/tkCommonDialog.py (method call that didn't get *'ed) and Lib/test/test_builtin.py (test_builtin.py should not even be patched since the affected lines are in the tests for apply() itself). I will have my version up before the weekend. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 07:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 00:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) 
I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 17:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 10:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Fri Mar 21 05:14:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 21:14:13 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 07:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. 
---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 00:14 Message: Logged In: YES user_id=80475 Good job Brett :-) I'll wait for your next post before going through this one with a fine toothed comb. -- R ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 21:49 Message: Logged In: YES user_id=357491 I went through Walter's diff by hand and found two places where more clean-up could be done and two show-stoppers. In case I don't get my version of the patch up fast enough for people, the files that have spots that could use some more minor clean-up are Lib/lib-tk/Tix.py and Lib/lib-tk/Tkinter.py . The showstoppers are in Lib/lib-tk/tkCommonDialog.py (method call that didn't get *'ed) and Lib/test/test_builtin.py (test_builtin.py should not even be patched since the affected lines are in the tests for apply() itself). I will have my version up before the weekend. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 10:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 03:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. 
Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 13:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Fri Mar 21 07:42:23 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 23:42:23 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 04:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 23:42 Message: Logged In: YES user_id=357491 Well, I have now run into my first issue of not having commit privileges; I can't upload my diff. So you will have to get it from http://www.ocf.berkeley.edu/~bac/apply3.diff . The only difference between my diff and Walter's is that I changed three files and removed the diff for test_builtin.py . ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:14 Message: Logged In: YES user_id=80475 Good job Brett :-) I'll wait for your next post before going through this one with a fine toothed comb. -- R ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 18:49 Message: Logged In: YES user_id=357491 I went through Walter's diff by hand and found two places where more clean-up could be done and two show-stoppers. In case I don't get my version of the patch up fast enough for people, the files that have spots that could use some more minor clean-up are Lib/lib-tk/Tix.py and Lib/lib-tk/Tkinter.py . The showstoppers are in Lib/lib-tk/tkCommonDialog.py (method call that didn't get *'ed) and Lib/test/test_builtin.py (test_builtin.py should not even be patched since the affected lines are in the tests for apply() itself). I will have my version up before the weekend. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 07:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? 
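For illustration only (this is not taken from any of the attached diffs): the simplification Walter mentions can be checked with a small sketch. Since apply() is no longer a built-in in modern Python, `legacy_apply` below is a hypothetical stand-in with the same calling convention, and `f`, `args` and the `sep` keyword are made-up names.

```python
def legacy_apply(func, args=(), kwargs=None):
    """Hypothetical stand-in for the old built-in apply(func, args, kwargs)."""
    return func(*args, **(kwargs or {}))


def f(*positional, **named):
    """Echo the received arguments so the three call styles can be compared."""
    return positional, named


args = (4, 5)

# Old style: concatenate the tuples, then hand the result to apply().
old = legacy_apply(f, (1, 2, 3) + args, {"sep": "-"})

# Mechanical translation: star the whole concatenated expression.
direct = f(*(1, 2, 3) + args, **{"sep": "-"})

# Simplified form: literal arguments first, then star only the variable part.
simplified = f(1, 2, 3, *args, sep="-")

assert old == direct == simplified
```

All three calls pass the same positional and keyword arguments; the last form simply avoids building an intermediate tuple in the source.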
---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 00:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 17:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 10:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Fri Mar 21 07:43:37 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 20 Mar 2003 23:43:37 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 17:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 23:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 18:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 17:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 08:02:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 00:02:15 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-21 02:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2003-03-21 09:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 08:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 03:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. 
Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 08:16:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 00:16:40 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 13:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py ./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are 
syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 09:16 Message: Logged In: YES user_id=89016 This shouldn't have anything to do with commit privileges. I'm uploading your apply3.diff so it doesn't get lost. If test_builtin calls apply it should probably make sure that both the PendingDeprecationWarning and the DeprecationWarning that might be issued some day are switched off. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 08:42 Message: Logged In: YES user_id=357491 Well, I have now run into my first issue of not having commit privileges; I can't upload my diff. So you will have to get it from http://www.ocf.berkeley.edu/~bac/apply3.diff . The only difference between my diff and Walter's is that I changed three files and removed the diff for test_builtin.py . ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 06:14 Message: Logged In: YES user_id=80475 Good job Brett :-) I'll wait for your next post before going through this one with a fine toothed comb. -- R ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 03:49 Message: Logged In: YES user_id=357491 I went through Walter's diff by hand and found two places where more clean-up could be done and two show-stoppers. In case I don't get my version of the patch up fast enough for people, the files that have spots that could use some more minor clean-up are Lib/lib-tk/Tix.py and Lib/lib-tk/Tkinter.py . 
The showstoppers are in Lib/lib-tk/tkCommonDialog.py (method call that didn't get *'ed) and Lib/test/test_builtin.py (test_builtin.py should not even be patched since the affected lines are in the tests for apply() itself). I will have my version up before the weekend. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 16:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 09:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-12 02:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. 
For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 19:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Fri Mar 21 08:21:44 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 00:21:44 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-21 02:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. 
It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 09:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 09:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 08:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 03:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. 
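As a rough illustration of the two passes described in the initial comment (a toy sketch, not the C function from the patch): the code below models instructions as (opname, arg) tuples and, as a simplifying assumption, uses list indexes for jump targets instead of the byte offsets real bytecode uses. The names `peephole`, `while_loop` and `optimized` are invented for this example.

```python
# Toy model, assumed for illustration: jump arguments are absolute list
# indexes, and LOAD_CONST carries the constant value itself.
UNCONDITIONAL = {"JUMP_FORWARD", "JUMP_ABSOLUTE"}
JUMPS = UNCONDITIONAL | {"JUMP_IF_FALSE"}


def peephole(code):
    code = list(code)
    # Pass 1: LOAD_CONST <true>, JUMP_IF_FALSE, POP_TOP can never take the
    # false branch, so replace the load with a jump past all three.
    for i in range(len(code) - 2):
        (op0, arg0), (op1, _), (op2, _) = code[i:i + 3]
        if (op0 == "LOAD_CONST" and arg0
                and op1 == "JUMP_IF_FALSE" and op2 == "POP_TOP"):
            code[i] = ("JUMP_FORWARD", i + 3)
    # Pass 2: a jump that lands on an unconditional jump is retargeted to
    # that second jump's destination (one level only in this sketch).
    for i, (op, arg) in enumerate(code):
        if op in JUMPS:
            target_op, target_arg = code[arg]
            if target_op in UNCONDITIONAL:
                code[i] = (op, target_arg)
    return code


while_loop = [
    ("LOAD_CONST", 1),        # 0: the "while 1" test
    ("JUMP_IF_FALSE", 6),     # 1: leave the loop if the test is false
    ("POP_TOP", None),        # 2: discard the test value
    ("LOAD_FAST", "x"),       # 3: start of the loop body
    ("POP_TOP", None),        # 4
    ("JUMP_ABSOLUTE", 0),     # 5: back to the test
    ("POP_TOP", None),        # 6: loop exit
    ("RETURN_VALUE", None),   # 7
]

optimized = peephole(while_loop)
assert optimized[0] == ("JUMP_FORWARD", 3)
assert optimized[5] == ("JUMP_ABSOLUTE", 3)
```

As in the disassembly quoted above, only two instructions change: the constant load becomes a forward jump, and the backward jump goes straight to the loop body. The instruction count and order stay the same.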
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 08:32:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 00:32:15 -0800 Subject: [Patches] [ python-Patches-701494 ] more apply removals Message-ID: Patches item #701494, was opened at 2003-03-11 04:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christos Georgiou (tzot) Assigned to: Raymond Hettinger (rhettinger) Summary: more apply removals Initial Comment: More apply() removals from the following files: ./compiler/transformer.py ./curses/wrapper.py ./distutils/command/build_ext.py ./distutils/command/build_py.py ./distutils/archive_util.py ./distutils/dir_util.py ./distutils/filelist.py ./distutils/util.py ./bsddb/test/test_basics.py ./bsddb/test/test_dbobj.py ./bsddb/dbobj.py ./bsddb/dbshelve.py ./lib-tk/Canvas.py ./lib-tk/Dialog.py ./lib-tk/ScrolledText.py ./lib-tk/Tix.py ./lib-tk/Tkinter.py ./lib-tk/tkColorChooser.py 
./lib-tk/tkCommonDialog.py ./lib-tk/tkFont.py ./lib-tk/tkMessageBox.py ./lib-tk/tkSimpleDialog.py ./lib-tk/turtle.py ./test/reperf.py ./test/test_b1.py ./test/test_builtin.py ./test/test_curses.py ./logging/__init__.py ./logging/config.py ./xml/dom/minidom.py ./plat-mac/Carbon/MediaDescr.py ./plat-mac/EasyDialogs.py ./plat-mac/FrameWork.py ./plat-mac/MiniAEFrame.py ./plat-mac/argvemulator.py ./plat-mac/icopen.py I know that the edited files are syntactically correct (ie compileall.compile_dir throws no errors), but please help testing that functionality is the same. I am testing at the moment for lib-tk changes. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 00:32 Message: Logged In: YES user_id=357491 Well, then SF is broken right now because I don't have an option to upload. As for the PendingDeprecationWarning check, I think that is a good idea. Shouldn't that be a separate patch, though? I personally can't do it any time soon because of PyCon plus I have updating test_urllib on my todo list (thanks, Raymond =). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 00:16 Message: Logged In: YES user_id=89016 This shouldn't have anything to do with commit privileges. I'm uploading your apply3.diff so it doesn't get lost. If test_builtin calls apply it should probably make sure that both the PendingDeprecationWarning and the DeprecationWarning that might be issued some day are switched off. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 23:42 Message: Logged In: YES user_id=357491 Well, I have now run into my first issue of not having commit privileges; I can't upload my diff. So you will have to get it from http://www.ocf.berkeley.edu/~bac/apply3.diff . 
The only difference between my diff and Walter's is that I changed three files and removed the diff for test_builtin.py . ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:14 Message: Logged In: YES user_id=80475 Good job Brett :-) I'll wait for your next post before going through this one with a fine toothed comb. -- R ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 18:49 Message: Logged In: YES user_id=357491 I went through Walter's diff by hand and found two places where more clean-up could be done and two show-stoppers. In case I don't get my version of the patch up fast enough for people, the files that have spots that could use some more minor clean-up are Lib/lib-tk/Tix.py and Lib/lib-tk/Tkinter.py . The showstoppers are in Lib/lib-tk/tkCommonDialog.py (method call that didn't get *'ed) and Lib/test/test_builtin.py (test_builtin.py should not even be patched since the affected lines are in the tests for apply() itself). I will have my version up before the weekend. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-20 07:26 Message: Logged In: YES user_id=89016 I've gone over the patch and simplified it a bit (e.g. replacing f(*(1,2,3) + args) with f(1,2,3, *args)). I've also removed the patches for distutils, logging and bsddb (code at the start of bsddb/dbutils.py seems to indicate that it should be usable with versions prior to 2.3). Raymond, do you have time to recheck the patch? ---------------------------------------------------------------------- Comment By: Christos Georgiou (tzot) Date: 2003-03-12 00:46 Message: Logged In: YES user_id=539787 Walter: I untargzipped the python-latest.tgz of 2003-03-10 over an older directory (I think about a month ago), therefore the existence of test_b1.py. 
All files that exist in the current dist were also current. Raymond: you are correct about my not reading the file headers (it was a multifile vi session with a +/"apply(" option...) I just had a little time available for non-creative work, so I checked, saw that Guido already had changed most of the library files, and offered the change of the rest of them; you guys can do whatever you want with it :) The lib-tk changes seem to be ok, after running some UI python scripts I have. I haven't checked bsddb yet. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 17:41 Message: Logged In: YES user_id=80475 Also, be sure to read the PEP on which modules should not be modernized. Sometimes that information is written in the file itself rather than the pep. For instance, the logging package is supposed to be kept in a form that runs on older pythons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-11 10:34 Message: Logged In: YES user_id=89016 There is no longer a test/test_b1.py in current CVS, so it seems you've done the diff against an older version. Could you update the patch for current CVS? Also according to PEP 291 (http://www.python.org/peps/pep-0291.html) both distutils and logging should remain 1.5.2 compatible. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701494&group_id=5470 From noreply@sourceforge.net Fri Mar 21 09:41:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 01:41:31 -0800 Subject: [Patches] [ python-Patches-681927 ] bundlebuilder: Add dylibs, frameworks to the bundle Message-ID: Patches item #681927, was opened at 2003-02-06 22:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Robin Dunn (robind) Assigned to: Just van Rossum (jvr) Summary: bundlebuilder: Add dylibs, frameworks to the bundle Initial Comment: This patch adds the ability to specify shared libraries and Frameworks (the latter is untested as yet) to be added to the bundle. It is mostly by Kevin Ollivier with some suggestions by me. In addition to copying the files into the bundle, the launcher script in the bundle is modified to set DYLD_LIBRARY_PATH to the right place. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2003-03-21 10:41 Message: Logged In: YES user_id=92689 Thanks Robin, this is perfect. It's in CVS. ---------------------------------------------------------------------- Comment By: Robin Dunn (robind) Date: 2003-03-21 03:47 Message: Logged In: YES user_id=53955 New patch attached ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-03-18 19:22 Message: Logged In: YES user_id=92689 Having a manual option is a fine start. But can any of you rework the patch so it doesn't mess with whitespace, and update it for current CVS?
---------------------------------------------------------------------- Comment By: Kevin Ollivier (kollivier) Date: 2003-02-07 21:52 Message: Logged In: YES user_id=248468 I'll take a look at otool and see if it does what we need. As Robin mentioned, I think giving both the manual and auto options is the best approach. I'll also check into the dependency on Apple's Dev Tools, but even if it is dependent we could just switch off auto-detection if users don't have it and spit out a warning. Another possible way to alleviate this problem may be to integrate with distutils. (i.e. make a 'buildbundle' option) That should at least allow us to find and include any libraries the developer linked against. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-07 09:59 Message: Logged In: YES user_id=92689 I use tabs for indentation and use spaces for alignment... So things look nice _and_ won't screw up with different tab settings. But I admit that using such a non-standard way is asking for trouble. I'll convert to spaces after this patch has been done (unless you prefer I do it _before_ ;-). (Btw. it might be that otool is only available with the apple dev tools, which would be a shame since we otherwise don't depend on dev tools being available. Hm.) ---------------------------------------------------------------------- Comment By: Robin Dunn (robind) Date: 2003-02-07 00:20 Message: Logged In: YES user_id=53955 Oops, sorry for the whitespace patches. I noticed that my lines used spaces but the lines around them were using tabs so I just ran a tabify on the whole file without taking another look at the resulting patch file after that. Looks like some of the other lines that were added since 2.3a1 have spaces too and that is where the problem comes from. I'll redo the patch but the whole file should probably be either tabified or untabified after you are done applying it. I didn't know about otool.
I'll pass that on to Kevin. We discussed automatically finding libs but didn't know how to go about it, so we thought that this would be a good start. Also, we figured that even if there were a way to do it, you would probably want a way to include other files that may not get automatically found, or to exclude some that were, so there should be command line options for it anyway. ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2003-02-06 23:35 Message: Logged In: YES user_id=92689 Cool. There's a problem with the patch, though: although I apologize for using tabs to begin with, please keep the tab usage consistent. There are quite a few hunks in the patch that only touch whitespace, and that's both undesirable and blurs the intent of the patch... Could you upload a cleaner one? Btw. for the --standalone build mode it would be possible to calculate all framework/dylib dependencies with the otool tool. If this were implemented perhaps the --lib option wouldn't even be needed? Another question remains: if we include a framework, is there a way to strip it of redundant files, e.g. headers? If we would use this mechanism to include Python.framework we would definitely need a way to trim it down, e.g. all of lib is taken care of by modulefinder anyway. If you (or Kevin) have any ideas about that, pls contact me off line.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=681927&group_id=5470 From noreply@sourceforge.net Fri Mar 21 10:57:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 02:57:54 -0800 Subject: [Patches] [ python-Patches-707427 ] Allow range() to return long integer values Message-ID: Patches item #707427, was opened at 2003-03-21 02:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707427&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Chad Netzer (chadn) Assigned to: Nobody/Anonymous (nobody) Summary: Allow range() to return long integer values Initial Comment: Extend range() builtin so that long integers may be generated. ie. range(10**20, 10**20 + 5) New code path is only executed when normal code path fails, to avoid slowing down the existing run path. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707427&group_id=5470 From noreply@sourceforge.net Fri Mar 21 16:53:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 08:53:11 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-13 01:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment: A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. 
Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :)

1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition.

2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find it. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

    ## v should be a class object.
    ## Why did I name it 'v'? :(
    def getbaseclasses(v):
        ## Have we already set up the _propdict and _elemdict
        ## for this class object? If so, don't do it again.
        if not v._propdict:
            ## This step is required so we get a fresh dictionary on
            ## this class object, and don't mutate the one on
            ## ComponentItem or one of our superclasses
            v._propdict = {}
            v._elemdict = {}
            ## Run through all of the strings in _superclassnames
            ## evaluating them to get a class object.
            for superclassname in getattr(v, '_superclassnames', []):
                superclass = eval(superclassname)
                ## Immediately recurse into getbaseclasses, so that
                ## the base class _propdict and _elemdict are set up
                ## properly before we copy its entries into ours.
                getbaseclasses(superclass)
                ## Copy all of the entries from this base class into
                ## our _propdict and _elemdict so that we get a flat
                ## dictionary of all of the elements and properties
                ## that should be available to instances of this class.
                v._propdict.update(getattr(superclass, '_propdict', {}))
                v._elemdict.update(getattr(superclass, '_elemdict', {}))
            ## Finally, copy those properties and elements that
            ## are defined directly on this class object in
            ## _privpropdict and _privelemdict into the
            ## _propdict and _elemdict that
            ## ComponentItem.__getattr__ looks in.
            ## Note that if we entered getbaseclasses through the
            ## recursion above, our subclass will then copy our
            ## _propdict and _elemdict into its own after we exit
            ## the recursion, giving it a copy of all the properties
            ## and elements defined on the superclass object.
            v._propdict.update(v._privpropdict)
            v._elemdict.update(v._privelemdict)

---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-21 17:53 Message: Logged In: YES user_id=45365 Donovan, I checked your fixes in, but possibly a bit prematurely: things broke. For example, running findertools.py as main program (a simple test of the scripting infrastructure) will now fail for me, in getbaseclasses(writing_code). And that seems correct: writing_code is an NProperty, not a ComponentItem. Before you fix things: please check out a fresh tree. I seriously hacked gensuitemodule after applying your mods (it can now run non-interactively on MacOSX).
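
A minimal sketch of the failure mode described in the comment above, and of one possible guard. Everything here is an assumption made for illustration: the classes are hypothetical stand-ins rather than the real aetools or gensuitemodule output, the dictionary values are placeholders, and the explicit `namespace` lookup replaces the `eval()` of the posted code only to keep the example self-contained. It is not the fix that was eventually checked in.

```python
class ComponentItem:
    """Stand-in base class: the dictionaries live on the class itself."""
    _propdict = {}
    _elemdict = {}

class NProperty:
    """Stand-in for a property object: it has no _propdict at all."""

class Item(ComponentItem):
    _superclassnames = []
    _privpropdict = {'name': 'placeholder-name-code'}
    _privelemdict = {}

class File(ComponentItem):
    _superclassnames = ['Item']
    _privpropdict = {'size': 'placeholder-size-code'}
    _privelemdict = {}

def getbaseclasses(v, namespace):
    """Flatten inherited properties and elements onto class v."""
    # Guard: quietly skip anything that is not a ComponentItem subclass,
    # such as an NProperty.
    if not (isinstance(v, type) and issubclass(v, ComponentItem)):
        return
    # Look in the class's own __dict__, so an inherited (or legitimately
    # empty) dictionary is not mistaken for "not set up yet".
    if '_propdict' in v.__dict__:
        return
    v._propdict = {}
    v._elemdict = {}
    for name in getattr(v, '_superclassnames', []):
        superclass = namespace[name]
        getbaseclasses(superclass, namespace)
        v._propdict.update(getattr(superclass, '_propdict', {}))
        v._elemdict.update(getattr(superclass, '_elemdict', {}))
    v._propdict.update(getattr(v, '_privpropdict', {}))
    v._elemdict.update(getattr(v, '_privelemdict', {}))

namespace = {'Item': Item, 'File': File}
getbaseclasses(File, namespace)        # flattens Item first, then File
getbaseclasses(NProperty, namespace)   # skipped instead of raising
```

Checking the class's own `__dict__` also sidesteps the earlier misgiving in this thread about a class with no properties being processed again on every call.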
---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-18 19:00 Message: Logged In: YES user_id=111050 Jack, Thanks for taking a look at this. You are correct, if a class has no properties then v._propdict will still be empty, and we will do unnecessary work the next time getbaseclasses is called. I suppose it could be "if not v._propdict and not v._elemdict:" which would reduce the unnecessary work down to when a base class has neither properties nor elements; frankly the if is not really required at all; it was just an attempt to prevent work that has already been performed from being performed again unnecessarily. Suggestions welcome. Re _superclassnames, like everything else done with gensuitemodule, we need to be really careful about circular references, references to things that haven't been defined yet, etc. Everything generated by gensuitemodule is either a ComponentItem or an NProperty, and they don't actually inherit from each other in Python because doing so would be too hairy. So we can't use __bases__ because there is none :-) The thing about _superclassnames is that it's just what it sounds like: a list of strings that indicate superclasses of the current class. By deferring getbaseclasses to import time, we ensure all of the base classes are defined by then. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-16 23:42 Message: Logged In: YES user_id=45365 Donovan, insofar as I understand the matter (in which area you are clearly my superior:-) I think the idea of the fix is correct, but I have one misgiving: if a class has no properties then v._propdict will still be empty after getbaseclasses(). This will result in the next call of getbaseclasses (if this class is the base class of another) going through the motions again. Is this a problem?
Also, do we really need _superclassnames, can't we do this with __bases__? I vaguely remember we went through this issue before, but I can't remember fully... ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Whoops. Have to click the checkbox. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Attaching diff. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 From noreply@sourceforge.net Fri Mar 21 19:18:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 11:18:41 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. 
It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1".

python timeit.py "while 1: break"
python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1"

----- Example -----

Disassembly of

    def f(x):
        while 1:
            x -= 1
            if x == 0:
                break

shows two lines changing from:

      3 LOAD_CONST               1 (1)
     38 JUMP_ABSOLUTE            3

and improving to:

      3 JUMP_FORWARD             4 (to 10)
     38 JUMP_ABSOLUTE           10

All of the other lines are left unchanged.

---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch:
* Adds PyMem_Free (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line
Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off.
It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael.
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 19:36:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 11:36:39 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 19:37:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 11:37:19 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None >Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. 
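
A minimal sketch of the per-field try/except idea from the initial comment above. It assumes that an event field Tk cannot supply arrives as a non-numeric placeholder string ("??" is used here as an assumed example); the helper names to_int_or_raw and convert_event_fields are hypothetical and are not taken from the attached patch or from Tkinter.

```python
def to_int_or_raw(field):
    """Convert a substituted event field to int, or keep the raw string."""
    try:
        return int(field)
    except ValueError:
        # Field not supported for this event type: pass it through as-is.
        return field

def convert_event_fields(raw):
    """Apply the tolerant conversion to a dict of field name -> string."""
    return {name: to_int_or_raw(value) for name, value in raw.items()}

fields = convert_event_fields({'x': '120', 'y': '45', 'keycode': '??'})
```

The trade-off is that callers may receive a string where they previously always received an int, so code consuming these fields has to tolerate both.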
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 20:09:03 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 12:09:03 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-21 02:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break"
python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1"

----- Example -----

Disassembly of

    def f(x):
        while 1:
            x -= 1
            if x == 0:
                break

shows two lines changing from:

      3 LOAD_CONST               1 (1)
     38 JUMP_ABSOLUTE            3

and improving to:

      3 JUMP_FORWARD             4 (to 10)
     38 JUMP_ABSOLUTE           10

All of the other lines are left unchanged.

---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2003-03-21 21:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 20:18 Message: Logged In: YES user_id=80475 Attached a revised patch:
* Adds PyMem_Free (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line
Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 09:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 09:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 08:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument.
=) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 03:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael.
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 20:28:43 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 12:28:43 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break"
python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1"

----- Example -----

Disassembly of

    def f(x):
        while 1:
            x -= 1
            if x == 0:
                break

shows two lines changing from:

      3 LOAD_CONST               1 (1)
     38 JUMP_ABSOLUTE            3

and improving to:

      3 JUMP_FORWARD             4 (to 10)
     38 JUMP_ABSOLUTE           10

All of the other lines are left unchanged.

---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch:
* Adds PyMem_Free (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line
Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too.
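
The two passes described in the initial comment can be illustrated on a toy instruction list. This is only a sketch under stated assumptions: instructions are (opname, argument) tuples, jump arguments are list indices rather than byte offsets, LOAD_CONST carries the constant value itself, and the opcode names 'JUMP', 'LOOP_BODY' and 'RETURN' are made up for the example. It is not the C function from the patch.

```python
JUMP_OPS = {'JUMP', 'JUMP_IF_FALSE'}

def optimize(code):
    """Return an optimized copy of a toy instruction list."""
    code = list(code)

    # Pass 1: a true constant followed by JUMP_IF_FALSE and POP_TOP can
    # never take the jump, so replace the load with a jump past both.
    for i in range(len(code) - 2):
        op, arg = code[i]
        if (op == 'LOAD_CONST' and arg
                and code[i + 1][0] == 'JUMP_IF_FALSE'
                and code[i + 2][0] == 'POP_TOP'):
            code[i] = ('JUMP', i + 3)

    # Pass 2: a jump whose target is an unconditional jump is retargeted
    # to that jump's destination. The 'seen' set guards against cycles.
    for i, (op, target) in enumerate(code):
        if op not in JUMP_OPS:
            continue
        seen = {i}
        while code[target][0] == 'JUMP' and target not in seen:
            seen.add(target)
            target = code[target][1]
        code[i] = (op, target)
    return code

before = [
    ('LOAD_CONST', 1),       # 0: the "while 1" test
    ('JUMP_IF_FALSE', 5),    # 1
    ('POP_TOP', None),       # 2
    ('LOOP_BODY', None),     # 3: placeholder for the loop body
    ('JUMP', 0),             # 4: back to the loop test
    ('RETURN', None),        # 5
]
after = optimize(before)
```

As in the thread's example, neither pass changes the number or order of instructions; only the first instruction and the backward jump are rewritten.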
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael.
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 20:59:45 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 12:59:45 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 11:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 12:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as:

    def getint( s ):
        try:
            return int( s )
        except ValueError:
            return s

Rather than add lots of try/excepts in the codebase?
I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 21:14:03 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 13:14:03 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Matthias Klose (doko) Date: 2003-03-21 21:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 20:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as:

    def getint( s ):
        try:
            return int( s )
        except ValueError:
            return s

Rather than add lots of try/excepts in the codebase?
I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 21:28:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 13:28:21 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-19 23:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Stuart Bishop (zenzen) Assigned to: Neal Norwitz (nnorwitz) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-21 16:28 Message: Logged In: YES user_id=33168 After patching, the test fails: File "/home/neal/build/python/2_3/Lib/test/test_time.py", line 115, in test_tzset self.failUnlessEqual(time.daylight,1) File "/home/neal/build/python/2.3/Lib/unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 0 != 1 Also, why is the code commented out (via a string) on lines 120-144? Should these be removed? I see the comment about wallclock time, but don't understand why the code should be left in if we can't test it. I can understand a comment describing generally the issue. 
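Editorial sketch, not part of the patch: the kind of POSIX-format TZ value the patch restricts itself to can be exercised as below. This assumes a Unix platform where time.tzset is available. The helper name zone_info and the particular TZ string are illustrative choices, not taken from test_time.py.

```python
import os
import time


def zone_info(tz):
    """Switch to the given TZ value and report the time module's zone attributes.

    zone_info is a hypothetical helper for illustration only.
    """
    old = os.environ.get('TZ')
    os.environ['TZ'] = tz
    time.tzset()  # re-read TZ and refresh tzname/timezone/altzone/daylight
    try:
        return time.tzname, time.timezone, time.altzone, time.daylight
    finally:
        # Restore the previous setting so callers are not affected.
        if old is None:
            del os.environ['TZ']
        else:
            os.environ['TZ'] = old
        time.tzset()


# POSIX form: std name, offset west of UTC, dst name, then the DST rules.
print(zone_info('EST+05EDT,M4.1.0,M10.5.0'))
```

A value in this form carries its own offsets and DST rules, so it does not depend on a zoneinfo database entry such as 'US/Eastern' being installed.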
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-20 20:18 Message: Logged In: YES user_id=33168 I'll try to get to this soon. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-20 20:11 Message: Logged In: YES user_id=6380 Unassigning, as I won't have time for this. But it is important - someone else should make sure this goes into 2.3b1! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-20 16:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it. Two notes: 1. Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Fri Mar 21 21:31:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 13:31:21 -0800 Subject: [Patches] [ python-Patches-675422 ] Add tzset method to time module Message-ID: Patches item #675422, was opened at 2003-01-27 08:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Stuart Bishop (zenzen) Assigned to: Guido van Rossum (gvanrossum) Summary: Add tzset method to time module Initial Comment: Adds access to the tzset method, allowing you to change your local timezone as required.
In addition to invoking the tzset system call, the code also updates the timezone attributes (time.timezone etc). This lets you do timezone conversions amongst other things. Also includes changes to configure.in to only build new code if the tzset method correctly switches timezones on your platform. This should be for all modern Unixes, and possibly other platforms. Also includes tests in test_time.py. Docs would be along the lines of: tzset() -- Initialize, or reinitialize, the local timezone to the value stored in os.environ['TZ']. The TZ environment variable should be specified in standard Unix timezone format as documented in the tzset man page (eg. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently fall back to UTC. If the TZ environment variable is not set, the local timezone is set to the system's best guess of wallclock time. Changing the TZ environment variable without calling tzset *may* change the local timezone used by methods such as localtime, but this behaviour should not be relied on. eg::

    >>> now = time.time()
    >>> os.environ['TZ'] = 'Europe/Amsterdam'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 14:35:17 2003'
    >>> time.tzname
    ('CET', 'CEST')
    >>> os.environ['TZ'] = 'US/Eastern'
    >>> time.tzset()
    >>> time.ctime(now)
    'Mon Jan 27 08:35:17 2003'
    >>> time.tzname
    ('EST', 'EDT')

---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-21 16:31 Message: Logged In: YES user_id=33168 Closing again, the problem can be addressed thru the other patch.
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-20 00:06 Message: Logged In: YES user_id=46639 An update to this patch is now available: http://www.python.org/sf/706707 ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-19 23:18 Message: Logged In: YES user_id=33168 test_time is now failing on Solaris 8. altzone is -3600, but should be 0. Also, is there a reason to compare timezone to altzone, but then check that each is 0 (line 78)? Can you provide any suggestions for where to look for the problem? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-14 17:03 Message: Logged In: YES user_id=6380 OK, checked in with that line removed. Thanks! ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-07 23:42 Message: Logged In: YES user_id=46639 Leave it commented out or remove that line. It is testing unimportant behaviour that looks more platform dependent than I suspected (and now I look at it again, what tzname should be set to if the timezone is unknown is unspecified by the tzset(3) docs). The important behaviour is that: a) the system silently falls back to UTC if the timezone is unknown, and this is tested elsewhere b) calling tzset resets tzname, which is also tested elsewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-07 09:25 Message: Logged In: YES user_id=6380 zenzen: when I run the test suite on my Red Hat Linux 7.3 box, I get one failure: the test line self.failUnless(time.tzname[0] in ('UTC','GMT')) fails when the timezone is set to 'Luna/Tycho', because tzname is in fact set to ('Luna/Tych', 'Luna/Tych'). If I comment out that one line the tzset test suite passes. What should I do?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 16:49 Message: Logged In: YES user_id=6380 Sorry, not a chance. ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-21 16:45 Message: Logged In: YES user_id=46639 It is a patch to 2.3, but I thought I'd try and sneak this new feature past people into 2.2.3 as I want to be able to use it in Zope 2 :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-21 07:56 Message: Logged In: YES user_id=6380 Uh? This is a new feature, so doesn't apply to 2.2.3. Maybe you meant 2.3? ---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-02-20 23:29 Message: Logged In: YES user_id=46639 Assigning to Guido for consideration of being added to 2.2.3, and since he thought this patch was a good idea in the first place :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675422&group_id=5470 From noreply@sourceforge.net Fri Mar 21 22:10:05 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 14:10:05 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 11:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute.
Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 14:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 13:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 12:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? 
I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 22:15:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 14:15:39 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Matthias Klose (doko) Date: 2003-03-21 22:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 22:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. 
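Editorial sketch of the two positions in this exchange, under stated assumptions: getint below is the forgiving converter Chad proposes, and convert_fields is a hypothetical caller (not from Tkinter) that binds the function to a local name once, which is the lookup-speed point Matthias raises.

```python
def getint(s):
    """Return int(s), or s unchanged when it is not a valid integer string."""
    try:
        return int(s)
    except ValueError:
        # e.g. the "??" placeholder mentioned above for unsupported fields
        return s


def convert_fields(fields):
    """Hypothetical caller: convert every field, tolerating placeholders."""
    _getint = getint  # local binding: one global lookup instead of one per field
    return [_getint(f) for f in fields]


print(convert_fields(['10', '??', '-3']))
```

Only ValueError is caught here, as in the proposal; a non-string input such as None would still raise TypeError.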
---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 21:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 20:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Fri Mar 21 22:36:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 14:36:15 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-12 15:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment: A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. 
If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :)

1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition.

2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find it. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time.

Here's a detailed description of what getbaseclasses does:

    ## v should be a class object.
    ## Why did I name it 'v'? :(
    def getbaseclasses(v):
        ## Have we already set up the _propdict and _elemdict
        ## for this class object? If so, don't do it again.
        if not v._propdict:
            ## This step is required so we get a fresh dictionary on
            ## this class object, and don't mutate the one on
            ## ComponentItem or one of our superclasses
            v._propdict = {}
            v._elemdict = {}
            ## Run through all of the strings in _superclassnames
            ## evaluating them to get a class object.
            for superclassname in getattr(v, '_superclassnames', []):
                superclass = eval(superclassname)
                ## Immediately recurse into getbaseclasses, so that
                ## the base class _propdict and _elemdict is set up
                ## properly before we copy its entries into ours.
                getbaseclasses(superclass)
                ## Copy all of the entries from this base class into
                ## our _propdict and _elemdict so that we get a flat
                ## dictionary of all of the elements and properties
                ## that should be available to instances of this class.
                v._propdict.update(getattr(superclass, '_propdict', {}))
                v._elemdict.update(getattr(superclass, '_elemdict', {}))
            ## Finally, copy those properties and elements that
            ## are defined directly on this class object in
            ## _privpropdict and _privelemdict into the
            ## _propdict and _elemdict that
            ## ComponentItem.__getattr__ looks in.
            ## Note that if we entered getbaseclasses through the
            ## recursion above, our subclass will then copy our
            ## _propdict and _elemdict into its own after we exit
            ## the recursion, giving it a copy of all the properties
            ## and elements defined on the superclass object.
            v._propdict.update(v._privpropdict)
            v._elemdict.update(v._privelemdict)

---------------------------------------------------------------------- >Comment By: Donovan Preston (dsposx) Date: 2003-03-21 13:36 Message: Logged In: YES user_id=111050 I am surprised I didn't have the same problem -- I should have. I suppose that's why I had the if hasattr in the first version of getbaseclasses. Changing if not v._propdict: to if not getattr(v, '_propdict', None): Would probably work. I am going to PyCon next week, so I will have more free time to work on non-directly-work related items. I'll do a fresh checkout and build of framework python, and experiment with the latest gensuitemodule etc. Donovan ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-21 07:53 Message: Logged In: YES user_id=45365 Donovan, I checked your fixes in, but possibly a bit premature: things broke. For example, running findertools.py as main program (a simple test of the scripting infrastructure) will now fail for me, in getbaseclasses(writing_code).
And that seems correct: writing_code is an NProperty, not a ComponentItem. Before you fix things: please check out a fresh tree. I seriously hacked gensuitemodule after applying your mods (it can now run non-interactive on MacOSX). ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-18 09:00 Message: Logged In: YES user_id=111050 Jack, Thanks for taking a look at this. You are correct, if a class has no properties then v._propdict will still be empty, and we will do unnecessary work the next time getbaseclasses is called. I suppose it could be "if not v._propdict and not v._elemdict:" which would reduce the unnecessary work down to when a base class has neither properties nor elements; frankly the if is not really required at all; it was just an attempt to prevent work that has already been performed from being performed again unnecessarily. Suggestions welcome. Re _superclassnames, like everything else done with gensuitemodule, we need to be really careful about circular references, references to things that haven't been defined yet, etc. Everything generated by gensuitemodule is either a ComponentItem or an NProperty, and they don't actually inherit from each other in Python because doing so would be too hairy. So we can't use __bases__ because there is none :-) The thing about _superclassnames is that it's just what it sounds like; a list of strings that indicate superclasses of the current class. By deferring getbaseclasses to import time, we ensure all of the base classes are defined by then.
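Editorial sketch of the flattening described above, with several assumptions: the classes Item and File and their table entries are made up, names are resolved through an explicit namespace dict rather than eval(), and the getattr guard from the later comment is used so an object without a _propdict does not raise. It is not the code that was checked in.

```python
def getbaseclasses(cls, namespace):
    """Build flat _propdict/_elemdict tables on cls from its named superclasses."""
    # getattr guard: objects that never had a _propdict are handled too.
    if getattr(cls, '_propdict', None):
        return  # already flattened on an earlier call
    cls._propdict = {}
    cls._elemdict = {}
    for name in getattr(cls, '_superclassnames', []):
        # Assumption: a dict lookup stands in for eval(superclassname).
        superclass = namespace[name]
        # Flatten the base first so its tables are complete before copying.
        getbaseclasses(superclass, namespace)
        cls._propdict.update(getattr(superclass, '_propdict', {}))
        cls._elemdict.update(getattr(superclass, '_elemdict', {}))
    # Entries defined directly on this class are applied last.
    cls._propdict.update(getattr(cls, '_privpropdict', {}))
    cls._elemdict.update(getattr(cls, '_privelemdict', {}))


class Item:  # hypothetical generated class
    _privpropdict = {'name': 'P1'}
    _privelemdict = {}


class File:  # hypothetical; names Item as a superclass by string only
    _superclassnames = ['Item']
    _privpropdict = {'size': 'P2'}
    _privelemdict = {'line': 'E1'}


namespace = {'Item': Item, 'File': File}
getbaseclasses(File, namespace)
print(sorted(File._propdict), sorted(File._elemdict))
```

File does not inherit from Item in Python, matching the description above; the relationship exists only through the string in _superclassnames, and a later attribute lookup needs just one dictionary access.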
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-16 13:42 Message: Logged In: YES user_id=45365 Donovan, in as far as I understand the matter (in which area you are clearly my superior:-) I think the idea of the fix is correct, but I have one misgiving: if a class has no properties then v._propdict will still be empty after getbaseclasses(). This will result in the next call of getbaseclasses (if this class is the base class of another) going through the motions again. Is this a problem? Also, do we really need _superclassnames, can't we do this with __bases__? I vaguely remember we went through this issue before, but I can't remember fully... ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-12 15:08 Message: Logged In: YES user_id=111050 Whoops. Have to click the checkbox. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-12 15:08 Message: Logged In: YES user_id=111050 Attaching diff. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 From noreply@sourceforge.net Fri Mar 21 22:43:16 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 14:43:16 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 17:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. 
Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 14:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. 
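Editorial sketch of the two passes described in the initial comment, on a toy instruction list rather than real CPython bytecode. The (opcode, argument) tuple format, the index-based jump targets, the generic JUMP opcode and the BODY placeholder are all assumptions made for illustration; the actual patch works on the byte string in C.

```python
def optimize(code):
    """Apply both passes to a list of (opcode, arg) tuples; returns a new list."""
    code = list(code)
    # Pass 1: a true constant followed by JUMP_IF_FALSE and POP_TOP can never
    # take the false branch, so jump straight past the test. The skipped
    # instructions stay in place, so the code keeps its size and order.
    for i in range(len(code) - 2):
        op, arg = code[i]
        if (op == 'LOAD_CONST' and arg
                and code[i + 1][0] == 'JUMP_IF_FALSE'
                and code[i + 2][0] == 'POP_TOP'):
            code[i] = ('JUMP', i + 3)
    # Pass 2: a jump whose target is an unconditional jump is retargeted
    # to that second jump's destination.
    for i, (op, arg) in enumerate(code):
        if op in ('JUMP', 'JUMP_IF_FALSE'):
            target_op, target_arg = code[arg]
            if target_op == 'JUMP':
                code[i] = (op, target_arg)
    return code


# Toy shape of a "while 1:" loop: test, body, jump back to the test.
loop = [
    ('LOAD_CONST', 1),      # 0: loop test
    ('JUMP_IF_FALSE', 5),   # 1: would exit the loop
    ('POP_TOP', None),      # 2
    ('BODY', None),         # 3: stands in for the loop body
    ('JUMP', 0),            # 4: back to the test
    ('POP_TOP', None),      # 5: loop exit
]
for index, instruction in enumerate(optimize(loop)):
    print(index, instruction)
```

In this toy model both the loop entry and the backward jump end up pointing directly at the body, which mirrors the two changed lines in the disassembly example above.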
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 12:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 12:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 11:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 00:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 00:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 23:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument.
=) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 18:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 17:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 23:04:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 15:04:32 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now.
So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Fri Mar 21 23:50:51 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 15:50:51 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 17:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. 
Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 15:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 14:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? 
=) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 12:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 12:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 11:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 00:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too.
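The second pass described in this item's initial comment (a jump that lands on an unconditional jump is retargeted to that jump's destination) can be illustrated on a toy instruction list. This is only a sketch under stated assumptions, not the code from the patch: the function name thread_jumps, the list-of-tuples representation, and the use of absolute list indexes as jump targets are all hypothetical simplifications of real bytecode.

```python
# Toy model: each instruction is (opname, target); target is an absolute
# index into the list, or None for instructions that do not jump.
# Assumption: real bytecode uses byte offsets and relative JUMP_FORWARD
# arguments, which this sketch deliberately ignores.
UNCONDITIONAL = {"JUMP_ABSOLUTE", "JUMP_FORWARD"}
JUMPS = UNCONDITIONAL | {"JUMP_IF_FALSE", "JUMP_IF_TRUE"}


def thread_jumps(code):
    """Return a copy of code where jumps to unconditional jumps are retargeted."""
    result = list(code)
    for i, (op, target) in enumerate(result):
        if op in JUMPS and target is not None:
            next_op, next_target = result[target]
            if next_op in UNCONDITIONAL:
                # Same instruction, same position: only the target changes,
                # so the size and order of the code are preserved.
                result[i] = (op, next_target)
    return result


code = [
    ("LOAD_FAST", None),      # 0
    ("JUMP_IF_FALSE", 3),     # 1: lands on the unconditional jump at 3
    ("LOAD_CONST", None),     # 2
    ("JUMP_ABSOLUTE", 5),     # 3
    ("POP_TOP", None),        # 4
    ("RETURN_VALUE", None),   # 5
]
optimized = thread_jumps(code)
print(optimized[1])
```

Only the target of instruction 1 changes; every other entry keeps its position, which matches the thread's point that the approach does not change bytecode size or order.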
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 00:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 23:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 18:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 17:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael.
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Sat Mar 22 04:08:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 20:08:59 -0800 Subject: [Patches] [ python-Patches-707900 ] bug fix 702858: deepcopying reflexive objects Message-ID: Patches item #707900, was opened at 2003-03-21 21:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707900&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Steven Taschuk (staschuk) Assigned to: Nobody/Anonymous (nobody) Summary: bug fix 702858: deepcopying reflexive objects Initial Comment: A fix for bug 702858, which concerns the inability of copy.deepcopy to correctly process reflexive new-style class instances, that is, instances referring to themselves. The fix is one line; the other 51 lines in the patch are altered and enhanced tests in test_copy.py for this kind of thing.
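To make "reflexive" concrete, here is a minimal sketch of the behaviour the fix is meant to guarantee. The class name Node and the attribute me are hypothetical and are not taken from the patch or from test_copy.py; the snippet only shows what a correct copy.deepcopy should do with an instance that refers to itself.

```python
import copy


class Node(object):
    """Hypothetical new-style class whose instance refers to itself."""

    def __init__(self):
        self.me = self  # reflexive reference


original = Node()
clone = copy.deepcopy(original)

# A correct deep copy reproduces the cycle inside the copy: the clone
# must refer to itself, not back to the original object.
print(clone is not original)
print(clone.me is clone)
```

Both checks hold on an interpreter where deepcopy handles such instances correctly.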
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707900&group_id=5470 From noreply@sourceforge.net Sat Mar 22 07:26:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 21 Mar 2003 23:26:01 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Matthias Klose (doko) Date: 2003-03-22 07:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 22:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 22:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. 
---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 21:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 20:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Sat Mar 22 13:34:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 22 Mar 2003 05:34:34 -0800 Subject: [Patches] [ python-Patches-708007 ] TelnetPopen3, TelnetBase, Expect split Message-ID: Patches item #708007, was opened at 2003-03-22 13:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708007&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Luke Kenneth Casson Leighton (lkcl) Assigned to: Nobody/Anonymous (nobody) Summary: TelnetPopen3, TelnetBase, Expect split Initial Comment: A reordering / code-split of Telnet in telnetlib.py into Expect (the lowest base class), TelnetBase, Telnet and TelnetPopen4. 
Reason: Expect contains all of the read_xxx(), expect(), write() and select() functions (and the interact() and mt_interact()) TelnetPopen4 and Telnet derive from the same TelnetBase class, and there is nothing stopping anyone from writing a TelnetHTTP or TelnetURL class which will all have the same interface: expect() and write() and even interact()! weird, huh - typing in URLs and getting the content back, interactively :) these TelnetXXX classes are all incredibly useful for "remote host management" purposes; also the principle of the TelnetHTTP class is very useful for doing automated testing of web sites. send URL, expect text in it before proceeding with next URL (e.g. login, check to see if login failed or succeeded; react accordingly). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708007&group_id=5470 From noreply@sourceforge.net Sun Mar 23 00:15:12 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 22 Mar 2003 16:15:12 -0800 Subject: [Patches] [ python-Patches-708201 ] unchecked return value in import.c Message-ID: Patches item #708201, was opened at 2003-03-22 17:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708201&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) Assigned to: Nobody/Anonymous (nobody) Summary: unchecked return value in import.c Initial Comment: In Python/import.c, routine PyImport_ImportModule, a call to PyString_AsString is not checked for errors. A possibly NULL return value gets passed to another routine, and DECREFed. It's not a particularly likely place for an error to occur, but I did manage to get a MemoryError at exactly that point, resulting in a Python crash. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708201&group_id=5470 From noreply@sourceforge.net Sun Mar 23 11:59:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 03:59:55 -0800 Subject: [Patches] [ python-Patches-612627 ] Allow more Unicode on sys.stdout Message-ID: Patches item #612627, was opened at 2002-09-21 22:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Allow more Unicode on sys.stdout Initial Comment: This patch extends the set of Unicode strings that can be printed to sys.stdout, to support all strings that the terminal will likely support. It also adds an encoding attribute to sys.std{in,out}. To do that: - it adds a .encoding attribute to all file objects, which is normally None - initializes the encoding of sys.stdin and sys.stdout if either is a terminal. - adds a wrapper object around sys.stdout in site.py that encodes all Unicode objects according to the detected encoding, if that encoding is known to Python To find the encoding of the terminal, it - uses GetConsoleCP and GetConsoleOutputCP on Windows, - uses nl_langinfo(CODESET) on Unix, if available. The primary rationale for this change is that people should be able to print Unicode in an interactive session. A parallel change needs to be added for IDLE, so that it adds the .encoding attribute to the emulated stdout (it already supports printing of Unicode on stdout). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 12:59 Message: Logged In: YES user_id=21627 Is the patch now acceptable? 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-26 19:47 Message: Logged In: YES user_id=21627 I've attached a revised version which implements your proposal; this version works without modification of site.py. In its current form, the file encoding is only applied in print; for sys.stdout.write, it is ignored. For print, it is applied independent of whether this is a script or interactive mode. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-25 14:09 Message: Logged In: YES user_id=38388 I think it could work by adding a special case to PyFile_WriteObject() instead of calling PyObject_Print(). You first encode the Unicode object and then let PyFile_WriteString() take care of the writing to the FILE* object. I see no other way, since you can't place the .encoding information into the FILE* object. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-24 11:02 Message: Logged In: YES user_id=21627 I have considered implementing it in the file object. However, it becomes quite involved, and heavy C code: PyFile_WriteObject calls PyObject_Print. Since Unicode does not implement a tp_print, this calls str/repr, which converts using the default encoding. It is not clear at which point the file encoding should be taking into account. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2002-09-24 10:10 Message: Logged In: NO I like the .encoding concept. I don't really like the sys.stdout wrapper. Wouldn't it be better to add the functionality to the file object .write() and .writelines() methods and then only use the wrapper in case sys.stdout is not a true file object ? 
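The wrapper idea debated in this thread (encode Unicode text with the detected terminal encoding before it reaches the underlying stream) can be sketched as follows. This is an illustration only, written for present-day Python 3, where str is Unicode text and the underlying stream takes bytes; the class name EncodingWriter and the ASCII fallback for an unknown encoding are assumptions, not the code from the patch or from site.py.

```python
import io


class EncodingWriter:
    """Hypothetical wrapper that encodes text before writing to a byte stream."""

    def __init__(self, raw, encoding=None):
        self.raw = raw
        # Mirrors the proposal: .encoding is None when nothing was detected.
        self.encoding = encoding

    def write(self, text):
        if isinstance(text, str):
            # Assumption: fall back to ASCII when no encoding is known.
            data = text.encode(self.encoding or "ascii")
        else:
            data = text  # already bytes, pass through unchanged
        self.raw.write(data)


buf = io.BytesIO()
out = EncodingWriter(buf, encoding="utf-8")
out.write("L\u00f6wis\n")
print(buf.getvalue())
```

The later comments in the thread move this logic out of a wrapper and into the print path of the file object itself; the sketch only shows the encoding step that both designs share.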
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 From noreply@sourceforge.net Sun Mar 23 12:07:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 04:07:11 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 20:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) >Assigned to: Martin v. Löwis (loewis) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 13:07 Message: Logged In: YES user_id=21627 What is the problem that this patch solves? ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-22 08:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 23:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 23:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" 
fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 22:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 21:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Sun Mar 23 13:35:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 05:35:58 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 19:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Martin v. Löwis (loewis) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. 
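The alternative raised in the comments, a single tolerant conversion helper instead of a try/except around every integer field, can be sketched like this. The name getint_tolerant and the sample field values are illustrative assumptions; this is not the code from either attached patch, and it does not touch Tkinter itself.

```python
def getint_tolerant(s):
    """Return int(s) when s parses as an integer, otherwise return s unchanged."""
    try:
        return int(s)
    except ValueError:
        # Fields that an event does not define may arrive as "" or "??".
        return s


# Sample substitution fields, including the undefined-field markers
# mentioned in this thread.
fields = ["42", "??", "", "-7"]
converted = [getint_tolerant(f) for f in fields]
print(converted)
```

As the thread notes, the trade-off is one extra function call per field versus inlined try/except blocks that keep every lookup local.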
---------------------------------------------------------------------- >Comment By: Matthias Klose (doko) Date: 2003-03-23 13:35 Message: Logged In: YES user_id=60903 > What is the problem that this patch solves? As the subject says: Provide a patch for #698517. tk8.4.2 returns for the undefined fields in events empty strings or '??' strings, on which the int conversions fail. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 12:07 Message: Logged In: YES user_id=21627 What is the problem that this patch solves? ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-22 07:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 22:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 22:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 21:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. 
Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 20:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470

From noreply@sourceforge.net Sun Mar 23 14:33:46 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 06:33:46 -0800 Subject: [Patches] [ python-Patches-708374 ] add offset to mmap Message-ID: Patches item #708374, was opened at 2003-03-23 09:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add offset to mmap Initial Comment: This patch is from Yotam Medini sent to me in mail. It adds support for the offset parameter to mmap. It ignores the check for mmap size "if the file is character device. Some device drivers (which I happen to use) have zero size in fstat buffer, but still one can seek() read() and tell()." I added minimal doc and tests. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 From noreply@sourceforge.net Sun Mar 23 14:46:08 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 06:46:08 -0800 Subject: [Patches] [ python-Patches-708201 ] unchecked return value in import.c Message-ID: Patches item #708201, was opened at 2003-03-22 19:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708201&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jason Harper (jasonharper) >Assigned to: Neal Norwitz (nnorwitz) Summary: unchecked return value in import.c Initial Comment: In Python/import.c, routine PyImport_ImportModule, a call to PyString_AsString is not checked for errors. A possibly NULL return value gets passed to another routine, and DECREFed. 
It's not a particularly likely place for an error to occur, but I did manage to get a MemoryError at exactly that point, resulting in a Python crash. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 09:46 Message: Logged In: YES user_id=33168 Thanks! Checked in as: Python/import.c 2.220 and 2.192.6.4 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708201&group_id=5470 From noreply@sourceforge.net Sun Mar 23 15:23:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 07:23:39 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. 
timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into its own function--e.g., opt_while_1, opt_swap, opt_jump_jump. * In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left over. Suggest setting clen after the string check. * I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail? * the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-) * Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE. * same problem with EXTENDED_ARG for SETARG though.
You probably need a check before the SETARG to make sure tgttgt < 64k. Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). 
I am up for writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows passing the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end?
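
The two passes described in this item's initial comment can be sketched on a toy instruction list. This is only an illustration under stated assumptions, not the C code from the patch: the names optimize and jump_target are hypothetical, every instruction occupies one slot (so jump arguments count slots rather than bytes), LOAD_CONST carries the constant value itself rather than an index, and only JUMP_ABSOLUTE is retargeted in the second pass.

```python
# Toy model: one instruction per slot, so jump arguments count slots, not bytes.
# JUMP_ABSOLUTE n  -> continue at slot n
# JUMP_FORWARD n   -> skip the next n slots
UNCONDITIONAL_JUMPS = ("JUMP_ABSOLUTE", "JUMP_FORWARD")


def jump_target(code, i):
    """Return the slot that the unconditional jump at slot i lands on."""
    op, arg = code[i]
    if op == "JUMP_ABSOLUTE":
        return arg
    if op == "JUMP_FORWARD":
        return i + 1 + arg
    raise ValueError("slot %d is not an unconditional jump" % i)


def optimize(code):
    """Return an optimized copy of code; length and order never change."""
    code = list(code)

    # Pass 1: LOAD_CONST <true value>, JUMP_IF_FALSE, POP_TOP can never take
    # the false branch, so overwrite the load with a jump past the other two.
    for i in range(len(code) - 2):
        op, arg = code[i]
        if (op == "LOAD_CONST" and arg
                and code[i + 1][0] == "JUMP_IF_FALSE"
                and code[i + 2][0] == "POP_TOP"):
            code[i] = ("JUMP_FORWARD", 2)

    # Pass 2: a JUMP_ABSOLUTE that lands on another unconditional jump is
    # retargeted straight to that jump's destination (one hop only).
    for i in range(len(code)):
        op, arg = code[i]
        if op == "JUMP_ABSOLUTE" and code[arg][0] in UNCONDITIONAL_JUMPS:
            code[i] = ("JUMP_ABSOLUTE", jump_target(code, arg))
    return code


# Rough shape of "while 1: <body>" before optimization.
loop = [
    ("SETUP_LOOP", None),     # slot 0
    ("LOAD_CONST", 1),        # slot 1: the constant loop test
    ("JUMP_IF_FALSE", 3),     # slot 2: never taken
    ("POP_TOP", None),        # slot 3
    ("LOAD_FAST", "x"),       # slot 4: stand-in for the loop body
    ("JUMP_ABSOLUTE", 1),     # slot 5: back to the loop test
    ("POP_TOP", None),        # slot 6
    ("POP_BLOCK", None),      # slot 7
]
optimized = optimize(loop)
```

As in the disassembly quoted in this item, only two slots change: the constant load becomes a forward jump, and the back edge of the loop goes directly to the body. The dead JUMP_IF_FALSE and POP_TOP stay in place, which is what keeps the size and order of the code unchanged.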
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Sun Mar 23 15:35:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 07:35:19 -0800 Subject: [Patches] [ python-Patches-695710 ] fix bug 678519: cStringIO self iterator Message-ID: Patches item #695710, was opened at 2003-03-01 14:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) >Assigned to: Nobody/Anonymous (nobody) Summary: fix bug 678519: cStringIO self iterator Initial Comment: StringIO.StringIO already appears to be a self-iterator. This patch makes cStringIO.StringIO a self-iterator as well. It also does a tiny bit of cleanup to cStringIO. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-19 16:27 Message: Logged In: YES user_id=33168 I don't think it's impolite. I'll try to take a look later, unless someone beats me to it. :-) ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 16:17 Message: Logged In: YES user_id=670441 Okay, patchstrio4 uses PyObject_SelfIter and doesn't have as much of my prettification, so there aren't any whitespace-only diff lines. (I think) Should I assign this patch to either Neal or MvL for further review, or would that be impolite? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-19 14:54 Message: Logged In: YES user_id=80475 It looks good to me, compiles okay, passes tests, etc. I do prefer that you get one more reviewer to look at it. Neal or MvL might be a good choice. 
GvR picked PyObject_SelfIter to be the name of the iterator's tp_iter slot filler. So you can go ahead and use it to eliminate IO_getiter. One nit, when you load the next patch, copy in the unchanged lines from the original. There are many lines marked as having a change but the content is the same. This means that something changed in the whitespace. It's not big deal but it makes the patch harder to review. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-19 13:39 Message: Logged In: YES user_id=670441 I'm not sure I understand your concern with the new tp_iter slot, it just makes cStringIO a self iterator as requested on python-dev, going for the analogy with file objects, right? Actually it should probably use the still-being-debated GenericGetIter or whatever it will be called, but not until the debate is over. I think the get/setattrs are okay. Everything they did is done by the default get/set attrs, once we set up the appropriate methods and members (there's just the one member, softspace). I thought replacing them by the defaults would be clearer and easier to maintain. Also, it is in analogy with fileobject.c, so I thought making the cStringIO implementation more like file's would be good. As for the creating a new tuple every time and the 0,0,0,0 style, you're absolutely right, I've attached a new patch that fixes those up per your suggestions. I was creating a new tuple every time in analogy with iterobject.c's calliter_iternext. Perhaps that should be changed as well? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-17 19:55 Message: Logged In: YES user_id=80475 I'm going to unassign this one because the patch makes me uncomfortable. The tp_iter slot was already filled in a way that is reasonable and the new code doesn't seem to be an improvement. 
If you go ahead with it, carefully consider whether some negative effects can arise from eliminating the get/setattrs. Also, the call to readline should avoid creating a new empty tuple on each call (either make a single one and re-use it every time or alter readline to accept a NULL for args). The 0,0,0,0,0,0,0 style in the type definition should be spelled out line by line so that it is maintainable and is consistent with other modules. All that being said, the test cases were nice and the code runs flawlessly. ---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-11 21:35 Message: Logged In: YES user_id=670441 I prefer that too, but I can't attach patches to existing bug reports in sourceforge, only to bug reports or patches I open myself. Nor can I delete patches I have attached if I don't like them. Actually, the advice I read somewhere or other (python.org developer faq?) recommends opening a separate patch all the time, but I'd rather be able to put them with the bug reports. I used to paste patches directly into the text of a message, but this is only good for extremely short patches on sourceforge. When doing that I noticed that patches for old bugs that haven't been discussed in a few months tend to get ignored, which is another plus for opening a separate patch. (There seem to be several very old bugs which have solutions attached or whose discussion indicates they should be closed) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-11 20:44 Message: Logged In: YES user_id=80475 I don't know about the other reviewers but I prefer that the patches be attached to the original bug instead of on a new patch tracker on SF. This makes it easier to follow the dialogue on this issue.
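
The "self-iterator" behaviour discussed in this item can be illustrated in pure Python. This is a minimal sketch assuming a hypothetical LineReader class, written for current Python (where the iteration method is spelled __next__); it is not the cStringIO C code. The iterator is the object itself, and each step simply calls readline until it returns an empty string.

```python
class LineReader:
    """Hypothetical file-like object that is its own iterator."""

    def __init__(self, text):
        # keepends=True keeps the trailing newline, as readline() does.
        self._lines = text.splitlines(True)
        self._pos = 0

    def readline(self):
        """Return the next line, or an empty string at end of input."""
        if self._pos >= len(self._lines):
            return ""
        line = self._lines[self._pos]
        self._pos += 1
        return line

    def __iter__(self):
        # Self-iterator: no separate iterator object is created.
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line


reader = LineReader("first\nsecond\n")
lines = list(reader)
```

Because iteration and readline share one position, a loop over the object consumes it, which is the same behaviour file objects have.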
---------------------------------------------------------------------- Comment By: Michael Stone (mbrierst) Date: 2003-03-05 17:16 Message: Logged In: YES user_id=670441 patchcstrio2 is a better version, more cleaned up. Use it instead. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695710&group_id=5470 From noreply@sourceforge.net Sun Mar 23 15:37:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 07:37:17 -0800 Subject: [Patches] [ python-Patches-708374 ] add offset to mmap Message-ID: Patches item #708374, was opened at 2003-03-23 09:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add offset to mmap Initial Comment: This patch is from Yotam Medini sent to me in mail. It adds support for the offset parameter to mmap. It ignores the check for mmap size "if the file is character device. Some device drivers (which I happen to use) have zero size in fstat buffer, but still one can seek() read() and tell()." I added minimal doc and tests. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:37 Message: Logged In: YES user_id=33168 Email received from Yotam: I have downloaded and patched the 2.3a source. compiled locally just this module, and it worked fine for my application (with offset for character device file) I did not run the released test though. 
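
For reference, mapping part of a file at an offset can be sketched with the mmap module of current Python, whose mmap.mmap accepts access and offset keyword arguments. This is an illustration only and may differ from the interface in the patch above. The helper name read_mapped is hypothetical. The offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, so the sketch rounds down and skips the extra bytes.

```python
import mmap
import os
import tempfile


def read_mapped(path, offset, length):
    """Return length bytes of path starting at offset, read through mmap."""
    granularity = mmap.ALLOCATIONGRANULARITY
    base = (offset // granularity) * granularity   # aligned start of the mapping
    skip = offset - base                           # bytes to ignore inside it
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), skip + length,
                           access=mmap.ACCESS_READ, offset=base)
        try:
            return mapped[skip:skip + length]
        finally:
            mapped.close()


# Demo on a throwaway regular file (not a character device).
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"." * mmap.ALLOCATIONGRANULARITY + b"hello, mmap")
    tail = read_mapped(demo_path, mmap.ALLOCATIONGRANULARITY + 7, 4)
finally:
    os.remove(demo_path)
```

The demo uses an ordinary file, so it does not exercise the zero-size character device case mentioned in the initial comment.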
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470
From noreply@sourceforge.net Sun Mar 23 20:01:46 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 12:01:46 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Nobody/Anonymous (nobody) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS ODS-5 filesystem. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert a VMS filename to a UNIX-style filename. socketmodule.c: This patch is the only one which is not delimited by #ifdef __VMS / #endif, because IMHO it fixes a bug in the original code. socketmodule.h: Needs to include socket.h and not sys/socket.h. sysmodule.c: Convert a VMS filename to a UNIX-style filename.
Regards, Jean-François ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Sun Mar 23 20:04:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 12:04:15 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) >Assigned to: Martin v. Löwis (loewis) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS ODS-5 filesystem. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert a VMS filename to a UNIX-style filename. socketmodule.c: This patch is the only one which is not delimited by #ifdef __VMS / #endif, because IMHO it fixes a bug in the original code. socketmodule.h: Needs to include socket.h and not sys/socket.h. sysmodule.c: Convert a VMS filename to a UNIX-style filename.
Regards, Jean-François ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Sun Mar 23 20:09:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 12:09:13 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 9 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Martin v. Löwis (loewis) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS ODS-5 filesystem. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert a VMS filename to a UNIX-style filename. socketmodule.c: This patch is the only one which is not delimited by #ifdef __VMS / #endif, because IMHO it fixes a bug in the original code. socketmodule.h: Needs to include socket.h and not sys/socket.h. sysmodule.c: Convert a VMS filename to a UNIX-style filename.
Regards, Jean-François ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Sun Mar 23 20:10:13 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 12:10:13 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Martin v. Löwis (loewis) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS ODS-5 filesystem. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert a VMS filename to a UNIX-style filename. socketmodule.c: This patch is the only one which is not delimited by #ifdef __VMS / #endif, because IMHO it fixes a bug in the original code. socketmodule.h: Needs to include socket.h and not sys/socket.h. sysmodule.c: Convert a VMS filename to a UNIX-style filename.
Regards, Jean-François ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Sun Mar 23 20:28:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 12:28:49 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Martin v. Löwis (loewis) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS ODS-5 filesystem. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert a VMS filename to a UNIX-style filename. socketmodule.c: This patch is the only one which is not delimited by #ifdef __VMS / #endif, because IMHO it fixes a bug in the original code. socketmodule.h: Needs to include socket.h and not sys/socket.h. sysmodule.c: Convert a VMS filename to a UNIX-style filename. Regards, Jean-François ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2003-03-23 21:28 Message: Logged In: YES user_id=21627 Can you please combine the patches into a single patch, which can be applied using patch -p0 ??? You can use "diff -ur" or "cvs diff" to create a recursive patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Sun Mar 23 22:20:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 14:20:21 -0800 Subject: [Patches] [ python-Patches-702620 ] AE Inheritance fixes Message-ID: Patches item #702620, was opened at 2003-03-13 01:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: AE Inheritance fixes Initial Comment: A while ago, I submitted a patch that attempted to make modules generated by gensuitemodule inheritance aware. It was quite a hack, but it did the job. Some patches to cvs in the meantime have made this stop working for me. Here are my attempted fixes. If for some reason there's some use case besides mine where this implementation doesn't work, I'd like to know about it so we can come up with an implementation that works everywhere :) 1) We don't ever want an _instance_ of ComponentItem to have a personal _propdict and _elemdict. They need to inherit these attributes from the class, which was set up in the __init__.py to have the correct entries. Thus, I moved the initialization of _propdict and _elemdict out of __init__ and into the class definition. 2) getbaseclasses needs to look through the inheritance tree specified by _superclassnames and for each class in the tree, copy _privpropdict and _privelemdict to _propdict and _elemdict. 
Then, it needs to copy _propdict and _elemdict from each superclass into its own _propdict and _elemdict, where ComponentItem.__getattr__ will find it. Making these into flat dictionaries on each class that include all of the properties and elements from the superclasses greatly speeds up execution time, since only a single, non-recursive lookup is required, and the only recursion occurs at import time. Here's a detailed description of what getbaseclasses does:

    ## v should be a class object.
    ## Why did I name it 'v'? :(
    def getbaseclasses(v):
        ## Have we already set up the _propdict and _elemdict
        ## for this class object? If so, don't do it again.
        if not v._propdict:
            ## This step is required so we get a fresh dictionary on
            ## this class object, and don't mutate the one on
            ## ComponentItem or one of our superclasses
            v._propdict = {}
            v._elemdict = {}
            ## Run through all of the strings in _superclassnames
            ## evaluating them to get a class object.
            for superclassname in getattr(v, '_superclassnames', []):
                superclass = eval(superclassname)
                ## Immediately recurse into getbaseclasses, so that
                ## the base class _propdict and _elemdict is set up
                ## properly before we copy its entries into ours.
                getbaseclasses(superclass)
                ## Copy all of the entries from this base class into
                ## our _propdict and _elemdict so that we get a flat
                ## dictionary of all of the elements and properties
                ## that should be available to instances of this class.
                v._propdict.update(getattr(superclass, '_propdict', {}))
                v._elemdict.update(getattr(superclass, '_elemdict', {}))
            ## Finally, copy those properties and elements that
            ## are defined directly on this class object in
            ## _privpropdict and _privelemdict into the
            ## _propdict and _elemdict that
            ## ComponentItem.__getattr__ looks in.
            ## Note that if we entered getbaseclasses through the
            ## recursion above, our subclass will then copy our
            ## _propdict and _elemdict into its own after we exit
            ## the recursion, giving it a copy of all the properties
            ## and elements defined on the superclass object.
            v._propdict.update(v._privpropdict)
            v._elemdict.update(v._privelemdict)

---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2003-03-23 23:20 Message: Logged In: YES user_id=45365 That fixed it, with a similar fix for _privpropdict. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-21 23:36 Message: Logged In: YES user_id=111050 I am surprised I didn't have the same problem -- I should have. I suppose that's why I had the if hasattr in the first version of getbaseclasses. Changing if not v._propdict: to if not getattr(v, '_propdict', None): would probably work. I am going to PyCon next week, so I will have more free time to work on non-directly-work related items. I'll do a fresh checkout and build of framework python, and experiment with the latest gensuitemodule etc. Donovan ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-21 17:53 Message: Logged In: YES user_id=45365 Donovan, I checked your fixes in, but possibly a bit premature: things broke. For example, running findertools.py as main program (a simple test of the scripting infrastructure) will now fail for me, in getbaseclasses(writing_code). And that seems correct: writing_code is an NProperty, not a ComponentItem. Before you fix things: please check out a fresh tree. I seriously hacked gensuitemodule after applying your mods (it can now run non-interactively on MacOSX).
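
The guarded lookup suggested above, together with the "similar fix for _privpropdict", can be sketched as a small runnable example. This is an illustration under stated assumptions, not the code that was checked in: the classes item, document and writing_code and their dictionary contents are hypothetical stand-ins for generated suite classes, and superclass names are resolved through an explicit namespace dictionary instead of eval so the example is self-contained.

```python
class ComponentItem(object):
    # Class-level defaults; instances never get personal copies.
    _propdict = {}
    _elemdict = {}


class NProperty(object):
    # Deliberately has no _propdict or _privpropdict at all.
    pass


def getbaseclasses(v, namespace):
    """Flatten inherited property and element tables onto class v."""
    # getattr with a default tolerates classes that lack _propdict.
    if not getattr(v, '_propdict', None):
        v._propdict = {}
        v._elemdict = {}
        for superclassname in getattr(v, '_superclassnames', []):
            superclass = namespace[superclassname]  # assumption: dict lookup, not eval
            getbaseclasses(superclass, namespace)
            v._propdict.update(getattr(superclass, '_propdict', {}))
            v._elemdict.update(getattr(superclass, '_elemdict', {}))
        v._propdict.update(getattr(v, '_privpropdict', {}))
        v._elemdict.update(getattr(v, '_privelemdict', {}))


# Hypothetical generated classes; they are linked only by name.
class item(ComponentItem):
    _privpropdict = {'name': 'item.name'}
    _privelemdict = {}


class document(ComponentItem):
    _superclassnames = ['item']
    _privpropdict = {'size': 'document.size'}
    _privelemdict = {'page': 'document.page'}


class writing_code(NProperty):
    pass


classes = {'item': item, 'document': document}
getbaseclasses(document, classes)
getbaseclasses(writing_code, classes)   # no AttributeError with the guard
```

After the call, document carries a flat table that includes the entries of item, while the shared dictionaries on ComponentItem stay empty.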
---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-18 19:00 Message: Logged In: YES user_id=111050 Jack, Thanks for taking a look at this. You are correct, if a class has no properties then v._propdict will still be empty, and we will do unnecessary work the next time getbaseclasses is called. I suppose it could be "if not v._propdict and not v._elemdict:" which would reduce the unnecessary work down to when a base class has neither properties nor elements; frankly the if is not really required at all; it was just an attempt to prevent work that has already been performed from being performed again unnecessarily. Suggestions welcome. Re _superclassnames, like everything else done with gensuitemodule, we need to be really careful about circular references, references to things that haven't been defined yet, etc. Everything generated by gensuitemodule is either a ComponentItem or an NProperty, and they don't actually inherit from each other in Python because doing so would be too hairy. So we can't use __bases__ because there is none :-) The thing about _superclassnames is that it's just what it sounds like; a list of strings that indicate superclasses of the current class. By deferring getbaseclasses to import time, we ensure all of the base classes are defined by then. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2003-03-16 23:42 Message: Logged In: YES user_id=45365 Donovan, insofar as I understand the matter (in which area you are clearly my superior:-) I think the idea of the fix is correct, but I have one misgiving: if a class has no properties then v._propdict will still be empty after getbaseclasses(). This will result in the next call of getbaseclasses (if this class is the base class of another) going through the motions again. Is this a problem?
Also, do we really need _superclassnames, can't we do this with __bases__? I vaguely remember we went through this issue before, but I can't remember fully... ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Whoops. Have to click the checkbox. ---------------------------------------------------------------------- Comment By: Donovan Preston (dsposx) Date: 2003-03-13 01:08 Message: Logged In: YES user_id=111050 Attaching diff. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=702620&group_id=5470 From noreply@sourceforge.net Sun Mar 23 22:57:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 14:57:54 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. 
It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 17:57 Message: Logged In: YES user_id=80475 Thanks for the thorough code review and for being positive on the inclusion of the patch. Attached is a revision that delays PyString_Size and bypasses situations with extended arguments. For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work. The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i]. I had tried a single big loop instead of three little loops but there was a loss of clarity. Since the recognizers quickly skip over mismatches, the total loop time is very small. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). 
Perhaps making each optimization into its own function--e.g., opt_while_1, opt_swap, opt_jump_jump.
* In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left over. Suggest setting clen after the string check.
* I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail?
* the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-)
* Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE.
* same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k.
Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned about the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4!
Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work".
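For readers following the thread, the two passes described in the initial comment can be sketched in Python. This is only an illustrative toy and not the patch itself: the real code is C that edits the raw code string in place, whereas this sketch assumes instructions are (opcode, argument) pairs and that jump arguments count instructions rather than bytes. The names optimize, jump_target, loop and consts are hypothetical.

```python
# Toy model of the two passes. Every name here is hypothetical; the real
# patch is C code working on byte offsets inside the code string.

UNCONDITIONAL = ("JUMP_FORWARD", "JUMP_ABSOLUTE")
RELATIVE = ("JUMP_FORWARD", "JUMP_IF_FALSE", "JUMP_IF_TRUE")
JUMPS = RELATIVE + ("JUMP_ABSOLUTE",)


def jump_target(code, i):
    """Return the absolute index that the jump at position i lands on."""
    op, arg = code[i]
    # Relative jumps are measured from the instruction after the jump.
    return i + 1 + arg if op in RELATIVE else arg


def optimize(code, consts):
    """Return a same-length copy of code with both rewrites applied."""
    code = list(code)

    # Pass 1: LOAD_CONST <true constant>, JUMP_IF_FALSE, POP_TOP.
    # The test can never fail, so hop over the two dead instructions.
    for i in range(len(code) - 2):
        op, arg = code[i]
        if (op == "LOAD_CONST" and consts[arg]
                and code[i + 1][0] == "JUMP_IF_FALSE"
                and code[i + 2][0] == "POP_TOP"):
            code[i] = ("JUMP_FORWARD", 2)

    # Pass 2: a jump that lands on an unconditional jump is retargeted
    # to that second jump's destination.
    for i, (op, arg) in enumerate(code):
        if op not in JUMPS:
            continue
        tgt = jump_target(code, i)
        if tgt < len(code) and code[tgt][0] in UNCONDITIONAL:
            final = jump_target(code, tgt)
            if op == "JUMP_ABSOLUTE":
                code[i] = (op, final)
            elif final > i:  # relative jumps can only go forward
                code[i] = (op, final - i - 1)
    return code


# A stand-in for "while 1: <body>" with a two-instruction body.
consts = (None, 1)
loop = [
    ("LOAD_CONST", 1),     # 0: the "1" in "while 1"
    ("JUMP_IF_FALSE", 4),  # 1: loop exit, never taken (lands on 6)
    ("POP_TOP", None),     # 2
    ("LOAD_FAST", 0),      # 3: loop body starts here
    ("POP_TOP", None),     # 4
    ("JUMP_ABSOLUTE", 0),  # 5: back to the loop test
    ("POP_TOP", None),     # 6
]
optimized = optimize(loop, consts)
for index, instruction in enumerate(optimized):
    print(index, instruction)
```

As in the disassembly quoted in the initial comment, only two instructions change in this toy: the LOAD_CONST becomes a JUMP_FORWARD over the dead test, and the backward JUMP_ABSOLUTE is retargeted straight at the loop body. The list keeps its length and order, which is the property the patch relies on to avoid a basic block analysis.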
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows passing the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch:
* Adds PyMem_Free (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line
Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb.
Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Sun Mar 23 23:11:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 15:11:11 -0800 Subject: [Patches] [ python-Patches-708495 ] OpenVMS complementary patches Message-ID: Patches item #708495, was opened at 2003-03-23 21:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) Assigned to: Martin v. 
Löwis (loewis) Summary: OpenVMS complementary patches Initial Comment: Explanations of the various patches: fcntlmodule.c: Under VMS the third argument is declared as void *. expat.h: The VMS C compiler can optionally mangle names longer than 31 characters, so it is not necessary to change long names. fileobject.c: As the comment indicates, this solves a problem in test_fileinput, but I don't understand why... fpectlmodule.c: Enable the SIGFPE handler. import.c: Support for the VMS filesystem ODS-5. mmapmodule.c: VMS needs an fsync before a call to fstat for it to return accurate information. myreadline.c: Use of vms__StdioReadline. posixmodule.c: I have moved some of the initialisation to a VMS-specific file, so I have removed it from posixmodule.c. pyexpat.c: Convert VMS filename to a UNIX style filename. socketmodule.c: This patch is the only one that is not delimited by #ifdef __VMS #endif because IMHO it fixes a bug in the original code. socketmodule.h: needs to include socket.h and not sys/socket.h. sysmodule.c: Convert VMS filename to a UNIX style filename. Regards, Jean-François ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-24 00:11 Message: Logged In: YES user_id=21627 Can you please explain the expat.h change? This is an imported source, so I don't want to modify it unless there is a really good reason. The fileobject.c modification needs better analysis. "corrects a test case problem" is not enough reason to make such a change. Does the test case make assumptions that are not supported by the relevant standards? Is there a bug in VMS? etc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 21:28 Message: Logged In: YES user_id=21627 Can you please combine the patches into a single patch, which can be applied using patch -p0 ??? You can use "diff -ur" or "cvs diff" to create a recursive patch.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708495&group_id=5470 From noreply@sourceforge.net Mon Mar 24 00:24:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 16:24:34 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 19:24 Message: Logged In: YES user_id=33168 I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). I think it's ok and I have no other issues with the patch. After thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though. You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i].
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Mon Mar 24 02:17:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 18:17:47 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode.
Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 21:17 Message: Logged In: YES user_id=80475 Great. Will add the dup/pop comment to the final version. I had tried and dropped a couple of other substitutions. 
Timeit.py showed gains but my real apps were unaffected: build 3 unpack 3 --> rot3 rot2 jmp+1 dup build 4 unpack 4 --> rot4 rot2 rot3 jmp+0 Another desirable substitution was omitted because it needed a NOP if it were going to be implemented with the current, simple approach: unary_not jump_if_false tgt --> jump_if_true tgt ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 19:24 Message: Logged In: YES user_id=33168 I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). I think it's ok and I have no other issues with the patch. Aftering thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though. You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i]. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 17:57 Message: Logged In: YES user_id=80475 Thanks for the thorough code review and for being positive on the inclusion of the patch. Attached is a revision that delays PyString_Size and bypasses situations with extended arguments. For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work. The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i]. I had tried a single big loop instead of three little loops but there was a loss of clarity. 
Since the recognizers quickly skip over mismatches, the total loop time is very small. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into it's own function--e.g., opt_while_1, opt_swap, opt_jump_jump. * In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left-over. Suggest setting clen after the string check. * I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail? * the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-) * Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE. * same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k. Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned the speed penalty for compiling. 
I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In constast, 2 is quickly fetched with LOAD_CONST. 
Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. 
It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. 
=) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Mon Mar 24 03:01:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 19:01:40 -0800 Subject: [Patches] [ python-Patches-708604 ] unchecked return values - compile.c Message-ID: Patches item #708604, was opened at 2003-03-23 20:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) Assigned to: Nobody/Anonymous (nobody) Summary: unchecked return values - compile.c Initial Comment: Various cleanups in Python/compile.c - mainly unchecked return values. Also an unchecked memory allocation in PyList_SetSlice that's called by compile.c. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 From noreply@sourceforge.net Mon Mar 24 03:05:08 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 19:05:08 -0800 Subject: [Patches] [ python-Patches-708604 ] unchecked return values - compile.c Message-ID: Patches item #708604, was opened at 2003-03-23 20:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) Assigned to: Nobody/Anonymous (nobody) Summary: unchecked return values - compile.c Initial Comment: Various cleanups in Python/compile.c - mainly unchecked return values. Also an unchecked memory allocation in PyList_SetSlice that's called by compile.c. 
---------------------------------------------------------------------- >Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:05 Message: Logged In: YES user_id=392021 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 From noreply@sourceforge.net Mon Mar 24 03:19:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 19:19:32 -0800 Subject: [Patches] [ python-Patches-708604 ] unchecked return values - compile.c Message-ID: Patches item #708604, was opened at 2003-03-23 20:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) Assigned to: Nobody/Anonymous (nobody) Summary: unchecked return values - compile.c Initial Comment: Various cleanups in Python/compile.c - mainly unchecked return values. Also an unchecked memory allocation in PyList_SetSlice that's called by compile.c. ---------------------------------------------------------------------- >Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:19 Message: Logged In: YES user_id=392021 aaarrrrggghhh.... SF isn't letting me attach the files, clicking Submit simply clears the entered filename??? Will try later from another system. 
---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:05 Message: Logged In: YES user_id=392021 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 From noreply@sourceforge.net Mon Mar 24 03:18:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 23 Mar 2003 19:18:48 -0800 Subject: [Patches] [ python-Patches-708604 ] unchecked return values - compile.c Message-ID: Patches item #708604, was opened at 2003-03-23 20:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) Assigned to: Nobody/Anonymous (nobody) Summary: unchecked return values - compile.c Initial Comment: Various cleanups in Python/compile.c - mainly unchecked return values. Also an unchecked memory allocation in PyList_SetSlice that's called by compile.c. ---------------------------------------------------------------------- >Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:19 Message: Logged In: YES user_id=392021 aaarrrrggghhh.... SF isn't letting me attach the files, clicking Submit simply clears the entered filename??? Will try later from another system. ---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:18 Message: Logged In: YES user_id=392021 aaarrrrggghhh.... SF isn't letting me attach the files, clicking Submit simply clears the entered filename??? Will try later from another system. 
---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-23 20:05 Message: Logged In: YES user_id=392021 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 From noreply@sourceforge.net Mon Mar 24 08:36:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 24 Mar 2003 00:36:28 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-24 03:36 Message: Logged In: YES user_id=80475 Neal, attached is a revision that puts it all under a single loop. By adding a switch-case, it became more readable and a little faster. Your comment on the extended args worried me, so I now bail-out if any extending is present. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 21:17 Message: Logged In: YES user_id=80475 Great. Will add the dup/pop comment to the final version. I had tried and dropped a couple of other substitutions. Timeit.py showed gains but my real apps were unaffected: build 3 unpack 3 --> rot3 rot2 jmp+1 dup build 4 unpack 4 --> rot4 rot2 rot3 jmp+0 Another desirable substitution was omitted because it needed a NOP if it were going to be implemented with the current, simple approach: unary_not jump_if_false tgt --> jump_if_true tgt ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 19:24 Message: Logged In: YES user_id=33168 I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). I think it's ok and I have no other issues with the patch. Aftering thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though. 
You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i]. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 17:57 Message: Logged In: YES user_id=80475 Thanks for the thorough code review and for being positive on the inclusion of the patch. Attached is a revision that delays PyString_Size and bypasses situations with extended arguments. For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work. The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i]. I had tried a single big loop instead of three little loops but there was a loss of clarity. Since the recognizers quickly skip over mismatches, the total loop time is very small. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into it's own function--e.g., opt_while_1, opt_swap, opt_jump_jump. * In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left-over. Suggest setting clen after the string check. * I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail? * the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. 
I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-) * Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE. * same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k. Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. 
There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. 
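The LOAD_GLOBAL versus LOAD_CONST point in the 15:28 comment above can be checked on whatever interpreter is at hand. What follows is only a minimal sketch using the standard dis module; the helper name opnames is made up for illustration and is not part of the patch. The opcodes printed depend heavily on the Python version: in the 2.3 era True was an ordinary name that could be rebound, whereas Python 3 treats it as a keyword, so no particular output should be assumed.

```python
import dis


def opnames(source):
    """Compile a source snippet and return the names of its bytecode ops.

    'opnames' is a hypothetical helper for this sketch only.
    """
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


# Compare a constant test, the name True, and an ordinary variable.
# The exact opcodes differ between interpreter versions.
for loop in ("while 1: break", "while True: break", "while flag: break"):
    print(loop, "->", opnames(loop))
```

Only the 'while flag' form is certain to need a name lookup on every version; how the other two are compiled is exactly what the thread is debating for 2.3.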
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? 
Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Mon Mar 24 20:14:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 24 Mar 2003 12:14:59 -0800 Subject: [Patches] [ python-Patches-706590 ] Adds Mock Object support to unittest.TestCase Message-ID: Patches item #706590, was opened at 2003-03-19 22:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Matthew Russell (mattruss) >Assigned to: Steve Purcell (purcell) Summary: Adds Mock Object support to unittest.TestCase Initial Comment: Mock objects can greatly improve unittests (if used in the correct context), especially for code that relies upon resource-hungry tests (connections to databases, socket servers etc). The module/patch (to unittest) which I am submitting helps to introspect calls to code whilst maintaining transparency and functionality with your code. I had previously written a similar module for my present employers, and my fellow XP partners and I agree that it has made the XP testing cycle considerably easier. Having googled for alternatives on the web, I have not found a solution that provides the same level of flexibility. 
(hope that doesn't sound arrogant) The tests for this module should highlight usage, but I will supply dummy code if this idea is accepted. If unfamiliar with XP/MockObject ideas, please see: http://www.xprogramming.com/xpmag/virtualMockObjects.htm#N78 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 From noreply@sourceforge.net Mon Mar 24 22:47:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 24 Mar 2003 14:47:25 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-24 17:47 Message: Logged In: YES user_id=6380 Hmm... How do you know that you aren't optimizing away something that's a jump target? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-24 03:36 Message: Logged In: YES user_id=80475 Neal, attached is a revision that puts it all under a single loop. By adding a switch-case, it became more readable and a little faster. Your comment on the extended args worried me, so I now bail out if any extending is present. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 21:17 Message: Logged In: YES user_id=80475 Great. Will add the dup/pop comment to the final version. I had tried and dropped a couple of other substitutions. Timeit.py showed gains but my real apps were unaffected: build 3 unpack 3 --> rot3 rot2 jmp+1 dup build 4 unpack 4 --> rot4 rot2 rot3 jmp+0 Another desirable substitution was omitted because it needed a NOP if it were going to be implemented with the current, simple approach: unary_not jump_if_false tgt --> jump_if_true tgt ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 19:24 Message: Logged In: YES user_id=33168 I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). 
I think it's ok and I have no other issues with the patch. Aftering thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though. You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i]. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 17:57 Message: Logged In: YES user_id=80475 Thanks for the thorough code review and for being positive on the inclusion of the patch. Attached is a revision that delays PyString_Size and bypasses situations with extended arguments. For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work. The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i]. I had tried a single big loop instead of three little loops but there was a loss of clarity. Since the recognizers quickly skip over mismatches, the total loop time is very small. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into it's own function--e.g., opt_while_1, opt_swap, opt_jump_jump. * In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left-over. Suggest setting clen after the string check. 
* I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail? * the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-) * Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE. * same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k. Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. 
=) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In constast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows to pass the code to Python (?) code for further peephole optimizing. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? 
Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Tue Mar 25 00:24:21 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 24 Mar 2003 16:24:21 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Tim Peters (tim_one) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. 
It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis.

Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1".

python timeit.py "while 1: break"
python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1"

----- Example -----

Disassembly of

def f(x):
    while 1:
        x -= 1
        if x == 0: break

shows two lines changing from:

   3 LOAD_CONST 1 (1)
  38 JUMP_ABSOLUTE 3

and improving to:

   3 JUMP_FORWARD 4 (to 10)
  38 JUMP_ABSOLUTE 10

All of the other lines are left unchanged.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-24 19:24

Message:
Logged In: YES
user_id=80475

In the sequence LOAD_CONST, JUMP_IF_FALSE, POP_TOP, only the first instruction is changed and it is changed to a JUMP+4 which gives the same effect as the whole sequence. If either of the second two codes are jump targets, they will function normally since they are unchanged.

In the jump to jump optimization, only the jump target is changed, so it works fine if it is itself a jump target.

The sequence BUILD_SEQN, UNPACK_SEQN is replaced by a two instruction block that performs the same function as the original block, so the only remaining case is where the unpack instruction is a jump target. Review of compile's code generator shows no way that the unpack can be a jump target if the preceding instruction is a build_seqn. Essentially, the build/unpack pair can only occur in an assignment and there are no possible jumps into the middle of an assignment.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-03-24 17:47

Message:
Logged In: YES
user_id=6380

Hmm... How do you know that you aren't optimizing away something that's a jump target?
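
[Editorial sketch, not part of the tracker item.] The two rewrites described in the initial comment, and the jump-target argument in the 19:24 reply, can be tried on a toy model. The sketch below assumes the 2.3-era layout of one opcode byte plus a 2-byte little-endian argument; the opcode numbers, the BODY placeholder, and the function names are illustrative assumptions rather than anything taken from the patch, and only JUMP_ABSOLUTE sources are retargeted.

```python
# Toy model: one byte per opcode, plus a 2-byte little-endian argument for
# opcodes >= HAVE_ARGUMENT. Opcode numbers are illustrative assumptions.
POP_TOP = 1
BODY = 9              # stand-in for the loop body, not a real 2.3 opcode
HAVE_ARGUMENT = 90
LOAD_CONST = 100
JUMP_FORWARD = 110
JUMP_IF_FALSE = 111
JUMP_ABSOLUTE = 113
UNCONDITIONAL = (JUMP_FORWARD, JUMP_ABSOLUTE)


def get_arg(code, i):
    return code[i + 1] | (code[i + 2] << 8)


def set_arg(code, i, value):
    code[i + 1] = value & 0xFF
    code[i + 2] = (value >> 8) & 0xFF


def jump_destination(code, i):
    """Absolute destination of the unconditional jump at offset i."""
    if code[i] == JUMP_FORWARD:
        return i + 3 + get_arg(code, i)   # relative to the next instruction
    return get_arg(code, i)               # JUMP_ABSOLUTE


def optimize(code, consts):
    """Apply both rewrites in place; size and instruction order never change."""
    code = bytearray(code)
    i = 0
    while i < len(code):
        op = code[i]
        # Rewrite 1: LOAD_CONST <true constant>, JUMP_IF_FALSE xx, POP_TOP.
        # Only the first instruction changes, so the other two still work
        # if something jumps to them.
        if (op == LOAD_CONST and i + 6 < len(code)
                and code[i + 3] == JUMP_IF_FALSE and code[i + 6] == POP_TOP
                and consts[get_arg(code, i)]):
            code[i] = JUMP_FORWARD
            set_arg(code, i, 4)           # skip JUMP_IF_FALSE (3) + POP_TOP (1)
        # Rewrite 2: a jump that lands on an unconditional jump goes straight
        # to the final destination instead.
        elif op == JUMP_ABSOLUTE and code[get_arg(code, i)] in UNCONDITIONAL:
            set_arg(code, i, jump_destination(code, get_arg(code, i)))
        i += 3 if op >= HAVE_ARGUMENT else 1
    return bytes(code)


# A miniature "while 1:" loop: test at offset 0, body at 7, back-jump at 8.
original = bytes([
    LOAD_CONST, 0, 0,       # 0: push consts[0]
    JUMP_IF_FALSE, 5, 0,    # 3: exit to offset 11 if false
    POP_TOP,                # 6
    BODY,                   # 7
    JUMP_ABSOLUTE, 0, 0,    # 8: back to the loop test
    POP_TOP,                # 11
])
optimized = optimize(original, consts=[1])
```

In this model the instruction at offset 0 becomes JUMP_FORWARD 4 and the back-jump at offset 8 is retargeted from 0 to 7, which mirrors the two changed lines in the disassembly example above.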
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-24 03:36

Message:
Logged In: YES
user_id=80475

Neal, attached is a revision that puts it all under a single loop. By adding a switch-case, it became more readable and a little faster.

Your comment on the extended args worried me, so I now bail out if any extending is present.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-23 21:17

Message:
Logged In: YES
user_id=80475

Great. Will add the dup/pop comment to the final version.

I had tried and dropped a couple of other substitutions. Timeit.py showed gains but my real apps were unaffected:

build 3 unpack 3 --> rot3 rot2 jmp+1 dup
build 4 unpack 4 --> rot4 rot2 rot3 jmp+0

Another desirable substitution was omitted because it needed a NOP if it were going to be implemented with the current, simple approach:

unary_not jump_if_false tgt --> jump_if_true tgt

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-03-23 19:24

Message:
Logged In: YES
user_id=33168

I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). I think it's ok and I have no other issues with the patch.

After thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though.

You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i].

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-23 17:57

Message:
Logged In: YES
user_id=80475

Thanks for the thorough code review and for being positive on the inclusion of the patch.

Attached is a revision that delays PyString_Size and bypasses situations with extended arguments.

For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work.

The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i].

I had tried a single big loop instead of three little loops but there was a loss of clarity. Since the recognizers quickly skip over mismatches, the total loop time is very small.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-03-23 10:23

Message:
Logged In: YES
user_id=33168

Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into its own function--e.g., opt_while_1, opt_swap, opt_jump_jump.

* In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left-over. Suggest setting clen after the string check.

* I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail?

* the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though. Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-)

* Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE.

* same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k.

Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense.

In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned about the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim.

Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 18:50

Message:
Logged In: YES
user_id=357491

Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done.

And in case it wasn't clear, I am all for getting this patch in.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 18:04

Message:
Logged In: YES
user_id=80475

Not really. There is no need to go wild before the compiler is refactored.

Loading another update that includes theller's idea to handle all constants evaluating to true.
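
[Editorial sketch, not part of the tracker item.] Neal's two EXTENDED_ARG points (bail out when the prefix opcode is present, and check that a new jump target fits in 16 bits before storing it) could look roughly like this. It again assumes a 2-byte little-endian argument after each argument-carrying opcode; the opcode numbers and the names uses_extended_arg and set_arg_checked are hypothetical and do not come from the patch.

```python
HAVE_ARGUMENT = 90    # opcodes at or above this value carry a 2-byte argument
EXTENDED_ARG = 143    # illustrative value for the prefix opcode
JUMP_ABSOLUTE = 113   # illustrative value
MAX_ARG = 0xFFFF      # largest argument that fits without EXTENDED_ARG


def uses_extended_arg(code):
    """Walk instruction by instruction and report any EXTENDED_ARG opcode."""
    i = 0
    while i < len(code):
        op = code[i]
        if op == EXTENDED_ARG:
            return True
        i += 3 if op >= HAVE_ARGUMENT else 1
    return False


def set_arg_checked(code, i, value):
    """Store a 16-bit argument; refuse values that would need EXTENDED_ARG."""
    if not 0 <= value <= MAX_ARG:
        return False                      # leave the instruction untouched
    code[i + 1] = value & 0xFF
    code[i + 2] = value >> 8
    return True


plain = bytearray([JUMP_ABSOLUTE, 0, 0])
wide = bytearray([EXTENDED_ARG, 1, 0, JUMP_ABSOLUTE, 0, 0])

plain_is_safe = not uses_extended_arg(plain)     # optimizer may proceed
wide_is_safe = not uses_extended_arg(wide)       # optimizer should bail out
stored = set_arg_checked(plain, 0, 0x1234)       # fits in 16 bits
rejected = not set_arg_checked(plain, 0, 70000)  # would need EXTENDED_ARG
```

Bailing out on the whole code object is the conservative choice here: it gives up a rare optimization opportunity in exchange for never having to resize or re-encode an instruction.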
----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 17:43

Message:
Logged In: YES
user_id=357491

Do I hear a PEP coming? =)

If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 15:28

Message:
Logged In: YES
user_id=80475

Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST.

Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work".

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2003-03-21 15:09

Message:
Logged In: YES
user_id=11105

Looks better now.

So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ?

I like Brett's suggestion about adding an (additional) hook here which allows passing the code to Python (?) code for further peephole optimizing.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 14:18

Message:
Logged In: YES
user_id=80475

Attached a revised patch:
* Adds PyMem_Free (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line

Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True".
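
[Editorial sketch, not part of the tracker item.] The reason given above for not folding "while True" is that True was an ordinary, rebindable name in Python 2.3, fetched at run time with LOAD_GLOBAL. In current Python, True is a keyword and cannot be rebound, so the sketch below uses a module-level name, Truth, as a stand-in; both that name and count_until are made up for the illustration.

```python
# `Truth` stands in for the name True as it behaved in Python 2.3:
# an ordinary name resolved at run time, which any code could rebind.
Truth = True


def count_until(limit):
    n = 0
    while Truth:          # the name is looked up on every iteration
        n += 1
        if n >= limit:
            break
    return n


first = count_until(3)    # loop runs while Truth is true
Truth = False             # rebinding the name changes the loop's behaviour
second = count_until(3)   # the loop body never executes
```

A compiler that had replaced the test with an unconditional jump would have made the second call loop as well, which is the hazard the comment points at. A literal such as 1 has no such problem because its value is fixed at compile time.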
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2003-03-21 03:21

Message:
Logged In: YES
user_id=89016

"while True:" should be optimized too.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2003-03-21 03:02

Message:
Logged In: YES
user_id=11105

Isn't there a PyMem_Free missing at the end?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 02:43

Message:
Logged In: YES
user_id=357491

OK, fair enough. I buy the argument. =)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-20 21:11

Message:
Logged In: YES
user_id=80475

The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when -O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb.

Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option.

IOW, I recommend against attaching this to a command line switch.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-20 20:56

Message:
Logged In: YES
user_id=357491

Perhaps this should be made something that is done with the -O option? Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one.

Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470

From noreply@sourceforge.net Tue Mar 25 02:55:06 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Mon, 24 Mar 2003 18:55:06 -0800
Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler
Message-ID:

Patches item #709178, was opened at 2003-03-24 21:55
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470

Category: Distutils and setup.py
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John Kabir Luebs (jkluebs)
Assigned to: Nobody/Anonymous (nobody)
Summary: remove -static option from cygwinccompiler

Initial Comment:
Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option.

Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries.

Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have a .a suffix. If -static is passed, it will skip the .dll.a libraries.

This is a pain if one has a tree with both static and dynamic libraries using this naming convention, and wishes to use the dynamic libraries.

The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves.

The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options.
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470

From noreply@sourceforge.net Tue Mar 25 12:45:54 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Tue, 25 Mar 2003 04:45:54 -0800
Subject: [Patches] [ python-Patches-706590 ] Adds Mock Object support to unittest.TestCase
Message-ID:

Patches item #706590, was opened at 2003-03-19 22:55
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthew Russell (mattruss)
>Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Mock Object support to unittest.TestCase

Initial Comment:
Mock objects can greatly improve unittests (if used in the correct context), especially for code that relies upon resource-hungry tests (connections to databases, socket servers, etc.).

The module/patch (to unittest) which I am submitting helps to introspect calls to code whilst maintaining transparency and functionality with your code.

I had previously written a similar module for my present employers, and myself and fellow XP partners agree that it has made the XP testing cycle considerably easier.

Having googled for alternatives on the web, I have not found a solution that provides the same level of flexibility. (hope that doesn't sound arrogant)

The tests for this module should highlight usage, but I will supply dummy code if this idea is accepted.

If unfamiliar with XP/MockObject ideas, please see:
http://www.xprogramming.com/xpmag/virtualMockObjects.htm#N78

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470

From noreply@sourceforge.net Tue Mar 25 14:33:10 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Tue, 25 Mar 2003 06:33:10 -0800
Subject: [Patches] [ python-Patches-707257 ] Improve code generation
Message-ID:

Patches item #707257, was opened at 2003-03-20 20:03
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: Improve code generation

Initial Comment:
Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c.

The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target.

Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead.

The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis.

Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1".
python timeit.py "while 1: break"
python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1"

----- Example -----

Disassembly of

def f(x):
    while 1:
        x -= 1
        if x == 0: break

shows two lines changing from:

   3 LOAD_CONST 1 (1)
  38 JUMP_ABSOLUTE 3

and improving to:

   3 JUMP_FORWARD 4 (to 10)
  38 JUMP_ABSOLUTE 10

All of the other lines are left unchanged.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-03-25 09:33

Message:
Logged In: YES
user_id=6380

OK, then it's ok with me. I suggest that you put that response into a comment for the edification of future generations.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-24 19:24

Message:
Logged In: YES
user_id=80475

In the sequence LOAD_CONST, JUMP_IF_FALSE, POP_TOP, only the first instruction is changed and it is changed to a JUMP+4 which gives the same effect as the whole sequence. If either of the second two codes are jump targets, they will function normally since they are unchanged.

In the jump to jump optimization, only the jump target is changed, so it works fine if it is itself a jump target.

The sequence BUILD_SEQN, UNPACK_SEQN is replaced by a two instruction block that performs the same function as the original block, so the only remaining case is where the unpack instruction is a jump target. Review of compile's code generator shows no way that the unpack can be a jump target if the preceding instruction is a build_seqn. Essentially, the build/unpack pair can only occur in an assignment and there are no possible jumps into the middle of an assignment.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-03-24 17:47

Message:
Logged In: YES
user_id=6380

Hmm... How do you know that you aren't optimizing away something that's a jump target?
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470

From noreply@sourceforge.net Tue Mar 25 23:15:08 2003
From: noreply@sourceforge.net (SourceForge.net)
Date: Tue, 25 Mar 2003 15:15:08 -0800
Subject: [Patches] [ python-Patches-709743 ] os.setpgrp function failed to build
Message-ID:

Patches item #709743, was opened at 2003-03-25 16:15
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470

Category: Build
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gary H. Loechelt (loechelt)
Assigned to: Nobody/Anonymous (nobody)
Summary: os.setpgrp function failed to build

Initial Comment:
The os.setpgrp function failed to build on HP-UX B.10.20 for Python 2.3a2. Comparing the build with Python 2.2.1, I noticed a missing line in the pyconfig.h.in file.
I added the appropriate line to the file and rebuilt the executable. Note that I did NOT check the configure script to insure that the appropriate compiler macro (HAVE_SETPGRP) was set. I just manually set the macro in the pyconfig.h file directly. The person who has responsibility for configure should probably check it as well to make sure that it is not broken as well. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 From noreply@sourceforge.net Tue Mar 25 23:16:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 25 Mar 2003 15:16:09 -0800 Subject: [Patches] [ python-Patches-709744 ] CALL_ATTR opcode Message-ID: Patches item #709744, was opened at 2003-03-26 00:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709744&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Wouters (twouters) Assigned to: Nobody/Anonymous (nobody) Summary: CALL_ATTR opcode Initial Comment: The result of the PyCore sprint of me and Brett: the CALL_ATTR opcode (LOAD_ATTR and CALL_FUNCTION combined) that skips the PyMethod creation and destruction for classic classes (but not newstyle classes, yet.) The code is somewhat rough yet, it needs commenting, some renaming, and most importantly testing. It seems to work, however, and provides between a 35% and 5% speedup. (5% in 'average' code, up to 35% in instance method calls and instance creation alone.) It also needs to be updated to include newstyle classes. I will likely work on this on the flight home. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709744&group_id=5470 From noreply@sourceforge.net Tue Mar 25 23:18:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 25 Mar 2003 15:18:01 -0800 Subject: [Patches] [ python-Patches-709744 ] CALL_ATTR opcode Message-ID: Patches item #709744, was opened at 2003-03-26 00:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709744&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Wouters (twouters) Assigned to: Nobody/Anonymous (nobody) Summary: CALL_ATTR opcode Initial Comment: The result of the PyCore sprint of me and Brett: the CALL_ATTR opcode (LOAD_ATTR and CALL_FUNCTION combined) that skips the PyMethod creation and destruction for classic classes (but not newstyle classes, yet.) The code is somewhat rough yet, it needs commenting, some renaming, and most importantly testing. It seems to work, however, and provides between a 35% and 5% speedup. (5% in 'average' code, up to 35% in instance method calls and instance creation alone.) It also needs to be updated to include newstyle classes. I will likely work on this on the flight home. ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2003-03-26 00:18 Message: Logged In: YES user_id=34209 attaching patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709744&group_id=5470 From noreply@sourceforge.net Wed Mar 26 01:30:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Tue, 25 Mar 2003 17:30:47 -0800 Subject: [Patches] [ python-Patches-707257 ] Improve code generation Message-ID: Patches item #707257, was opened at 2003-03-20 20:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Raymond Hettinger (rhettinger) Summary: Improve code generation Initial Comment: Adds a single function to improve generated bytecode. Has a two line attachment point, so it is completely de-coupled from both the compiler and ceval.c. The first pass looks for the sequence LOAD_CONST 1, JUMP_IF_FALSE xx, POP_TOP. It replaces the first instruction with JUMP_FORWARD +4. The second pass looks for jumps to an unconditional jump. The first jump target is replaced with the second jump target. Both are safe, general purpose optimizations. Together, they eliminate 100% of the "while 1" loop overhead. The structure of the code allows for other code improvements to be easily added. This one focuses on low hanging fruit. It takes a simple, safe approach that does not change bytecode size or order and does not need a basic block analysis. Improves timings on pybench, pystone, and two of my real applications. timeit.py shows dramatic improvement to code using "while 1". 
python timeit.py "while 1: break" python timeit.py -s "i=0" "while 1:" " if i==1: break" " else: i=1" ----- Example ----- Disassembly of def f(x): while 1: x -= 1 if x == 0: break shows two lines changing from: 3 LOAD_CONST 1 (1) 38 JUMP_ABSOLUTE 3 and improving to: 3 JUMP_FORWARD 4 (to 10) 38 JUMP_ABSOLUTE 10 All of the other lines are left unchanged. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-25 20:30 Message: Logged In: YES user_id=80475 Added the clarifying comments. Loaded patch as: Python/compile.c 2.277 Closing patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-25 09:33 Message: Logged In: YES user_id=6380 OK, then it's ok with me. I suggest that you put that response into a comment for the edification of future generations. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-24 19:24 Message: Logged In: YES user_id=80475 In the sequence LOAD_CONST, JUMP_IF_FALSE, POP_TOP, only the first instruction is changed and it is changed to a JUMP+4 which gives the same effect as the whole sequence. If either of the second two codes are jump targets, they will function normally since they are unchanged. In the jump to jump optimization, only the jump target is changed, so it works fine if it is itself a jump target. The sequence BUILD_SEQN, UNPACK_SEQN is replaced by a two instruction block that performs the same function as the original block, so the only remaining case is where the unpack instruction is a jump target. Review of compile's code generator shows no way that the unpack can be jump target if the preceding instruction is a build_seqn. Essentially, the build/unpack pair can only occur in an assignment and there are no possible jumps into the middle of an assignment. 
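The jump-to-jump pass discussed in this thread can be illustrated with a toy model. This is only a sketch, not the code that went into Python/compile.c: the instruction tuples, the opcode names "JUMP", "LOAD" and "RETURN", and the function name thread_jumps are made up for the illustration, and jump arguments are treated as absolute list indexes rather than byte offsets. It shows why retargeting is safe when only the jump argument changes: no instruction is moved or removed, so every other jump target stays valid.

```python
# Toy instruction stream: (opcode, arg). Jump args are absolute indexes
# into the list (a simplification; real bytecode uses byte offsets).
JUMPS = {"JUMP", "JUMP_IF_FALSE"}  # hypothetical opcode names


def thread_jumps(code):
    """Return a copy of code in which any jump that lands on an
    unconditional JUMP is retargeted to that JUMP's own target.
    The length and order of the instructions are left unchanged."""
    out = list(code)
    for i, (op, arg) in enumerate(code):
        if op in JUMPS:
            tgt_op, tgt_arg = code[arg]
            if tgt_op == "JUMP":
                out[i] = (op, tgt_arg)  # only the jump argument changes
    return out


code = [
    ("LOAD", "x"),          # 0
    ("JUMP_IF_FALSE", 3),   # 1: lands on the unconditional jump at 3
    ("LOAD", "y"),          # 2
    ("JUMP", 5),            # 3
    ("LOAD", "z"),          # 4
    ("RETURN", None),       # 5
]
optimized = thread_jumps(code)
for index, instr in enumerate(optimized):
    print(index, instr)
```

Instruction 3 is left in place even though instruction 1 no longer needs it, which mirrors the patch's choice not to change bytecode size or order.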
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-24 17:47 Message: Logged In: YES user_id=6380 Hmm... How do you know that you aren't optimizing away something that's a jump target? ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-24 03:36 Message: Logged In: YES user_id=80475 Neal, attached is a revision that puts it all under a single loop. By adding a switch-case, it became more readable and a little faster. Your comment on the extended args worried me, so I now bail out if any extending is present. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 21:17 Message: Logged In: YES user_id=80475 Great. Will add the dup/pop comment to the final version. I had tried and dropped a couple of other substitutions. Timeit.py showed gains but my real apps were unaffected: build 3 unpack 3 --> rot3 rot2 jmp+1 dup build 4 unpack 4 --> rot4 rot2 rot3 jmp+0 Another desirable substitution was omitted because it needed a NOP if it were going to be implemented with the current, simple approach: unary_not jump_if_false tgt --> jump_if_true tgt ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 19:24 Message: Logged In: YES user_id=33168 I need to think if there's any way to break the EXTENDED_ARG with the way you did it (checking backwards, vs skipping it by incrementing over). I think it's ok and I have no other issues with the patch. After thinking about the DUP_TOP, POP_TOP, it's not a big deal, but probably a comment should be added indicating why you do the JUMP+2, DUP, POP. Couldn't you also implement ROT_THREE and ROT_FOUR pretty easily? Not sure if it's worth it, though. You are correct about the unconditional jump test, I didn't notice the [tgt] vs [i].
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-23 17:57 Message: Logged In: YES user_id=80475 Thanks for the thorough code review and for being positive on the inclusion of the patch. Attached is a revision that delays PyString_Size and bypasses situations with extended arguments. For the dead code fragment, I'm more comfortable with the DUP_TOP POP_TOP than use of STOP_CODE but it is probably a matter of taste. A more sophisticated approach would not have any dead code but I've aimed for the simplest thing that could possibly work. The unconditional jump test is performed on a different opcode than the test for equality to JUMP_ABSOLUTE, so the two tests cannot be combined. The first operates on codestr[tgt] and the second on codestr[i]. I had tried a single big loop instead of three little loops but there was a loss of clarity. Since the recognizers quickly skip over mismatches, the total loop time is very small. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:23 Message: Logged In: YES user_id=33168 Generally I think I'd like to see only one loop over the code (should scale better than having N loops--1 per optimization). Perhaps making each optimization into its own function--e.g., opt_while_1, opt_swap, opt_jump_jump. * In optimize_code, PyString_Size() is called before verifying code is a string. If the code isn't a string, an exception will be left over. Suggest setting clen after the string check. * I don't think the code works with EXTENDED_ARGS. This can happen if there are more than 64k variables etc. Perhaps if you get an EXTENDED_ARG you should just bail? * the DUP_TOP and POP_TOP are never supposed to be executed, right? I would use STOP_CODE to indicate the ops were invalid. I can also see where others would find this suggestion objectionable. There is no NOP though.
Ideally, we would remove the dead code, rather than have the JUMP, etc. This would mean possibly changing all subsequent JUMP_ABSOLUTEs though. I don't recommend changing this, just lamenting. (I particularly like the BUILD/UNPACK of 2 becoming a ROT_TWO, BTW :-) * Why in the jumps to jumps loop don't you set codestr[i] = opcode if opcode == JUMP_FORWARD, then do away with the if (opcode != JUMP_ABSOLUTE)? The check for UNCONDITIONAL_JUMP already guarantees you have either JUMP_FORWARD or JUMP_ABSOLUTE. * same problem with EXTENDED_ARG for SETARG though. You probably need a check before the SETARG to make sure tgttgt < 64k. Other than the EXTENDED_ARG and string size issues, the code looks fine and makes sense. In general, I'm positive on the idea of doing this. However, I'm not sure this change is appropriate for 2.3, partially because the beta is coming. I'm a little (very little) concerned the speed penalty for compiling. I realize this is a one-time (at most) cost, so it's almost definitely insignificant. I'd like Tim or Guido to approve the approach for acceptance. Assigning to Tim. Regardless of whether this patch is accepted for 2.3, I think all of these should be implemented in 2.4! Hopefully at that time there will be the new AST compiler which we can modify more easily and make even more optimizations. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 18:50 Message: Logged In: YES user_id=357491 Ah, forgot about the planned refactoring for 2.4. Oops. =) OK, I will keep this in the back of my head until the refactor gets done. And in case it wasn't clear, I am all for getting this patch in. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 18:04 Message: Logged In: YES user_id=80475 Not really. There is no need to go wild before the compiler is refactored. 
Loading another update that includes theller's idea to handle all constants evaluating to true. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 17:43 Message: Logged In: YES user_id=357491 Do I hear a PEP coming? =) If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?). I am up to writing a PEP and trying to get this to work. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 15:28 Message: Logged In: YES user_id=80475 Right, it takes a LOAD_GLOBAL to fetch True using a dictionary lookup. In contrast, 2 is quickly fetched with LOAD_CONST. Adding a hook is easy enough, but I'll leave that for another day (I've already exceeded my quota of API change requests). This patch focuses on "the simplest thing that could possibly work". ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 15:09 Message: Logged In: YES user_id=11105 Looks better now. So it seems 'while True:' or 'while 2:' is worse than 'while 1:' ;-) ? I like Brett's suggestion about adding an (additional) hook here which allows passing the code to Python (?) code for further peephole optimizing.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-21 14:18 Message: Logged In: YES user_id=80475 Attached a revised patch: * Adds PyMem_Free (theller's review comment) * Applies macro form of string/tuple operations * All exits now return a new reference * Attach point is now a single line Walter, until GvR moves to prevent shadowing of globals, it would be unsafe to optimize "while True". ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-21 03:21 Message: Logged In: YES user_id=89016 "while True:" should be optimized too. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2003-03-21 03:02 Message: Logged In: YES user_id=11105 Isn't there a PyMem_Free missing at the end? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-21 02:43 Message: Logged In: YES user_id=357491 OK, fair enough. I buy the argument. =) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-20 21:11 Message: Logged In: YES user_id=80475 The -O option was useful when the optimization involved a trade-off. It used to be that you lost line numbering when - O was turned on. In contrast, this patch is a pure win and does not affect anything else including dis and pdb. Other bytecode optimizations have been implemented directly in the compiler code (for instance, negatives before a constant) and those were not linked to the -O option. IOW, I recommend against attaching this to a command line switch. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2003-03-20 20:56 Message: Logged In: YES user_id=357491 Perhaps this should be made something that is done with the -O option? 
Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one. Love the idea, though. Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations. This patch seems to be taking the initial steps toward a setup like that. Besides, the poor -O option isn't worth much of anything these days thanks to Michael. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470 From noreply@sourceforge.net Wed Mar 26 16:08:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 26 Mar 2003 08:08:40 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 From noreply@sourceforge.net Wed Mar 26 16:09:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Wed, 26 Mar 2003 08:09:09 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) >Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 From noreply@sourceforge.net Thu Mar 27 08:09:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 00:09:47 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Nobody/Anonymous (nobody) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. 
Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Thu Mar 27 08:25:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 00:25:49 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Nobody/Anonymous (nobody) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Geert Jansen (geertj) Date: 2003-03-27 09:25 Message: Logged In: YES user_id=537938 Here is the patch. 
It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Thu Mar 27 19:31:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 11:31:49 -0800 Subject: [Patches] [ python-Patches-710931 ] iconv codec-NG and Korean Codecs Message-ID: Patches item #710931, was opened at 2003-03-28 04:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Nobody/Anonymous (nobody) Summary: iconv codec-NG and Korean Codecs Initial Comment: This patch includes update for iconv_codec, new sources for korean codecs and MultibyteCodec supplemental library. I splitted out common parts of codecs for usual multibyte encodings into multibytecodec.c and this iconv codec and the korean codecs are using it. The korean codecs is only 58K in stripped i386 ELF and 62K in stripped i386 PECOFF binary and I think it's small enough to be incorporated into python. Files: Lib/encodings/aliases.py adds aliases for korean encodings and remove comments that isn't true now. Lib/encodings/cp949.py Lib/encodings/euc_kr.py codecs for korean encodings Lib/encodings/iconv_codec.py updated for new _iconv_codec implementation Lib/test/test_ko_codecs.py unit test for cp949, euc_kr codec Lib/test/test_ko_codecs_mapping.py unit test to test cp949 mapping Lib/test/test_iconv_codec_euc_kr.py another iconv_codec test unit. because non-unicode multibyte encoding is required to test both of iconv_codec and multibytecodec. Lib/test/test_multibytecodec_support.py common part for above test units Modules/_iconv_codec.c new implementation of _iconv_codec. 
this resolves numerous problems that previous implementation had. and iconv_codec has sane StreamReader now! :) Modules/_ko_codec.c Modules/_ko_codec.h korean codecs module Modules/multibytecodec.c Modules/multibytecodec.h common multibyte codec supplement. I think that this can be used for any usual multibyte encodings. I'll submit Chinese Codecs in a few days using this. Tools/unicode/genmap_ko_codecs.py code generator for _ko_codecs.h ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 From noreply@sourceforge.net Thu Mar 27 20:12:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 12:12:54 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-20 15:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Stuart Bishop (zenzen) Assigned to: Neal Norwitz (nnorwitz) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Stuart Bishop (zenzen) Date: 2003-03-28 07:12 Message: Logged In: YES user_id=46639 tzset3.diff is an updated diff against the CVS head. Fixes: -Don't test time.altzone for UTC - non-DST means altzone is undefined -Make sure dst timezone name is not the same as non-dst timezone name in TZ environment variable, to work around an apparent Solaris bug. -Extraneous cruft removed from test_time.py and configure.in - no more irrelevant comments. -More whitespace as per Tim's comments.
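As a rough illustration of the kind of check discussed in this item, the sketch below sets TZ to a string in the POSIX format (standard name, offset, DST name, then the two transition rules) and reads back the values that time.tzset() populates. It is not the code from tzset3.diff: the helper name tz_settings and the particular TZ string are assumptions made for the example, the DST name deliberately differs from the standard name, and time.tzset() is only available on Unix-like platforms.

```python
import os
import time


def tz_settings(tz):
    """Temporarily set TZ to a POSIX-format string, call time.tzset(),
    and return (tzname, timezone, altzone, daylight). The previous TZ
    setting is restored before returning. Hypothetical helper."""
    old = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix only
    try:
        return time.tzname, time.timezone, time.altzone, time.daylight
    finally:
        # Put the environment back the way it was found.
        if old is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = old
        time.tzset()


# US Eastern time written out explicitly: 5 hours west of UTC, DST from
# the second Sunday of March to the first Sunday of November.
names, timezone, altzone, daylight = tz_settings("EST+05EDT,M3.2.0,M11.1.0")
print(names, timezone, altzone, daylight)
```

For a zone without DST, such as "UTC+0", only time.timezone and time.daylight are meaningful, which matches the note above about not testing time.altzone for UTC.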
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-22 08:28 Message: Logged In: YES user_id=33168 After patching, the test fails: File "/home/neal/build/python/2_3/Lib/test/test_time.py", line 115, in test_tzset self.failUnlessEqual(time.daylight,1) File "/home/neal/build/python/2.3/Lib/unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 0 != 1 Also, why is the code commented out (via a string) on lines 120-144? Should these be removed? I see the comment about wallclock time, but don't understand why the code should be left in if we can't test it. I can understand a comment describing generally the issue. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-21 12:18 Message: Logged In: YES user_id=33168 I'll try to get to this soon. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-21 12:11 Message: Logged In: YES user_id=6380 Unassigning, as I won't have time for this. But it is important - someone else should make sure this goes into 2.3b1! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-21 08:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it. Two notes: 1. Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Thu Mar 27 21:09:20 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 13:09:20 -0800 Subject: [Patches] [ python-Patches-711002 ] new test_urllib and patch for found urllib bug Message-ID: Patches item #711002, was opened at 2003-03-27 13:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711002&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: new test_urllib and patch for found urllib bug Initial Comment: Free time at PyCon led to me writing a new test_urllib (happy, Raymond? =). Since I have no guarantee that there would be a net connection (and didn't want to use it without user permission since I view using the 'network' resource as using sockets and not the Net) I wrote all tests using temporary files. And doing this found a bug, sort of. The docs and doc string for urlretrieve() say the second value from the returned tuple should be None when a local file is passed as an argument. Well, it wasn't; it was returning an rfc2822.Message object like it does for remote files. So I patched it to match the docs.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711002&group_id=5470 From noreply@sourceforge.net Thu Mar 27 23:31:59 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 15:31:59 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 12:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Martin v. Löwis (loewis) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- Comment By: Jeremy Moore (jmoore_calaway) Date: 2003-03-27 16:31 Message: Logged In: YES user_id=744000 (Apologies if this is the inappropriate place to ask) I'm porting an app to Mac OS X 10.2 (begrudgingly) and ran straight into this bug. Nothing like changing versions of python (2.2.2 to 2.3a2) and tcl/tk (8.3.4 to 8.4.2) while using a platform you're unfamiliar with! Anyway, I have successfully applied the patch; however, it has simply propagated the problem elsewhere. Specifically, the pmw rev 1.1 widgets library. The problem is, pmw does additional processing that chokes on the '??' now returned by the try: except: statements. Perhaps, if anyone knows, it would be better to mimic what tcl/tk 8.3.x returned with the except statements. Pmw may not be the only library out there that will get choked up on this. I will submit a bug in the pmw for this as well, but I'm looking for a least resistance path to get things up and running.
(And not really wanting to rewrite all my GUI construction code...) Thanks Jeremy Moore ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-23 06:35 Message: Logged In: YES user_id=60903 > What is the problem that this patch solves? As the subject says: Provide a patch for #698517. tk8.4.2 returns for the undefined fields in events empty strings or '??' strings, on which the int conversions fail. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 05:07 Message: Logged In: YES user_id=21627 What is the problem that this patch solves? ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-22 00:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 15:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 15:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 14:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only.
Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 13:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Thu Mar 27 23:58:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 15:58:14 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 20:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Martin v. Löwis (loewis) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 00:58 Message: Logged In: YES user_id=21627 Well, no. The Tk change was made for a reason, and it is unlikely that Tk people will back it out, so we should not bypass this change. If you want to get up and running, I recommend to use Tcl 8.3. 
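For reference, the getint() fallback that Chad Netzer proposes earlier in this thread can be laid out as below. This is only a minimal sketch of the idea, not the attached patch; the sample field values are assumptions based on the empty strings and '??' strings that tk8.4.2 is reported to return for undefined event fields.

```python
def getint(s):
    """Convert an event field to int; hand it back unchanged if that fails."""
    try:
        return int(s)
    except ValueError:
        # Undefined fields arrive as "" or "??" and cannot be converted.
        return s


# Hypothetical field values, for illustration only.
fields = ["42", "??", "", "-7"]
converted = [getint(f) for f in fields]
```

Callers that do arithmetic on the result (as Pmw apparently does) would still need to cope with a string coming back, which is the concern Jeremy Moore raises.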
---------------------------------------------------------------------- Comment By: Jeremy Moore (jmoore_calaway) Date: 2003-03-28 00:31 Message: Logged In: YES user_id=744000 (Apologies if this is the inappropriate place to ask) I'm porting an app to Mac OS X 10.2 (begrudgingly) and ran straight into this bug. Nothing like changing versions of python (2.2.2 to 2.3a2) and tcl/tk (8.3.4 to 8.4.2) while using a platform you're unfamiliar with! Anyway, I have successfully applied the patch; however, it has simply propagated the problem elsewhere. Specifically, the pmw rev 1.1 widgets library. The problem is, pmw does additional processing that chokes on the '??' now returned by the try: except: statements. Perhaps, if anyone knows, it would be better to mimic what tcl/tk 8.3.x returned with the except statements. Pmw may not be the only library out there that will get choked up on this. I will submit a bug in the pmw for this as well, but I'm looking for a least resistance path to get things up and running. (And not really wanting to rewrite all my GUI construction code...) Thanks Jeremy Moore ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-23 14:35 Message: Logged In: YES user_id=60903 > What is the problem that this patch solves? As the subject says: Provide a patch for #698517. tk8.4.2 returns empty strings or '??' strings for the undefined fields in events, and the int conversions fail on them. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 13:07 Message: Logged In: YES user_id=21627 What is the problem that this patch solves? 
---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-22 08:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 23:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 23:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 22:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 21:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as: def getint( s ): try: return int( s ) except ValueError: return s Rather than add lots of try/excepts in the codebase? 
I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Thu Mar 27 23:59:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 15:59:39 -0800 Subject: [Patches] [ python-Patches-710931 ] iconv codec-NG and Korean Codecs Message-ID: Patches item #710931, was opened at 2003-03-27 20:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) >Assigned to: Martin v. Löwis (loewis) Summary: iconv codec-NG and Korean Codecs Initial Comment: This patch includes an update for iconv_codec, new sources for Korean codecs and a MultibyteCodec supplemental library. I split out the common parts of codecs for usual multibyte encodings into multibytecodec.c, and this iconv codec and the Korean codecs are using it. The Korean codecs are only 58K in a stripped i386 ELF binary and 62K in a stripped i386 PECOFF binary, and I think that is small enough to be incorporated into python. Files: Lib/encodings/aliases.py adds aliases for Korean encodings and removes comments that are no longer true. Lib/encodings/cp949.py Lib/encodings/euc_kr.py codecs for Korean encodings Lib/encodings/iconv_codec.py updated for new _iconv_codec implementation Lib/test/test_ko_codecs.py unit test for cp949, euc_kr codec Lib/test/test_ko_codecs_mapping.py unit test to test cp949 mapping Lib/test/test_iconv_codec_euc_kr.py another iconv_codec test unit. 
because non-unicode multibyte encoding is required to test both of iconv_codec and multibytecodec. Lib/test/test_multibytecodec_support.py common part for above test units Modules/_iconv_codec.c new implementation of _iconv_codec. this resolves numerous problems that previous implementation had. and iconv_codec has sane StreamReader now! :) Modules/_ko_codec.c Modules/_ko_codec.h korean codecs module Modules/multibytecodec.c Modules/multibytecodec.h common multibyte codec supplement. I think that this can be used for any usual multibyte encodings. I'll submit Chinese Codecs in few days using this. Tools/unicode/genmap_ko_codecs.py code generator for _ko_codecs.h ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:00:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:00:25 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: M.-A. Lemburg (lemburg) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. 
About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:00 Message: Logged In: YES user_id=21627 Marc-Andre, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2003-03-27 09:25 Message: Logged In: YES user_id=537938 Here is the patch. It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:01:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:01:01 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Martin v. Löwis (loewis) >Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. 
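As a point of comparison only: the behaviour this patch asks for is what Python 3 does today, where every str is Unicode. The sketch below assumes a Python 3 interpreter and says nothing about how the Python 2.3 patch itself is implemented.

```python
# "%c" accepts either a one-character string or an integer code point.
from_char = "%c" % "a"
from_code = "%c" % 97
from_non_ascii = "%c" % "\u00f6"  # o with diaeresis, as in "Löwis"

# "%s" formats the same one-character string identically.
from_str = "%s" % "a"
```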
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:02:23 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:02:23 -0800 Subject: [Patches] [ python-Patches-709743 ] os.setpgrp function failed to build Message-ID: Patches item #709743, was opened at 2003-03-26 00:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gary H. Loechelt (loechelt) >Assigned to: Martin v. Löwis (loewis) Summary: os.setpgrp function failed to build Initial Comment: The os.setpgrp function failed to build on HP-UX B.10.20 for Python 2.3a2. Comparing the build with Python 2.2.1, I noticed a missing line in the pyconfig.h.in file. I added the appropriate line to the file and rebuilt the executable. Note that I did NOT check the configure script to insure that the appropriate compiler macro (HAVE_SETPGRP) was set. I just manually set the macro in the pyconfig.h file directly. The person who has responsibility for configure should probably check it as well to make sure that it is not broken as well. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:01 Message: Logged In: YES user_id=21627 Can you please report precisely as to how it fails? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:01:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:01:58 -0800 Subject: [Patches] [ python-Patches-709743 ] os.setpgrp function failed to build Message-ID: Patches item #709743, was opened at 2003-03-26 00:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gary H. Loechelt (loechelt) Assigned to: Nobody/Anonymous (nobody) Summary: os.setpgrp function failed to build Initial Comment: The os.setpgrp function failed to build on HP-UX B.10.20 for Python 2.3a2. Comparing the build with Python 2.2.1, I noticed a missing line in the pyconfig.h.in file. I added the appropriate line to the file and rebuilt the executable. Note that I did NOT check the configure script to insure that the appropriate compiler macro (HAVE_SETPGRP) was set. I just manually set the macro in the pyconfig.h file directly. The person who has responsibility for configure should probably check it as well to make sure that it is not broken as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:01 Message: Logged In: YES user_id=21627 Can you please report precisely as to how it fails? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:03:32 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:03:32 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-25 03:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) >Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 00:12:52 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Thu, 27 Mar 2003 16:12:52 -0800 Subject: [Patches] [ python-Patches-708374 ] add offset to mmap Message-ID: Patches item #708374, was opened at 2003-03-23 15:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add offset to mmap Initial Comment: This patch is from Yotam Medini sent to me in mail. It adds support for the offset parameter to mmap. It ignores the check for mmap size "if the file is character device. Some device drivers (which I happen to use) have zero size in fstat buffer, but still one can seek() read() and tell()." I added minimal doc and tests. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:12 Message: Logged In: YES user_id=21627 I think non-zero offsets need to be supported for Windows as well. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 16:37 Message: Logged In: YES user_id=33168 Email received from Yotam: I have downloaded and patched the 2.3a source. compiled locally just this module, and it worked fine for my application (with offset for character device file) I did not run the released test though. 
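To illustrate what an offset parameter buys, here is a minimal sketch using the offset keyword of the mmap module as it exists in current Python releases. It is not the patch under review; the helper name read_at_offset and the temporary file contents are assumptions made for the example. The offset is chosen as a multiple of mmap.ALLOCATIONGRANULARITY, which current implementations require.

```python
import mmap
import os
import tempfile


def read_at_offset(path, offset, length):
    """Map `length` bytes of `path` starting at `offset` and return a copy."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=offset) as m:
            return m[:]


# Build a scratch file: one granule of padding followed by a short payload.
gran = mmap.ALLOCATIONGRANULARITY
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * gran + b"hello")
    path = tmp.name

try:
    data = read_at_offset(path, gran, 5)
finally:
    os.remove(path)
```

Mapping from a non-zero offset avoids mapping the leading part of a large file just to reach data near the end.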
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 From noreply@sourceforge.net Fri Mar 28 08:40:20 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 00:40:20 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: Martin v. Löwis (loewis) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 09:40 Message: Logged In: YES user_id=38388 Looks ok. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:00 Message: Logged In: YES user_id=21627 Marc-Andre, can you take a look? If not, please unassign it. 
---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2003-03-27 09:25 Message: Logged In: YES user_id=537938 Here is the patch. It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Fri Mar 28 08:44:42 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 00:44:42 -0800 Subject: [Patches] [ python-Patches-612627 ] Allow more Unicode on sys.stdout Message-ID: Patches item #612627, was opened at 2002-09-21 22:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Allow more Unicode on sys.stdout Initial Comment: This patch extends the set of Unicode strings that can be printed to sys.stdout, to support all strings that the terminal will likely support. It also adds an encoding attribute to sys.std{in,out}. To do that: - it adds a .encoding attribute to all file objects, which is normally None - initializes the encoding of sys.stdin and sys.stdout if either is a terminal. - adds a wrapper object around sys.stdout in site.py that encodes all Unicode objects according to the detected encoding, if that encoding is known to Python To find the encoding of the terminal, it - uses GetConsoleCP and GetConsoleOutputCP on Windows, - uses nl_langinfo(CODESET) on Unix, if available. The primary rationale for this change is that people should be able to print Unicode in an interactive session. 
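The lookup order described above (a stream's own .encoding first, then nl_langinfo(CODESET) where available) can be sketched at the Python level as follows. This is an illustration under assumptions, not the C patch: the function name terminal_encoding is hypothetical, and the Windows GetConsoleCP path is left out.

```python
import locale
import sys


def terminal_encoding(stream=sys.stdout):
    """Best-effort guess at the encoding a text stream expects, or None."""
    enc = getattr(stream, "encoding", None)
    if enc:
        return enc
    # nl_langinfo is only present on platforms that provide it (mostly Unix).
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET) or None
    return None


detected = terminal_encoding()
```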
A parallel change needs to be added for IDLE, so that it adds the .encoding attribute to the emulated stdout (it already supports printing of Unicode on stdout). ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 09:44 Message: Logged In: YES user_id=38388 Looks ok except for the direct hacking of f_encoding in the sys module. Please add either a macro or a new API to make changing the encoding from C possible without tapping directly into the implementation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 12:59 Message: Logged In: YES user_id=21627 Is the patch now acceptable? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-26 19:47 Message: Logged In: YES user_id=21627 I've attached a revised version which implements your proposal; this version works without modification of site.py. In its current form, the file encoding is only applied in print; for sys.stdout.write, it is ignored. For print, it is applied independent of whether this is a script or interactive mode. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-25 14:09 Message: Logged In: YES user_id=38388 I think it could work by adding a special case to PyFile_WriteObject() instead of calling PyObject_Print(). You first encode the Unicode object and then let PyFile_WriteString() take care of the writing to the FILE* object. I see no other way, since you can't place the .encoding information into the FILE* object. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-24 11:02 Message: Logged In: YES user_id=21627 I have considered implementing it in the file object. 
However, it becomes quite involved, and heavy C code: PyFile_WriteObject calls PyObject_Print. Since Unicode does not implement a tp_print, this calls str/repr, which converts using the default encoding. It is not clear at which point the file encoding should be taking into account. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2002-09-24 10:10 Message: Logged In: NO I like the .encoding concept. I don't really like the sys.stdout wrapper. Wouldn't it be better to add the functionality to the file object .write() and .writelines() methods and then only use the wrapper in case sys.stdout is not a true file object ? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 From noreply@sourceforge.net Fri Mar 28 15:34:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 07:34:14 -0800 Subject: [Patches] [ python-Patches-709743 ] os.setpgrp function failed to build Message-ID: Patches item #709743, was opened at 2003-03-25 16:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gary H. Loechelt (loechelt) Assigned to: Martin v. Löwis (loewis) Summary: os.setpgrp function failed to build Initial Comment: The os.setpgrp function failed to build on HP-UX B.10.20 for Python 2.3a2. Comparing the build with Python 2.2.1, I noticed a missing line in the pyconfig.h.in file. I added the appropriate line to the file and rebuilt the executable. Note that I did NOT check the configure script to insure that the appropriate compiler macro (HAVE_SETPGRP) was set. I just manually set the macro in the pyconfig.h file directly. 
The person who has responsibility for configure should probably check it as well to make sure that it is not broken as well. ---------------------------------------------------------------------- >Comment By: Gary H. Loechelt (loechelt) Date: 2003-03-28 08:34 Message: Logged In: YES user_id=142817 The build failed because the HAVE_SETPGRP compiler macro was never set. Consequently, the code for the os.setpgrp function (posix_setpgrp) in the posixmodule.c file never compiled. Even though the rest of the posixmodule.c compiled, the os.setpgrp function was not available in the os module. Once I manually set the HAVE_SETPGRP compiler macro in the pyconfig.h header file and rebuilt posixmodule.c, everything worked and I was able to call the os.setpgrp function. I began to track down why the HAVE_SETPGRP compiler macro never got set during my configuration. Realizing that pyconfig.h is generated from pyconfig.h.in, I checked to see if HAVE_SETPGRP was even in pyconfig.h.in to start with. It was not. I compared pyconfig.h.in in python version 2.3a2 with version 2.2.1 and confirmed that HAVE_SETPGRP is indeed missing from pyconfig.h.in. Consequently, it never gets passed on to pyconfig.h during configuration, and posix_setpgrp never gets compiled in posixmodule.c because the macro is never defined. That was why I could not import the setpgrp function from the os module in my build of python 2.3a2, even though the rest of the os module was fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 17:01 Message: Logged In: YES user_id=21627 Can you please report precisely as to how it fails? 
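The symptom Gary describes (the rest of the os module works, but setpgrp is missing) can be checked from Python without calling the function. The sketch below is only a diagnostic aid; sysconfig.get_config_var may return None for HAVE_SETPGRP on platforms or builds that do not record it, so treat that value as a hint rather than proof.

```python
import os
import sysconfig

# True when posixmodule.c was compiled with the setpgrp wrapper included.
have_func = hasattr(os, "setpgrp")

# Value recorded at configure time, if this build exposes it; may be None.
have_macro = sysconfig.get_config_var("HAVE_SETPGRP")

report = "os.setpgrp available: %s (HAVE_SETPGRP=%r)" % (have_func, have_macro)
```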
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 From noreply@sourceforge.net Fri Mar 28 17:12:49 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 09:12:49 -0800 Subject: [Patches] [ python-Patches-711448 ] Warn about inter-module assignments shadowing builtins Message-ID: Patches item #711448, was opened at 2003-03-28 17:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711448&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Nobody/Anonymous (nobody) Summary: Warn about inter-module assignments shadowing builtins Initial Comment: The attached patch modifies module tp_setattro to warn about code that adds a name to the globals of another module that shadows a builtin. Unfortunately, there are other ways to modify module globals (e.g. using vars() and mutating the dictionary). There are a few issues with module objects that I'm not clear about. For example, do modules always have a md_dict that is a PyDictObject? 
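The check the patch performs in the module tp_setattro slot can be approximated in pure Python for illustration. This is a sketch under assumptions, not the attached C patch: the helper set_module_global is hypothetical, and it only covers assignments routed through it, which mirrors the limitation noted above for vars() and direct dictionary mutation.

```python
import builtins
import types
import warnings


def set_module_global(module, name, value):
    """Assign module.<name>, warning if a new name would shadow a builtin."""
    if hasattr(builtins, name) and name not in vars(module):
        warnings.warn(
            "assignment to %s.%s shadows a builtin" % (module.__name__, name),
            RuntimeWarning,
            stacklevel=2,
        )
    setattr(module, name, value)


# Demonstration on a throwaway module object.
demo = types.ModuleType("demo")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_module_global(demo, "len", lambda obj: 0)  # shadows builtins.len
    set_module_global(demo, "helper", 1)           # ordinary name, no warning
```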
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711448&group_id=5470 From noreply@sourceforge.net Fri Mar 28 17:15:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 09:15:11 -0800 Subject: [Patches] [ python-Patches-711448 ] Warn about inter-module assignments shadowing builtins Message-ID: Patches item #711448, was opened at 2003-03-28 17:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711448&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Nobody/Anonymous (nobody) Summary: Warn about inter-module assignments shadowing builtins Initial Comment: The attached patch modifies module tp_setattro to warn about code that adds a name to the globals of another module that shadows a builtin. Unfortunately, there are other ways to modify module globals (e.g. using vars() and mutating the dictionary). There are a few issues with module objects that I'm not clear about. For example, do modules always have a md_dict that is a PyDictObject? ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2003-03-28 17:15 Message: Logged In: YES user_id=35752 Attaching patch. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711448&group_id=5470 From noreply@sourceforge.net Fri Mar 28 18:41:25 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 10:41:25 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 17:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: Jason Tishler (jlt63) Date: 2003-03-28 09:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extensions modules... This patch works "fine" with my *very* limited testing. 
Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 15:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 18:52:10 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 10:52:10 -0800 Subject: [Patches] [ python-Patches-709743 ] os.setpgrp function failed to build Message-ID: Patches item #709743, was opened at 2003-03-26 00:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Gary H. Loechelt (loechelt) Assigned to: Martin v. Löwis (loewis) Summary: os.setpgrp function failed to build Initial Comment: The os.setpgrp function failed to build on HP-UX B.10.20 for Python 2.3a2. Comparing the build with Python 2.2.1, I noticed a missing line in the pyconfig.h.in file. I added the appropriate line to the file and rebuilt the executable. Note that I did NOT check the configure script to insure that the appropriate compiler macro (HAVE_SETPGRP) was set. I just manually set the macro in the pyconfig.h file directly. The person who has responsibility for configure should probably check it as well to make sure that it is not broken as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 19:52 Message: Logged In: YES user_id=21627 I see. 
Thanks for the report; this is now fixed in configure 1.385, configure.in 1.396, and pyconfig.h.in 1.75 (notice that configure.in is the only file to change here; both configure and pyconfig.h.in are generated). ---------------------------------------------------------------------- Comment By: Gary H. Loechelt (loechelt) Date: 2003-03-28 16:34 Message: Logged In: YES user_id=142817 The build failed because the HAVE_SETPGRP compiler macro was never set. Consequently, the code for the os.setpgrp function (posix_setpgrp) in the posixmodule.c file was never compiled. Even though the rest of posixmodule.c compiled, the os.setpgrp function was not available in the os module. Once I manually set the HAVE_SETPGRP compiler macro in the pyconfig.h header file and rebuilt posixmodule.c, everything worked and I was able to call the os.setpgrp function. I began to track down why the HAVE_SETPGRP compiler macro never got set during my configuration. Realizing that pyconfig.h is generated from pyconfig.h.in, I checked to see if HAVE_SETPGRP was even in pyconfig.h.in to start with. It was not. I compared pyconfig.h.in in python version 2.3a2 with version 2.2.1 and confirmed that HAVE_SETPGRP is indeed missing from pyconfig.h.in. Consequently, it never gets passed on to pyconfig.h during configuration, and posix_setpgrp never gets compiled in posixmodule.c because the macro is never defined. That was why I could not import the setpgrp function from the os module in my build of python 2.3a2, even though the rest of the os module was fine. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:01 Message: Logged In: YES user_id=21627 Can you please report precisely how it fails?
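The symptom described in this report (a function silently absent from the os module because its HAVE_* macro was never defined at build time) can be checked from Python itself. The following is only a minimal sketch: the helper name missing_posix_functions and the list of names passed to it are illustrative assumptions, not part of the patch or of the configure fix.

```python
import os


def missing_posix_functions(names):
    """Return the names from `names` that this build's os module lacks.

    A function guarded by an undefined HAVE_* macro is simply not
    compiled in, so it shows up here as a missing attribute.
    """
    return [name for name in names if not hasattr(os, name)]


# Hypothetical list of names to probe; setpgrp is the one from this report.
print(missing_posix_functions(["getpid", "setpgrp"]))
```

An empty list means every probed function was compiled into this build. A non-empty list does not say why a function is missing, only that it is.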
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709743&group_id=5470 From noreply@sourceforge.net Fri Mar 28 20:56:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 12:56:48 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 21:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 15:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. 
I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 13:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extensions modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 19:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 21:16:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 13:16:18 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 17:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. 
Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: Jason Tishler (jlt63) Date: 2003-03-28 12:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 11:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 09:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extensions modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. 
For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 15:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 22:15:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 14:15:34 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-25 03:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. 
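The library selection described in the initial comment can be modelled in a few lines. This is a deliberately simplified sketch of the behaviour as stated in the report (import libraries named lib<name>.dll.a, static libraries named lib<name>.a, and -static skipping the former). It is not the linker's real search algorithm, and pick_library is a hypothetical helper written only for illustration.

```python
def pick_library(name, available, static=False):
    """Pick a library file for `name` from the set `available`.

    Simplified model: without static linking the import library
    lib<name>.dll.a is preferred and lib<name>.a is the fallback;
    with static=True the .dll.a file is never considered.
    """
    if static:
        candidates = ["lib%s.a" % name]
    else:
        candidates = ["lib%s.dll.a" % name, "lib%s.a" % name]
    for filename in candidates:
        if filename in available:
            return filename
    return None


tree = {"libfoo.a", "libfoo.dll.a"}
print(pick_library("foo", tree))               # import library is chosen
print(pick_library("foo", tree, static=True))  # static library is chosen
```

In this model, a tree that ships only libfoo.dll.a yields no match at all when static=True, which is the situation the patch is meant to avoid by no longer forcing -static.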
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 23:15 Message: Logged In: YES user_id=21627 I'm in favour of applying this patch, and also of patches that mandate recent Cygwin releases; if such patches are implemented, the minimum required Cygwin version should be stated somewhere. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 22:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 21:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 19:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extensions modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:24:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:24:19 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 20:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) >Assigned to: Martin v. Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 21:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 21:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 19:55 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 19:31 Message: Logged In: YES user_id=6380 Looks cute to me. Fredrik, any problem if I just check this in? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:25:54 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:25:54 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 23:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Fredrik Lundh (effbot) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 00:25 Message: Logged In: YES user_id=21627 I'll conclude that it is a lot of tedious work for no reason, and close this patch. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-21 00:55 Message: Logged In: YES user_id=31435 Python's internal format buffers are too small to use C %f in its full generality, so you're suggesting something there that's much harder to get done than you suspect. Note that %f isn't a cureall anyway, as in either Python or C, e.g., '%f' % 1e-10 throws away all information, producing a string of zeroes. What you did is usually much better than that. Let's wait to hear what /F wants to do. If he's inclined to take this part of the spec at face value, I can work with him to write a "conforming" float->string that's numerically sound. Else it's a lot of tedious work for no reason. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-21 00:24 Message: Logged In: YES user_id=108973 OK, this floating point stuff is over my head. Is it OK that it loses accuracy? - No Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes Is it OK that it raises OverflowError for the normal double 1e-300? - No Would exposing and using the C %f specifier, along with repr, make for identical roundtrips? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 23:53 Message: Logged In: YES user_id=31435 I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? 
There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, and so leaves important questions unanswered. OTOH, it seems to me that the *point* of this protocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 21:48 Message: Logged In: YES user_id=108973 Oops, I already wrote the converter (see new patch). I'm not very concerned about sending 300-character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, could you please just check in your infinity/NaN detection code (also part of my patch)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 21:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out.
But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 20:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq-strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 20:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account. I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 19:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it.
I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 19:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 18:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 18:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 17:03 Message: Logged In: YES user_id=21627 You are right. 
An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 16:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. 
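The suggestions in the comments above (compute repr() once, reject any result containing 'n' or 'N', and validate the remainder with a regular expression) can be combined into one check. This is a sketch only and is not the code from the patch. The function name check_double is hypothetical, and the regular expression encodes an assumed, stricter reading of the quoted grammar (optional sign, at least one digit on each side of the period) rather than the literal spec text.

```python
import re

# Assumed reading of the XML-RPC double grammar: optional sign, digits,
# a period, digits. The spec wording quoted above is looser than this.
_DOUBLE_RE = re.compile(r"[+-]?\d+\.\d+")


def check_double(value):
    """Return repr(value) if it is plain decimal notation, else raise."""
    r = repr(float(value))  # computed once and reused below
    # 'inf', 'nan' and their platform variants all contain an n or N;
    # no ordinary float repr does.
    if "n" in r or "N" in r:
        raise ValueError("cannot marshal infinity or NaN: %s" % r)
    # Exponent notation such as 1e+20 fails the decimal-point grammar.
    if not _DOUBLE_RE.fullmatch(r):
        raise ValueError("not in decimal-point notation: %s" % r)
    return r


print(check_double(1.5))
```

Because the string comes from repr(), converting it back with float() reproduces the original value. The cost is that doubles whose repr() uses an exponent are rejected instead of being expanded, which is the trade-off argued over in this thread.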
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:27:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:27:04 -0800 Subject: [Patches] [ python-Patches-545300 ] sgmllib support for additional tag forms Message-ID: Patches item #545300, was opened at 2002-04-17 20:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=545300&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Steven F. Lott (slott56) >Assigned to: Martin v. Löwis (loewis) Summary: sgmllib support for additional tag forms Initial Comment: MS-word generated HTML includes declaration tags of the form:   scattered throughout the body of an HTML document. The current sgmllib parse_declaration routine rejects these as invalid syntax, where browsers tolerate these embedded declarations. This patch accepts these declaration forms. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-22 10:23 Message: Logged In: YES user_id=21627 I now recommend to approve this patch. It improves SGML correctness, and, while supporting an MS extension, explicitly points out that it is doing so. ---------------------------------------------------------------------- Comment By: Steven F. Lott (slott56) Date: 2002-04-22 20:50 Message: Logged In: YES user_id=328067 My suggestion for handling this MS extension syntax is to (1) tolerate the extension without an error, (2) treat it as an SGML marked section, using the unknown_decl() call-back. Since this is a separate function, subclasses can override to alter this behavior. The content hidden in these MS-specific marked section appears to always be a  . 
While it might be expedient to completely skip over this junk, doing so would make it difficult to handle marked sections in a future version of markupbase. Attached is a revised patch against V1.39 of sgmllib.py and 1.4 of markupbase.py. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-04-21 17:11 Message: Logged In: YES user_id=3066 This is the same as bug #505747. These "tags" are not legal HTML in any form, but are some Microsoft invention. It's not entirely clear what the right thing to do is, but it is clear that we need to deal with these in some different way. Changed group to indicate that such changes can only go into the trunk; feature changes in maintenance versions are not allowed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-04-18 19:23 Message: Logged In: YES user_id=21627 That patch looks wrong: You are changing what a tag is, removing the underscore; however, underscores are allowed in tag names. Also, could you please generate the patch against the CVS version of the code? Your patch doesn't apply to the current code, which has changed significantly compared to the version you appear to be using. There is no way that this can go into 2.1 IMO.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=545300&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:31:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:31:11 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 21:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have .a suffix. If -static is passed, it will skip the .dll.a libraries. This is pain if one has a tree with both static and dynamic libraries using this naming convention, and wish to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 18:31 Message: Logged In: YES user_id=87160 I can help with testing. I have access to W2K and Win98 (ugh) boxen. 
I don't mind installing a few older toolchains if you think that's necessary. I think any C/C++ python extension using plain distutils (no fancy hacks added on) and has one or more DLL dependencies is a good test case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 17:15 Message: Logged In: YES user_id=21627 I'm in favour of applying this patch, and also of patches that mandate recent Cygwin releases; if such patches are implemented, the minimum required Cygwin version should be stated somewhere. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 16:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 15:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 13:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extensions modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-27 19:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:31:57 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:31:57 -0800 Subject: [Patches] [ python-Patches-554807 ] Add _winreg support for Cygwin Message-ID: Patches item #554807, was opened at 2002-05-11 14:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470 Category: Windows Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Mark Hammond (mhammond) Summary: Add _winreg support for Cygwin Initial Comment: This adds _winreg support to Cygwin Python without dependencies on other Windows modules. For platforms in which MS_WINDOWS isn't defined, this reports the OSError exception instead of WindowsErr. It also uses the non-MBCS versions of registry access in this case. Some minor changes to _winreg.c were made to clean up compiler warnings from GCC. setup.py was changed to create a dynamic _winreg module under cygwin. There are also some earlier changes in the patch file to skip the import test (due to Cygwin fork issues), and to require libintl when building _locale under Cygwin. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 00:31 Message: Logged In: YES user_id=21627 I'm rejecting that patch, since no updates are happening. If somebody wants to deal with _winreg support for Cygwin again, please submit a new patch. 
---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-09-18 04:57 Message: Logged In: YES user_id=14198 I'll take this on. I have a number of other patches and bugs to look at, so if someone wants to beat me to it, be my guest. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-16 19:57 Message: Logged In: YES user_id=21627 If a convincing patch comes along, I'd happily apply it. Supporting _winreg is still reasonable even if /proc/registry exists, for compatibility with other Win32 ports. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2002-09-16 19:19 Message: Logged In: NO I'm prepared to try to help if there's still energy here, and there are specific things to do. However I agree that _if_ the cygwin /proc/registry story is going to become writeable, then there's not much point. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-07-30 16:04 Message: Logged In: YES user_id=329402 I plan to get back to this eventually, although held off for three reasons: - Cygwin is incorporating a registry file system that may be a better way to implement this - saw some posts about possible Unicode changes - Real Life (job priorities, vacation) I probably won't get back to this until the middle of August. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 12:02 Message: Logged In: YES user_id=21627 Is any kind of tweaking forthcoming? ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-05-15 15:30 Message: Logged In: YES user_id=329402 It sounds like the patches need some tweaking (my testing had passed but was certainly limited). 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-15 14:57 Message: Logged In: YES user_id=21627 Yes, but you are wrong assuming that the *A functions expect Latin-1. Instead, they expect char* encoded as CP_ACP, which is known as "mbcs" in Python. The *W functions do *not* expect multi-byte strings, but Unicode strings. Notice that _winreg also calls the *A functions, even in MSVC builds. So I think converting Unicode to Latin-1 is definitely incorrect. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-05-15 14:48 Message: Logged In: YES user_id=329402 Windows supplies two versions of the relevant functions. The Cygwin version (at least as built) uses the ANSI versions, as indicated by the A at the end of the symbol names: $ nm _winreg.o | grep RegQueryValue U _RegQueryValueA@16 U _RegQueryValueExA@24 As opposed to the "Windows Unicode/wide-char" functions, which end in W and require MBCS functions to decode. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-15 00:23 Message: Logged In: YES user_id=21627 Can you please explain why not using MBCS is the right thing? 
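[Editor's illustration] The point about the *A functions can be sketched in modern Python 3 (not the 2.2/2.3 of this thread). Python's "mbcs" codec exists only on Windows, so this sketch assumes cp1252 as a stand-in for CP_ACP on a Western-European system; the sample string is hypothetical.

```python
# Assumption: cp1252 stands in for the Windows ANSI code page (CP_ACP).
text = "price: 5 \u20ac"  # sample text containing the euro sign

# The ANSI code page can represent the euro sign as the single byte 0x80.
ansi_bytes = text.encode("cp1252")

# Latin-1 has no euro sign at all, so converting Unicode to Latin-1 fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Misreading ANSI bytes as Latin-1 silently yields a C1 control character.
misdecoded = ansi_bytes.decode("latin-1")
```

The two encodings agree on most bytes but disagree in the 0x80-0x9F range, which is why treating ANSI data as Latin-1 appears to work in limited testing.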
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470 From noreply@sourceforge.net Fri Mar 28 23:34:26 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 15:34:26 -0800 Subject: [Patches] [ python-Patches-590682 ] New codecs: html, asciihtml Message-ID: Patches item #590682, was opened at 2002-08-04 06:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590682&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Rejected Priority: 3 Submitted By: Oren Tirosh (orenti) Assigned to: M.-A. Lemburg (lemburg) Summary: New codecs: html, asciihtml Initial Comment: These codecs translate HTML character &entity; references. The html codec may be applied after other codecs such as utf-8 or iso8859_X and preserves their encoding. The asciihtml encoder produces 7-bit ascii and its output is therefore safe for insertion into almost any document regardless of its encoding. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 00:34 Message: Logged In: YES user_id=21627 Apparently, this patch is not needed anymore, so I'm rejecting it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-12 11:11 Message: Logged In: YES user_id=21627 Oren, is this patch still needed, as we now have the xmlcharrefreplace error handler? ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-08-09 17:38 Message: Logged In: YES user_id=562624 Case insensitivity fixed. General cleanup. Codecs renamed to htmlescape and htmlescape8bit. Improved error handling. Update unicode_test. 
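[Editor's illustration] A minimal sketch of the xmlcharrefreplace error handler mentioned above, written for modern Python 3 rather than the 2.3 of this thread. The sample string is hypothetical.

```python
# Characters outside ASCII are replaced by decimal character references.
text = "caf\u00e9 \u2192 5 < 6"
encoded = text.encode("ascii", errors="xmlcharrefreplace")

# "<" is valid ASCII, so the error handler never sees it and leaves it as is.
print(encoded)
```

Because the handler only fires on unencodable characters, markup characters such as <, > and & pass through untouched and would need separate escaping.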
---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-08-05 14:11 Message: Logged In: YES user_id=562624 Yes, entities are supposed to be case sensitive but I'm working with manually-generated html in which > is not so uncommon... I guess life is different in XML world. Case-smashing loses the distinction between some entities. I guess I need a more intelligent solution. > If you apply it to an 8-bit UTF-8 encoded strings you'll get garbage! Actually, it works great. The html codec passes characters 128-255 unmodified and therefore can be chained with other codecs. But I now have a more elegant and high-performance approach than codec chaining. See my python-dev posting. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-08-05 09:59 Message: Logged In: YES user_id=38388 On the htmlentitydefs: yes, these are in use as they are defined now. If you want a mapping from and to Unicode, I'd suggest to provide this as a new table. About the cased key in the entitydefs dict: AFAIK, these have to be cased since entities are case-sensitive. Could be wrong though. On PEP 293: this is going in the final round now. 
Your patch doesn't compete with it though, since PEP 293 is a much more general approach. On the general idea: I think the codecs are misnamed. They should be called htmlescape and asciihtmlescape since they don't provide "real" HTML encoding/decoding as Martin already mentioned. There's something wrong with your approach, BTW: the codec should only operate on Unicode (taking only Unicode input and generating Unicode). If you apply it to an 8-bit UTF-8 encoded strings you'll get garbage ! ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 17:54 Message: Logged In: YES user_id=21627 I'm in favour of exposing this via a search functions, for generated codec names, on top of PEP 293 (I would not like your codec to compete with the alternative mechanism). My dislike for the current patch also comes from the fact that it singles-out ASCII, which the search function would not. You could implement two forms: html.codecname and xml.codecname. The html form would do HTML entity references in both directions, and fall back to character references only if necessary; the XML form would use character references all the time, and entity references only for the builtin entities. And yes, I do recommend users to use codecs.charmap_encode directly, as this is probably the most efficient, yet most compact way to convert Unicode to a less-than-7-bit form. In anycase, I'd encourage you to contribute to the progress of PEP 293 first - this has been an issue for several years now, and I would be sorry if it would fail. While you are waiting for PEP 293 to complete, please do consider cleaning up htmlentitydefs to provide mappings from and to Unicode characters. 
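[Editor's illustration] The "html form" described above (entity references where a name exists, character references otherwise) can be sketched with the Unicode mappings that modern Python 3 ships in html.entities. The function name html_escape_ascii is hypothetical and this is not the codec proposed in the patch.

```python
from html import unescape
from html.entities import codepoint2name


def html_escape_ascii(text):
    """Return 7-bit text: named entities where known, numeric refs otherwise."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in "<>&":
            out.append(ch)                           # plain ASCII passes through
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])  # e.g. &eacute;
        else:
            out.append("&#%d;" % cp)                 # fall back to a character ref
    return "".join(out)


escaped = html_escape_ascii("a<b \u00e9 \u2603")
restored = unescape(escaped)
```

The decoding direction uses html.unescape here; as noted in the thread, that is only safe on text that has already been separated from the markup.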
---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-08-04 17:07 Message: Logged In: YES user_id=562624 >People may be tricked into believing that they can >decode arbitrary HTML with your codec - when your >codec would incorrectly deal with CDATA sections. You don't even need to go as far as CDATA to see that tags must be parsed first and only then tag bodies and attribute values can be individually decoded. If you do it in the reverse order the tag parser will try to parse < as a tag. It should be documented, though. For encoding it's also obvious that encoding must be done first and then the encoded strings can be inserted into tags - < in strings is encoded into &lt; preventing it from being interpreted as a tag. This is a good thing! It prevents insertion attacks. > You can easily enough arrange to get errors on <, >, > and &, by using codecs.charmap_encode with an > appropriate encoding map. If you mean to use this as some internal implementation detail it's ok. Are you actually proposing that this is the way end users should use it? How about this: Install an encoder registry function that responds to any codec name matching "xmlcharref.SPAM" and does all the internal magic you describe to create a codec instance that combines xmlcharref translation including <,>,& and the SPAM encoding. This dynamically-generated codec will do both encoding and decoding and be cached, of course. "Namespaces are one honking great idea -- let's do more of those!" ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-08-04 13:50 Message: Logged In: YES user_id=21627 You can easily enough arrange to get errors on <, >, and &, by using codecs.charmap_encode with an appropriate encoding map. In fact, with that, you can easily get all entity references into the encoded data, without any need for an explicit iteration. 
However, I am concerned that you offer decoding as well. People may be tricked into believing that they can decode arbitrary HTML with your codec - when your codec would incorrectly deal with CDATA sections. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-08-04 13:10 Message: Logged In: YES user_id=562624 PEP 293 and patch #432401 are not a replacement for these codecs - it does decoding as well as encoding and also translates <, >, and & which are valid in all encodings and therefore won't get translated by error callbacks. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-08-04 13:00 Message: Logged In: YES user_id=562624 Yes, the error callback approach handles strange mixes better than my method of chaining codecs. But it only does encoding - this patch also provides full decoding of named, decimal and hexadecimal character entity references. Assuming PEP 293 is accepted, I'd like to see the asciihtml codec stay for its decoding ability and renamed to xmlcharref. The encoding part of this codec can just call .encode("ascii", errors="xmlcharrefreplace") to make it a full two-way codec. I'd prefer htmlentitydefs.py to use unicode, too. It's not so useful the way it is. Another problem is that it uses mixed case names as keys. The dictionary lookup is likely to miss incoming entities with arbitrary case so it's more-or-less broken. Does anyone actually use it the way it is? Can it be changed to use unicode without breaking anyone's code? ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-08-04 10:54 Message: Logged In: YES user_id=21627 This patch is superseded by PEP 293 and patch #432401, which allows you to write unitext.encode("ascii", errors = "xmlcharrefreplace") This probably should be left open until PEP 293 is pronounced upon, and then either rejected or reviewed in detail. I'd encourage a patch that uses Unicode in htmlentitydefs directly, and computes entitydefs from that, instead of vice-versa (or at least exposes a unicode_entitydefs, perhaps even lazily) - perhaps also with a reverse mapping. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=590682&group_id=5470 From noreply@sourceforge.net Sat Mar 29 06:47:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Fri, 28 Mar 2003 22:47:47 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBALS, this patch takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function:

    def f(x):
        return oct(x)

    print f(20)
    __builtins__.oct = hex
    print f(20)   # doesn't notice new def of oct()

This gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins.
The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sat Mar 29 10:04:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 02:04:11 -0800 Subject: [Patches] [ python-Patches-707701 ] fix for #698517, Tkinter and tk8.4.2 Message-ID: Patches item #707701, was opened at 2003-03-21 20:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 Category: Tkinter Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 7 Submitted By: Matthias Klose (doko) Assigned to: Martin v. Löwis (loewis) Summary: fix for #698517, Tkinter and tk8.4.2 Initial Comment: [all python version, that can be built with tk8.4.2] Fixing the failing conversions in _substitute. Use try/except for each integer field, that is not supported by all events. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 11:04 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as Tkinter.py 1.170 and 1.160.10.3. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 00:58 Message: Logged In: YES user_id=21627 Well, no. The Tk change was made for a reason, and it is unlikely that Tk people will back it out, so we should not bypass this change. If you want to get up and running, I recommend to use Tcl 8.3. 
---------------------------------------------------------------------- Comment By: Jeremy Moore (jmoore_calaway) Date: 2003-03-28 00:31 Message: Logged In: YES user_id=744000 (Apologies if this is the inappropriate place to ask) I'm porting an app to Mac OS X 10.2 (begrudgingly) and ran straight into this bug. Nothing like changing versions of python (2.2.2 to 2.3a2) and tcl/tk (8.3.4 to 8.4.2) while using a platform you're unfamiliar with! Anyway, I have successfully applied the patch; however, it has simply propagated the problem elsewhere. Specifically, the pmw rev 1.1 widgets library. The problem is, pmw does additional processing that chokes on the '??' now returned by the try/except statements. Perhaps, if anyone knows, it would be better to mimic what tcl/tk 8.3.x returned with the except statements. Pmw may not be the only library out there that will get choked up on this. I will submit a bug in the pmw for this as well, but I'm looking for a least resistance path to get things up and running. (And not really wanting to rewrite all my GUI construction code...) Thanks Jeremy Moore ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-23 14:35 Message: Logged In: YES user_id=60903 > What is the problem that this patch solves? As the subject says: Provide a patch for #698517. For undefined event fields, tk8.4.2 returns empty strings or '??' strings, on which the int conversions fail. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 13:07 Message: Logged In: YES user_id=21627 What is the problem that this patch solves? 
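[Editor's illustration] The failure described above (int conversion on empty or '??' event fields) can be reproduced and worked around in a few lines of modern Python 3. The helper name getint_event is hypothetical and is not the code that was applied to Tkinter.py.

```python
def getint_event(value):
    """Return int(value), or the raw value when the field is undefined."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # Tk 8.4.2 may hand back "" or "??" for fields an event does not define.
        return value


fields = ["42", "??", ""]
converted = [getint_event(v) for v in fields]
```

Returning the raw string keeps the caller running, but, as the Pmw report in this thread shows, downstream code that assumes integers can still choke on it.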
---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-22 08:26 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 23:15 Message: Logged In: YES user_id=60903 Attach alternate patch by Chad ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 23:10 Message: Logged In: YES user_id=40145 Hmmm, you are right. Your approach will be quicker, due to local namespace function lookup speed (try/except is fast in non-exception path). But, then again, a lot more exception paths will be executed with the new Tk (with "??" fields), anyway, so the speed issues may not be that important. ---------------------------------------------------------------------- Comment By: Matthias Klose (doko) Date: 2003-03-21 22:14 Message: Logged In: YES user_id=60903 I thought the whole thing to define getint = int was to do local lookups only. Therefore the inlined try/excepts ---------------------------------------------------------------------- Comment By: Chad Netzer (chadn) Date: 2003-03-21 21:59 Message: Logged In: YES user_id=40145 Would it be better to simply define getint() as:

    def getint(s):
        try:
            return int(s)
        except ValueError:
            return s

Rather than add lots of try/excepts in the codebase?
I'm attaching an example diff (btw - I kept your field explanations in the code; I liked them there) These patches are important, BTW, since 8.4.1 has a few bugs that would require other patches to Tkinter (returning "" for getboolean for example, which seems to be fixed) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707701&group_id=5470 From noreply@sourceforge.net Sat Mar 29 11:37:22 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 03:37:22 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBALS, this patch takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function:

    def f(x):
        return oct(x)

    print f(20)
    __builtins__.oct = hex
    print f(20)   # doesn't notice new def of oct()

This gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins. The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. 
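[Editor's illustration] The semantics that a builtins cache would change can be observed directly. This sketch uses modern Python 3 (the builtins module and the 0o prefix for octal), not the Python 2.3 of the thread, and restores the original builtin afterwards.

```python
import builtins


def f(x):
    # "oct" is not a local or a global, so it is looked up in builtins
    # each time the function runs.
    return oct(x)


before = f(20)

original_oct = builtins.oct
builtins.oct = hex           # poke a new value into the builtins module
try:
    after = f(20)            # an uncached lookup notices the replacement
finally:
    builtins.oct = original_oct

restored = f(20)
```

Without a cache the second call returns the hex form; with the proposed cache it would keep returning the octal form, which is the behaviour change at issue.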
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. It changes semantics in an ad-hoc way. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sat Mar 29 14:16:37 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 06:16:37 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: Guido van Rossum (gvanrossum) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:16 Message: Logged In: YES user_id=21627 This patch breaks binary compatibility, as it changes the layout of PyInterpreterState. 
We could reduce the risk of breakage by moving the new members at the end of the struct. Assigning to Guido for pronouncement: Should this a) be rejected? b) be accepted as is? (arguing that nobody uses the interpreter state, anyway) c) accepted with the proposed change (i.e. sizeof(PyInterpreterState) still changes, but the offset of the existing members doesn't). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 09:40 Message: Logged In: YES user_id=38388 Looks ok. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:00 Message: Logged In: YES user_id=21627 Marc-Andre, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2003-03-27 09:25 Message: Logged In: YES user_id=537938 Here is the patch. It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Sat Mar 29 14:18:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 06:18:48 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Walter Dörwald (doerwalter) >Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:18 Message: Logged In: YES user_id=21627 Looks fine, please apply it. Also add a test case that fails now but passes with the change, and add a NEWS entry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 From noreply@sourceforge.net Sat Mar 29 14:40:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 06:40:56 -0800 Subject: [Patches] [ python-Patches-612627 ] Allow more Unicode on sys.stdout Message-ID: Patches item #612627, was opened at 2002-09-21 22:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: Allow more Unicode on sys.stdout Initial Comment: This patch extends the set of Unicode strings that can be printed to sys.stdout, to support all strings that the terminal will likely support. It also adds an encoding attribute to sys.std{in,out}. To do that: - it adds a .encoding attribute to all file objects, which is normally None - initializes the encoding of sys.stdin and sys.stdout if either is a terminal. - adds a wrapper object around sys.stdout in site.py that encodes all Unicode objects according to the detected encoding, if that encoding is known to Python To find the encoding of the terminal, it - uses GetConsoleCP and GetConsoleOutputCP on Windows, - uses nl_langinfo(CODESET) on Unix, if available. The primary rationale for this change is that people should be able to print Unicode in an interactive session. 
A parallel change needs to be added for IDLE, so that it adds the .encoding attribute to the emulated stdout (it already supports printing of Unicode on stdout). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:40 Message: Logged In: YES user_id=21627 In stdout3.txt, PyFile_SetEncoding has been added, wrapping the creation and assignment of the string object f_encoding. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 09:44 Message: Logged In: YES user_id=38388 Looks ok except for the direct hacking of f_encoding in the sys module. Please add either a macro or a new API to make changing the encoding from C possible without tapping directly into the implementation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-23 12:59 Message: Logged In: YES user_id=21627 Is the patch now acceptable? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-10-26 19:47 Message: Logged In: YES user_id=21627 I've attached a revised version which implements your proposal; this version works without modification of site.py. In its current form, the file encoding is only applied in print; for sys.stdout.write, it is ignored. For print, it is applied independent of whether this is a script or interactive mode. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-10-25 14:09 Message: Logged In: YES user_id=38388 I think it could work by adding a special case to PyFile_WriteObject() instead of calling PyObject_Print(). You first encode the Unicode object and then let PyFile_WriteString() take care of the writing to the FILE* object. I see no other way, since you can't place the .encoding information into the FILE* object. 
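[Editor's illustration] The idea of encoding Unicode according to a stream's .encoding attribute, falling back when it is missing or None, can be sketched in modern Python 3. The helper name encode_for_stream and the "ascii" fallback are assumptions for illustration and do not reproduce the C implementation in the patch.

```python
import io


def encode_for_stream(text, stream, fallback="ascii"):
    """Encode text with the stream's declared encoding, if it has one."""
    encoding = getattr(stream, "encoding", None) or fallback
    # "replace" avoids an exception for characters the target cannot show.
    return text.encode(encoding, errors="replace")


latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
plain_stream = io.BytesIO()   # has no .encoding attribute at all

as_latin1 = encode_for_stream("n\u00e9", latin1_stream)
as_fallback = encode_for_stream("n\u00e9", plain_stream)
```

A stream that declares an encoding gets the character intact, while one without it degrades to a replacement character instead of raising.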
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-09-24 11:02 Message: Logged In: YES user_id=21627 I have considered implementing it in the file object. However, it becomes quite involved, and heavy C code: PyFile_WriteObject calls PyObject_Print. Since Unicode does not implement a tp_print, this calls str/repr, which converts using the default encoding. It is not clear at which point the file encoding should be taking into account. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2002-09-24 10:10 Message: Logged In: NO I like the .encoding concept. I don't really like the sys.stdout wrapper. Wouldn't it be better to add the functionality to the file object .write() and .writelines() methods and then only use the wrapper in case sys.stdout is not a true file object ? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=612627&group_id=5470 From noreply@sourceforge.net Sat Mar 29 15:00:07 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 07:00:07 -0800 Subject: [Patches] [ python-Patches-710931 ] iconv codec-NG and Korean Codecs Message-ID: Patches item #710931, was opened at 2003-03-27 20:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Martin v. Löwis (loewis) Summary: iconv codec-NG and Korean Codecs Initial Comment: This patch includes update for iconv_codec, new sources for korean codecs and MultibyteCodec supplemental library. I splitted out common parts of codecs for usual multibyte encodings into multibytecodec.c and this iconv codec and the korean codecs are using it. 
The korean codecs is only 58K in stripped i386 ELF and 62K in stripped i386 PECOFF binary and I think it's small enough to be incorporated into python. Files: Lib/encodings/aliases.py adds aliases for korean encodings and remove comments that isn't true now. Lib/encodings/cp949.py Lib/encodings/euc_kr.py codecs for korean encodings Lib/encodings/iconv_codec.py updated for new _iconv_codec implementation Lib/test/test_ko_codecs.py unit test for cp949, euc_kr codec Lib/test/test_ko_codecs_mapping.py unit test to test cp949 mapping Lib/test/test_iconv_codec_euc_kr.py another iconv_codec test unit. because non-unicode multibyte encoding is required to test both of iconv_codec and multibytecodec. Lib/test/test_multibytecodec_support.py common part for above test units Modules/_iconv_codec.c new implementation of _iconv_codec. this resolves numerous problems that previous implementation had. and iconv_codec has sane StreamReader now! :) Modules/_ko_codec.c Modules/_ko_codec.h korean codecs module Modules/multibytecodec.c Modules/multibytecodec.h common multibyte codec supplement. I think that this can be used for any usual multibyte encodings. I'll submit Chinese Codecs in few days using this. Tools/unicode/genmap_ko_codecs.py code generator for _ko_codecs.h ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 16:00 Message: Logged In: YES user_id=21627 Please submit an individual patch for each single bug fix or new feature; it appears that this patch deals with completely unrelated things. Therefore. I'm rejecting this patch, encouraging you to submit new separate patches. I have a few specific comments you may want to consider: - What is the rationale for adding an alias processing to the iconv codecs? - It is unclear how you expect reuse of the multibytecodec.c. Currently, this is incorporated into _ko_codecs. How would this cooperate with other usages of multibytecodecs? 
In particular, why is that needed in iconv_codec? - "complete reimplementation" is insufficient reason to accept a change. What specific problems does the old iconv codec have, and how specifically have they been corrected? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 From noreply@sourceforge.net Sat Mar 29 15:02:01 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 07:02:01 -0800 Subject: [Patches] [ python-Patches-710931 ] iconv codec-NG and Korean Codecs Message-ID: Patches item #710931, was opened at 2003-03-27 20:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Martin v. Löwis (loewis) Summary: iconv codec-NG and Korean Codecs Initial Comment: This patch includes update for iconv_codec, new sources for korean codecs and MultibyteCodec supplemental library. I splitted out common parts of codecs for usual multibyte encodings into multibytecodec.c and this iconv codec and the korean codecs are using it. The korean codecs is only 58K in stripped i386 ELF and 62K in stripped i386 PECOFF binary and I think it's small enough to be incorporated into python. Files: Lib/encodings/aliases.py adds aliases for korean encodings and remove comments that isn't true now. Lib/encodings/cp949.py Lib/encodings/euc_kr.py codecs for korean encodings Lib/encodings/iconv_codec.py updated for new _iconv_codec implementation Lib/test/test_ko_codecs.py unit test for cp949, euc_kr codec Lib/test/test_ko_codecs_mapping.py unit test to test cp949 mapping Lib/test/test_iconv_codec_euc_kr.py another iconv_codec test unit. because non-unicode multibyte encoding is required to test both of iconv_codec and multibytecodec. 
Lib/test/test_multibytecodec_support.py common part for above test units Modules/_iconv_codec.c new implementation of _iconv_codec. this resolves numerous problems that previous implementation had. and iconv_codec has sane StreamReader now! :) Modules/_ko_codec.c Modules/_ko_codec.h korean codecs module Modules/multibytecodec.c Modules/multibytecodec.h common multibyte codec supplement. I think that this can be used for any usual multibyte encodings. I'll submit Chinese Codecs in few days using this. Tools/unicode/genmap_ko_codecs.py code generator for _ko_codecs.h ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 16:00 Message: Logged In: YES user_id=21627 Please submit an individual patch for each single bug fix or new feature; it appears that this patch deals with completely unrelated things. Therefore. I'm rejecting this patch, encouraging you to submit new separate patches. I have a few specific comments you may want to consider: - What is the rationale for adding an alias processing to the iconv codecs? - It is unclear how you expect reuse of the multibytecodec.c. Currently, this is incorporated into _ko_codecs. How would this cooperate with other usages of multibytecodecs? In particular, why is that needed in iconv_codec? - "complete reimplementation" is insufficient reason to accept a change. What specific problems does the old iconv codec have, and how specifically have they been corrected? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710931&group_id=5470 From noreply@sourceforge.net Sat Mar 29 16:12:45 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 08:12:45 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Sat Mar 29 16:25:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 08:25:48 -0800 Subject: [Patches] [ python-Patches-711838 ] urllib2 doesn't support non-anonymous ftp Message-ID: Patches item #711838, was opened at 2003-03-29 11:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2 doesn't support non-anonymous ftp Initial Comment: urllib2 doesn't support non-anonymous ftp. Added support based on how urllib did it. 
More details about this bug in Red Hat's bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=78168 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=80676 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 From noreply@sourceforge.net Sat Mar 29 16:32:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 08:32:53 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBALS, takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function: def f(x): return oct(x) print f(20) __builtins__.oct = hex print f(20) # doesn't notice new def of oct() The gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins. The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. 
It changes semantics in an ad-hoc way. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sat Mar 29 17:11:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 09:11:14 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBALS, takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function: def f(x): return oct(x) print f(20) __builtins__.oct = hex print f(20) # doesn't notice new def of oct() The gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins. The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 12:11 Message: Logged In: YES user_id=80475 Arghh, I don't see what the problem is. The co_names cache variable is private and not part of the public interface for code objects. 
The only way to see a change in behavior is for a program to violate the prohibition of sticking a name in another module's globals that affects a builtin (and, even then, it would have to occur between calls to the function). Normal shadowing (using globals) would continue to work just fine. While it gives only a minor timing gain, the big win would be removing the incentive to create Python code like this: def f(x, y, int=int, True=True, chr=chr): . . . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. It changes semantics in an ad-hoc way. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sat Mar 29 17:27:12 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 09:27:12 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sat Mar 29 17:28:11 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 09:28:11 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) >Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sat Mar 29 17:37:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 09:37:55 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 18:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) >Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed. ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-29 18:37 Message: Logged In: YES user_id=21627 Where do you get this guarantee from? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sat Mar 29 18:05:53 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 10:05:53 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Neal Norwitz (nnorwitz) >Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 13:05 Message: Logged In: YES user_id=80475 Python 2.3a2 (#39, Feb 19 2003, 17:58:58) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. IDLE 0.8 -- press F1 for help >>> None = 1 SyntaxError: assignment to None (, line 1) In addition, the compiler already makes this assumption elsewhere. Every function ends with: 2 0 LOAD_CONST 0 (None) 3 RETURN_VALUE ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 12:37 Message: Logged In: YES user_id=21627 Where do you get this guarantee from? 
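One way to probe that question on a current interpreter is sketched below. It assumes Python 3, where None is a keyword; the 2.3 alphas discussed in this thread behaved differently, so treat it as an illustration only and not as evidence about 2.3. The names assignment_to_none_is_rejected and is_none are made up for the example.

```python
def assignment_to_none_is_rejected():
    """Return True if the compiler refuses the statement 'None = 1'."""
    try:
        compile("None = 1", "<example>", "exec")
    except SyntaxError:
        return True
    return False


def is_none(x):
    # Hypothetical helper: its body refers to None exactly once.
    return x is None


# If None were fetched by a global name lookup, "None" would be listed
# in co_names. In Python 3 it is compiled as a constant instead.
none_is_global_lookup = "None" in is_none.__code__.co_names
```

If the compiler rejects the assignment and never looks None up by name, nothing at run time can rebind it, which is the kind of guarantee the patch depends on.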
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sat Mar 29 19:01:00 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 11:01:00 -0800 Subject: [Patches] [ python-Patches-711902 ] Cause pydoc to show data descriptor __doc__ strings Message-ID: Patches item #711902, was opened at 2003-03-29 10:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711902&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Cause pydoc to show data descriptor __doc__ strings Initial Comment: Data descriptors (descriptors having both a __get__ and a __set__ method) often have __doc__ strings. Pydoc displays these for descriptors of type property, but not for other types (e.g., getsets). The attached patch will display __doc__ strings for data descriptors (if available) in the "Data and non-method functions" section of the type description. This patch is intended to be a minimal change. It's possible that inspect.classify_class_attrs should return a new kind for data descriptors (or possibly the "property" kind should include all data descriptors (not just properties)), which could then be handled differently from other non-classified data. 
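To make the distinction concrete, here is a minimal sketch of a non-property data descriptor that carries its own __doc__. It is written for current Python 3 and is not taken from the attached patch; Celsius, Thermometer and data_descriptor_docs are hypothetical names used only for illustration.

```python
import inspect


class Celsius:
    """Hypothetical data descriptor: defines both __get__ and __set__."""

    def __init__(self, doc):
        self.__doc__ = doc  # per-attribute docstring, as a property would carry

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return obj._celsius

    def __set__(self, obj, value):
        obj._celsius = float(value)


class Thermometer:
    temperature = Celsius("Temperature in degrees Celsius.")


def data_descriptor_docs(cls):
    """Map attribute name -> __doc__ for data descriptors defined on cls."""
    return {
        name: attr.__doc__
        for name, attr in vars(cls).items()
        if inspect.isdatadescriptor(attr) and attr.__doc__
    }
```

Calling data_descriptor_docs(Thermometer) picks up the docstring of temperature even though it is not a property, which is the information the report says pydoc was leaving out. The result may also include implementation-provided descriptors such as __dict__.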
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711902&group_id=5470 From noreply@sourceforge.net Sat Mar 29 21:19:50 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 13:19:50 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open >Resolution: Later Priority: 5 Submitted By: Raymond Hettinger (rhettinger) >Assigned to: Raymond Hettinger (rhettinger) >Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 16:19 Message: Logged In: YES user_id=80475 Hmm, in Py2.3a2+, it only gives a warning. Putting this one on hold until I can find out why it was safe for the compiler to return a None constant at the end of a function. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 13:05 Message: Logged In: YES user_id=80475 Python 2.3a2 (#39, Feb 19 2003, 17:58:58) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. IDLE 0.8 -- press F1 for help >>> None = 1 SyntaxError: assignment to None (, line 1) In addition, the compiler already makes this assumption elsewhere. Every function ends with: 2 0 LOAD_CONST 0 (None) 3 RETURN_VALUE ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-29 12:37 Message: Logged In: YES user_id=21627 Where do you get this guarantee from? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sat Mar 29 21:28:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 13:28:48 -0800 Subject: [Patches] [ python-Patches-708374 ] add offset to mmap Message-ID: Patches item #708374, was opened at 2003-03-23 09:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add offset to mmap Initial Comment: This patch is from Yotam Medini sent to me in mail. It adds support for the offset parameter to mmap. It ignores the check for mmap size "if the file is character device. Some device drivers (which I happen to use) have zero size in fstat buffer, but still one can seek() read() and tell()." I added minimal doc and tests. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-29 16:28 Message: Logged In: YES user_id=33168 Sounds fair. Attached is an updated patch which includes windows support (I think). I cannot test on Windows. Tested on Linux. Includes updates for doc, src, and test. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 19:12 Message: Logged In: YES user_id=21627 I think non-zero offsets need to be supported for Windows as well. 
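For readers unfamiliar with the feature under discussion, the sketch below shows what mapping a file at a non-zero offset looks like with the mmap module in current Python 3. It is not the patch from this item; read_at_offset and demo are hypothetical helpers, and the temporary file exists only for the demonstration.

```python
import mmap
import os
import tempfile


def read_at_offset(path, offset, length):
    """Map `length` bytes of `path` starting at `offset` and return them."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                       offset=offset) as m:
            return m[:length]


def demo():
    # The offset has to be a multiple of the allocation granularity.
    gran = mmap.ALLOCATIONGRANULARITY
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * gran)  # first block: padding only
            f.write(b"payload")    # data begins exactly at `gran`
        return read_at_offset(path, gran, 7)
    finally:
        os.remove(path)
```

The mapping starts at the second block, so index 0 of the mapped object corresponds to byte `gran` of the file. ALLOCATIONGRANULARITY is used instead of a hard-coded page size because the required alignment differs between Unix and Windows.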
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-23 10:37 Message: Logged In: YES user_id=33168 Email received from Yotam: I have downloaded and patched the 2.3a source. compiled locally just this module, and it worked fine for my application (with offset for character device file) I did not run the released test though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708374&group_id=5470 From noreply@sourceforge.net Sat Mar 29 21:46:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 13:46:55 -0800 Subject: [Patches] [ python-Patches-706707 ] time.tzset standards compliance update Message-ID: Patches item #706707, was opened at 2003-03-19 23:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Stuart Bishop (zenzen) Assigned to: Neal Norwitz (nnorwitz) Summary: time.tzset standards compliance update Initial Comment: Update to configure.in and test_time.py to only use TZ environment variable format documented at http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-29 16:46 Message: Logged In: YES user_id=33168 In the last chunk added, there is a bare except when calling time.tzset(). What are the possible exceptions? I don't want to have a bare except since this can mask a real error. The patch still fails for me on Linux (Redhat): * line 107: self.failUnless(time.tzname[1] == 'AEDT') - tzname has: ('AEST', 'AEST') * line 109: self.failUnlessEqual(time.daylight, 1) * line 111: self.failUnlessEqual(time.altzone, -39600) Haven't tried on other Unixes. 
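The checks quoted above can be reproduced with a small sketch like the following. It assumes a Unix platform where time.tzset exists. The TZ string is an assumption chosen to match the values in the quoted assertions, not a line copied from the patch, and zone_info is a hypothetical helper. Instead of a bare except, it uses try/finally so the caller's TZ is restored while any real error still propagates.

```python
import os
import time


def zone_info(tz):
    """Apply a POSIX-style TZ string and report what the C library derived."""
    if not hasattr(time, "tzset"):
        return None  # tzset is only available on Unix
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    try:
        time.tzset()
        return time.tzname, time.daylight, time.timezone, time.altzone
    finally:
        # Restore the caller's timezone whatever happened above.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


# Assumed TZ value: standard name AEST at UTC+10, DST name AEDT at UTC+11.
info = zone_info("AEST-10AEDT-11,M10.5.0,M3.5.0")
```

The string gives distinct names for the standard and daylight zones together with explicit transition rules, which is the form the referenced Open Group chapter documents.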
---------------------------------------------------------------------- Comment By: Stuart Bishop (zenzen) Date: 2003-03-27 15:12 Message: Logged In: YES user_id=46639 tzset3.diff is an updated diff against the CVS head. Fixes: -Don't test time.altzone for UTC - non-DST means altzone is undefined -Make sure dst timezone name is not the same as non-dst timezone name in TZ environment variable, to work around an apparent Solaris bug. -Extraneous cruft removed from test_time.py and configure.in - no more irrelevant comments. -More whitespace as per Tim's comments. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-21 16:28 Message: Logged In: YES user_id=33168 After patching, the test fails: File "/home/neal/build/python/2_3/Lib/test/test_time.py", line 115, in test_tzset self.failUnlessEqual(time.daylight,1) File "/home/neal/build/python/2.3/Lib/unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 0 != 1 Also, why is the code commented out (via a string) on lines 120-144? Should these be removed? I see the comment about wallclock time, but don't understand why the code should be left in if we can't test it. I can understand a comment describing generally the issue. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-20 20:18 Message: Logged In: YES user_id=33168 I'll try to get to this soon. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-20 20:11 Message: Logged In: YES user_id=6380 Unassigning, as I won't have time for this. But it is important - someone else should make sure this goes into 2.3b1! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-20 16:50 Message: Logged In: YES user_id=31435 Assigned to Guido, as I can't test it.
Two notes: 1. Leaving commented-out code in config and the test suite doesn't appear to serve a purpose, although it will serve to confuse future readers ("why is this here? why is it commented out?"). 2. The Python style guide asks for a blank after commas in argument lists and tuples. We're not really in danger of stretching the screen here . ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706707&group_id=5470 From noreply@sourceforge.net Sat Mar 29 22:40:58 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 14:40:58 -0800 Subject: [Patches] [ python-Patches-659834 ] Check for readline 2.2 features Message-ID: Patches item #659834, was opened at 2002-12-29 20:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 Category: Build Group: Python 2.2.x >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Magnus Lie Hetland (mlh) >Assigned to: Neal Norwitz (nnorwitz) Summary: Check for readline 2.2 features Initial Comment: This patch adds a snippet to configure.in, to check whether rl_completion_append_character (which is used in Python 2.3) is available. rl_prep_terminal is assumed to co-exist with rl_completion_append_character. It is assumed that HAVE_RL_COMPLETION_APPEND_CHARACTER will be used in readline.c to make it compatible with older versions of the readline library. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-29 17:40 Message: Logged In: YES user_id=33168 Magnus, it would be great if you could test 2.2.3 from CVS too. I have checked in a change that builds and works with newer versions of readline. I don't have readline v2.2. 
Checked in as: * configure 1.279.6.19 * configure.in 1.288.6.19 * Modules/readline.c 2.41.6.7 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-12 17:16 Message: Logged In: YES user_id=6380 I need a volunteer to backport this to 2.2 who can run an older version of autoconf; the autoconf that I have installed is too new to process the 2.2 configure.in file. (The 2.3 version is already checked in.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-12-30 11:27 Message: Logged In: YES user_id=6380 Checked in. Thanks! Hm, this should be backported to 2.2.3 too! So I'll keep it open. ---------------------------------------------------------------------- Comment By: Magnus Lie Hetland (mlh) Date: 2002-12-29 21:11 Message: Logged In: YES user_id=20535 New patch for configure.in (added a comment) and a patch for readline.c that uses HAVE_RL_COMPLETION_APPEND_CHARACTER. Tested on Gentoo Linux with new readline (the new completion behaviour was preserved) and on Solaris with old readline (now compiles, with old completion behaviour in place). 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 From noreply@sourceforge.net Sun Mar 30 01:13:44 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 17:13:44 -0800 Subject: [Patches] [ python-Patches-659834 ] Check for readline 2.2 features Message-ID: Patches item #659834, was opened at 2002-12-30 02:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 Category: Build Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 5 Submitted By: Magnus Lie Hetland (mlh) Assigned to: Neal Norwitz (nnorwitz) Summary: Check for readline 2.2 features Initial Comment: This patch adds a snippet to configure.in, to check whether rl_completion_append_character (which is used in Python 2.3) is available. rl_prep_terminal is assumed to co-exist with rl_completion_append_character. It is assumed that HAVE_RL_COMPLETION_APPEND_CHARACTER will be used in readline.c to make it compatible with older versions of the readline library. ---------------------------------------------------------------------- >Comment By: Magnus Lie Hetland (mlh) Date: 2003-03-30 03:13 Message: Logged In: YES user_id=20535 I've now tested it with 2.2.3 (using the 2.2 maintenance branch, which had the revision numbers you cited) and it works nicely. That is, my old readline (readline 2.2, I think, although I couldn't find the version number this time around -- at least it doesn't have the completer character functionality) works. There is one thing I find a bit odd, though... With the 2.3 version of this check, the following ends up in pyconfig.h: /* Define if you have readline 2.2 */ /* #undef HAVE_RL_COMPLETION_APPEND_CHARACTER */ However, it isn't there when I use the 2.2 branch version. 
I guess it shouldn't matter either way (it's uncommented anyway), but it seems that the two versions behave differently, though... But since it all works, it's a bit hard to find out what's "wrong", if anything... Anyway, the (tentative) verdict from me is that it works. And just a final note: This check is really sort of a "band aid" solution, since the behaviour of the completer will differ, based on which readline version you have. Making the default the same for readline 2.2 and readline 4.* and making it configurable from Python for the newer versions might be better... Although possibly not important enough to warrant the work. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-29 23:40 Message: Logged In: YES user_id=33168 Magnus, it would be great if you could test 2.2.3 from CVS too. I have checked in a change that builds and works with newer versions of readline. I don't have readline v2.2. Checked in as: * configure 1.279.6.19 * configure.in 1.288.6.19 * Modules/readline.c 2.41.6.7 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-12 23:16 Message: Logged In: YES user_id=6380 I need a volunteer to backport this to 2.2 who can run an older version of autoconf; the autoconf that I have installed is too new to process the 2.2 configure.in file. (The 2.3 version is already checked in.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-12-30 17:27 Message: Logged In: YES user_id=6380 Checked in. Thanks! Hm, this should be backported to 2.2.3 too! So I'll keep it open. 
---------------------------------------------------------------------- Comment By: Magnus Lie Hetland (mlh) Date: 2002-12-30 03:11 Message: Logged In: YES user_id=20535 New patch for configure.in (added a comment) and a patch for readline.c that uses HAVE_RL_COMPLETION_APPEND_CHARACTER. Tested on Gentoo Linux with new readline (the new completion behaviour was preserved) and on Solaris with old readline (now compiles, with old completion behaviour in place). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 From noreply@sourceforge.net Sun Mar 30 01:32:35 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sat, 29 Mar 2003 17:32:35 -0800 Subject: [Patches] [ python-Patches-711861 ] Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Message-ID: Patches item #711861, was opened at 2003-03-29 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed Resolution: Later Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Raymond Hettinger (rhettinger) >Summary: Replace LOAD_GLOBAL "None" with LOAD_CONST Py_None Initial Comment: Okay, here's one __builtin__ that's guaranteed not to change or be shadowed. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 16:19 Message: Logged In: YES user_id=80475 Hmm, in Py2.3a2+, it only gives a warning. Putting this one on hold until I can find out why it was safe for the compiler to return a None constant at the end of a function. 
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 13:05 Message: Logged In: YES user_id=80475 Python 2.3a2 (#39, Feb 19 2003, 17:58:58) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. IDLE 0.8 -- press F1 for help >>> None = 1 SyntaxError: assignment to None (, line 1) In addition, the compiler already makes this assumption elsewhere. Every function ends with: 2 0 LOAD_CONST 0 (None) 3 RETURN_VALUE ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 12:37 Message: Logged In: YES user_id=21627 Where do you get this guarantee from? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711861&group_id=5470 From noreply@sourceforge.net Sun Mar 30 10:40:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 02:40:09 -0800 Subject: [Patches] [ python-Patches-712124 ] Obsolete comment in urlparse.py Message-ID: Patches item #712124, was opened at 2003-03-30 03:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712124&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Steven Taschuk (staschuk) Assigned to: Nobody/Anonymous (nobody) Summary: Obsolete comment in urlparse.py Initial Comment: urlparse.py contains a comment to the effect that urljoin('http://foo/bar', '//g') returns 'http://g/', contrary to the RFC 1808 example, which calls for 'http://g' (with no trailing slash). But this is false, and has been since at least 2.2.2; urljoin correctly returns 'http://g' in this case, as the test suite in fact verifies. The patch simply removes this bogus comment. 
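The claim in this report can be checked directly. The sketch below assumes Python 3, where the function lives in urllib.parse; in the 2.x releases discussed here it was urlparse.urljoin.

```python
# Python 3 location of the function; Python 2.x used `from urlparse import urljoin`.
from urllib.parse import urljoin

# The case from the removed comment: a network-path reference against a base URL.
joined = urljoin("http://foo/bar", "//g")
print(joined)
```

The result has no trailing slash, matching the RFC 1808 example.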
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712124&group_id=5470 From noreply@sourceforge.net Sun Mar 30 14:54:36 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 06:54:36 -0800 Subject: [Patches] [ python-Patches-545300 ] sgmllib support for additional tag forms Message-ID: Patches item #545300, was opened at 2002-04-17 20:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=545300&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Steven F. Lott (slott56) Assigned to: Martin v. Löwis (loewis) Summary: sgmllib support for additional tag forms Initial Comment: MS-word generated HTML includes declaration tags of the form:   scattered throughout the body of an HTML document. The current sgmllib parse_declaration routine rejects these as invalid syntax, where browsers tolerate these embedded declarations. This patch accepts these declaration forms. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:54 Message: Logged In: YES user_id=21627 Thanks for the patch, I have installed it as markupbase.py 1.7 sgmllib.py 1.43 test_htmllib.py 1.3 NEWS 1.706 This also fixes bugs 505747 and 704996. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-11-22 10:23 Message: Logged In: YES user_id=21627 I now recommend to approve this patch. It improves SGML correctness, and, while supporting an MS extension, explicitly points out that it is doing so. ---------------------------------------------------------------------- Comment By: Steven F. 
Lott (slott56) Date: 2002-04-22 20:50 Message: Logged In: YES user_id=328067 My suggestion for handling this MS extension syntax is to (1) tolerate the extension without an error, (2) treat it as an SGML marked section, using the unknown_decl() call-back. Since this is a separate function, subclasses can override it to alter this behavior. The content hidden in these MS-specific marked sections appears to always be a  . While it might be expedient to completely skip over this junk, doing so would make it difficult to handle marked sections in a future version of markupbase. Attached is a revised patch against V1.39 of sgmllib.py and 1.4 of markupbase.py ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-04-21 17:11 Message: Logged In: YES user_id=3066 This is the same as bug #505747. These "tags" are not legal HTML in any form, but are some Microsoft invention. It's not entirely clear what the right thing to do is, but it is clear that we need to deal with these in some different way. Changed group to indicate that such changes can only go into the trunk; feature changes in maintenance versions are not allowed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-04-18 19:23 Message: Logged In: YES user_id=21627 That patch looks wrong: You are changing what a tag is, removing the underscore; however, underscores are allowed in tag names. Also, could you please generate the patch against the CVS version of the code? Your patch doesn't apply to the current code, which has changed significantly compared to the version you appear to be using. There is no way that this can go into 2.1 IMO.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=545300&group_id=5470 From noreply@sourceforge.net Sun Mar 30 14:59:18 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 06:59:18 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 20:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Martin v. Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:59 Message: Logged In: YES user_id=21627 I'm not sure how to place this. Is this an extension to pydoc? Should it go into Tools, or into Lib, or into some existing module? If this goes into Lib somewhere, it lacks documentation. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 21:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. 
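To give a feel for the behaviour described in the initial comment (pydoc-generated HTML for the registered XML-RPC methods), here is a minimal sketch. It assumes the documenting classes in the Python 3 standard library (xmlrpc.server) are comparable to the subclass proposed in this patch, which the thread does not confirm; the function add and the server title are made up for illustration. It only generates the page and does not serve it.

```python
from xmlrpc.server import DocCGIXMLRPCRequestHandler


def add(a, b):
    """Return the sum of a and b."""
    return a + b


# The CGI-style handler needs no socket, which keeps the sketch self-contained.
handler = DocCGIXMLRPCRequestHandler()
handler.set_server_title("Example service")  # hypothetical title
handler.register_function(add)

# The same HTML a GET request would receive from a documenting server.
html_page = handler.generate_html_documentation()
print(len(html_page))
```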
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 21:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependent on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 19:55 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependent on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 19:31 Message: Logged In: YES user_id=6380 Looks cute to me. Fredrik, any problem if I just check this in? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:34:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:34:17 -0800 Subject: [Patches] [ python-Patches-701743 ] Reloading pseudo modules Message-ID: Patches item #701743, was opened at 2003-03-11 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Reloading pseudo modules Initial Comment: Python allows putting something that is not a module in sys.modules. Unfortunately reload() does not work with such a pseudo module ("TypeError: reload() argument must be module" is raised).
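For readers unfamiliar with the term, a "pseudo module" in this sense can be sketched as follows. This is an illustration under assumptions, not code from the patch: it targets Python 3, the class PseudoModule and the name pseudo_settings are made up, and reloading is deliberately not shown because its behaviour for such objects differs between interpreter versions.

```python
import sys


class PseudoModule:
    """Hypothetical stand-in: an ordinary object, not a real module."""
    __name__ = "pseudo_settings"
    answer = 42


pseudo = PseudoModule()
sys.modules["pseudo_settings"] = pseudo

# The import statement hands back whatever is registered in sys.modules.
import pseudo_settings

same_object = pseudo_settings is pseudo
is_real_module = isinstance(pseudo_settings, type(sys))
print(same_object, is_real_module, pseudo_settings.answer)
```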
This patch changes Python/import.c::PyImport_ReloadModule() so that it works with anything that has a __name__ attribute that can be found in sys.modules.keys(). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:34 Message: Logged In: YES user_id=21627 The patch looks fine now as far as it goes. I'm unsure what the use case is, though: What object do you have in sys.modules for which reload() would be meaningful? Can you attach an example where reloading fails now but succeeds with your patch applied? As for reload modifying the module object: It needs to, or else all clients would have to run reload; this would include things like function default arguments. I guess it returns a result for historical reasons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-17 15:25 Message: Logged In: YES user_id=89016 PyImport_ReloadModule() is only called by the implementation of the reload builtin, so it seems that m==NULL can only happen with broken extension modules. I've updated the patch accordingly (raising a SystemError) and changed the error case for a missing __name__ attribute to raise a TypeError when an AttributeError is detected. Unfortunately this might mask exceptions (e.g. when __name__ is implemented as a property.) Another problem is that reload() seems to repopulate the existing module object when reloading real modules. Example: Write a simple foo.py which contains "x = 1" and then:

    >>> import foo
    >>> foo.x
    1
    [ Now open your editor and change foo.py to "x = 2" ]
    >>> foo2 = reload(foo)
    >>> foo.x
    2
    >>> foo2.x
    2
    >>> print id(foo), id(foo2)
    1077466884 1077466884
    >>>

Of course this can't work with pseudo modules. I wonder why reload() has a return value at all, as it always modifies its parameter for real modules. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2003-03-15 14:51 Message: Logged In: YES user_id=21627 I think the exceptions need to be reworked: "must be a module" now only occurs if m is NULL. Under what circumstances could that happen? Failure to provide __name__ is passed through; shouldn't this get diagnosed in a better way? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:42:40 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:42:40 -0800 Subject: [Patches] [ python-Patches-712124 ] Obsolete comment in urlparse.py Message-ID: Patches item #712124, was opened at 2003-03-30 12:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712124&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Steven Taschuk (staschuk) Assigned to: Nobody/Anonymous (nobody) Summary: Obsolete comment in urlparse.py Initial Comment: urlparse.py contains a comment to the effect that urljoin('http://foo/bar', '//g') returns 'http://g/', contrary to the RFC 1808 example, which calls for 'http://g' (with no trailing slash). But this is false, and has been since at least 2.2.2; urljoin correctly returns 'http://g' in this case, as the test suite in fact verifies. The patch simply removes this bogus comment. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:42 Message: Logged In: YES user_id=21627 Thanks for the patch, committed as 1.40. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712124&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:43:57 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:43:57 -0800 Subject: [Patches] [ python-Patches-711838 ] urllib2 doesn't support non-anonymous ftp Message-ID: Patches item #711838, was opened at 2003-03-29 17:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2 doesn't support non-anonymous ftp Initial Comment: urllib2 doesn't support non-anonymous ftp. Added support based on how urllib did it. More details about this bug in Red Hat's bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=78168 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=80676 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:43 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. In addition, even if you *did* check this checkbox, a bug in SourceForge prevents attaching a file when *creating* an issue. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:49:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:49:15 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 17:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) >Assigned to: Tim Peters (tim_one) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:51:57 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:51:57 -0800 Subject: [Patches] [ python-Patches-708604 ] unchecked return values - compile.c Message-ID: Patches item #708604, was opened at 2003-03-24 04:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jason Harper (jasonharper) >Assigned to: Martin v. 
Löwis (loewis) Summary: unchecked return values - compile.c Initial Comment: Various cleanups in Python/compile.c - mainly unchecked return values. Also an unchecked memory allocation in PyList_SetSlice that's called by compile.c. ---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-24 04:19 Message: Logged In: YES user_id=392021 aaarrrrggghhh.... SF isn't letting me attach the files, clicking Submit simply clears the entered filename??? Will try later from another system. ---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-24 04:18 Message: Logged In: YES user_id=392021 aaarrrrggghhh.... SF isn't letting me attach the files, clicking Submit simply clears the entered filename??? Will try later from another system. ---------------------------------------------------------------------- Comment By: Jason Harper (jasonharper) Date: 2003-03-24 04:05 Message: Logged In: YES user_id=392021 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=708604&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:55:10 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:55:10 -0800 Subject: [Patches] [ python-Patches-701395 ] Wrong prototype for PyUnicode_Splitlines on documentation Message-ID: Patches item #701395, was opened at 2003-03-11 09:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 Category: Documentation Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Hye-Shik Chang (perky) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Wrong prototype for PyUnicode_Splitlines on documentation Initial Comment: A mismatch of prototype and description between documentation and implementation. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:55 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as concrete.tex 1.22. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-11 10:18 Message: Logged In: YES user_id=38388 Looks good. Assigned to Fred. Thanks. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701395&group_id=5470 From noreply@sourceforge.net Sun Mar 30 16:56:34 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 08:56:34 -0800 Subject: [Patches] [ python-Patches-684981 ] fix for bug 501716 Message-ID: Patches item #684981, was opened at 2003-02-12 00:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684981&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Michael Stone (mbrierst) >Assigned to: Martin v. Löwis (loewis) Summary: fix for bug 501716 Initial Comment: Fixes bug described there: "es#" parser marker leaks memory Also fixes two other minor leaks involving strings with encoded NULL's and when a bad buffer_len pointer is passed to PyArg_Parse... Is a nicer version of the patch I pasted in to the comments on the 501716 bug report. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=684981&group_id=5470 From noreply@sourceforge.net Sun Mar 30 17:15:37 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 09:15:37 -0800 Subject: [Patches] [ python-Patches-695250 ] fix for bug 672614 :) Message-ID: Patches item #695250, was opened at 2003-02-28 20:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695250&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Michael Stone (mbrierst) Assigned to: Nobody/Anonymous (nobody) Summary: fix for bug 672614 :) Initial Comment: python -S shouldn't show COPYRIGHT string as they are not available. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:15 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied (in modified form) as main.c 1.76 and 1.61.6.3. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=695250&group_id=5470 From noreply@sourceforge.net Sun Mar 30 17:24:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 09:24:17 -0800 Subject: [Patches] [ python-Patches-672053 ] Py_Main() removal of exit() calls. Return value instead Message-ID: Patches item #672053, was opened at 2003-01-21 22:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=672053&group_id=5470 Category: Modules Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Douglas Napoleone (derivin) Assigned to: Nobody/Anonymous (nobody) Summary: Py_Main() removal of exit() calls. Return value instead Initial Comment: Py_Main() does not perform to spec. 
The C/API documentation notes that the function will return a value of 2 for improper command-line values. Instead, it calls exit(). Calling exit() in general is bad: the caller should be the one to call exit() or return from main() with the supplied exit code. This is particularly troublesome when there are final cleanup calls that need to be made before terminating the program and static destruction is not an option. The patch just replaces the exit() calls with a return. Calls to usage() have their return value returned. Very straightforward. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:24 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as main.c 1.77. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=672053&group_id=5470 From noreply@sourceforge.net Sun Mar 30 17:25:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 09:25:55 -0800 Subject: [Patches] [ python-Patches-662464 ] 659188: no docs for HTMLParser Message-ID: Patches item #662464, was opened at 2003-01-05 05:10 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christopher Blunck (blunck2) Assigned to: Nobody/Anonymous (nobody) Summary: 659188: no docs for HTMLParser Initial Comment: Added some high level docs to explain how to use the class. Provided docstrings for the handle_* callback methods. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:25 Message: Logged In: YES user_id=21627 Christopher, can you please indicate whether you are going to provide a patch for the primary source of the documentation, i.e. the TeX files?
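As background for patch 662464, the handle_* callbacks mentioned in its initial comment are used roughly as in the sketch below. It is not taken from the submitted documentation; it assumes the Python 3 module html.parser (the 2.x module was named HTMLParser), and the class name EventCollector is made up.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records every callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = EventCollector()
parser.feed('<p class="note">Hello <b>world</b></p>')
parser.close()  # flush anything still buffered
for event in parser.events:
    print(event)
```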
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-15 13:20 Message: Logged In: YES user_id=21627 Can you please provide a patch for the TeX documentation (Doc/lib/libhtmlparser.tex) as well? I think this is where the submitter of bug 659188 was looking. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 From noreply@sourceforge.net Sun Mar 30 17:38:16 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 09:38:16 -0800 Subject: [Patches] [ python-Patches-650412 ] posixfy some things Message-ID: Patches item #650412, was opened at 2002-12-08 13:48 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=650412&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Marc Recht (marc) Assigned to: Nobody/Anonymous (nobody) Summary: posixfy some things Initial Comment: Add special check for flock, since it isn't a POSIX function. This avoids an implicit declaration on FreeBSD 5. (It's present in the libc, but undefined because of POSIX_C_SOURCE.) Add a new check for getpagesize. It isn't a POSIX function either and needs the same treatment as flock. Changed resource.c so it uses getpagesize only if it's available. Otherwise it tries to use sysconf. If neither of the two is available, it returns 0. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:38 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as configure 1.388 configure.in 1.399 pyconfig.h.in 1.76 resource.c 2.30 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-12 14:27 Message: Logged In: YES user_id=21627 Please check the checkbox.
---------------------------------------------------------------------- Comment By: Marc Recht (marc) Date: 2002-12-12 14:07 Message: Logged In: YES user_id=205 - changed to elif - single patch (-p0) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-08 18:23 Message: Logged In: YES user_id=21627 Does the resulting resource.c actually compile? It seems to be missing an #endif. Please use #elif instead. Please provide a single patch file, which can be applied with patch -p0. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=650412&group_id=5470 From noreply@sourceforge.net Sun Mar 30 17:41:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 09:41:55 -0800 Subject: [Patches] [ python-Patches-706590 ] Adds Mock Object support to unittest.TestCase Message-ID: Patches item #706590, was opened at 2003-03-19 23:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Out of Date Priority: 5 Submitted By: Matthew Russell (mattruss) Assigned to: Nobody/Anonymous (nobody) Summary: Adds Mock Object support to unittest.TestCase Initial Comment: Mock objects can greatly improve unit tests (if used in the correct context), especially for code that relies upon resource-hungry tests (connections to databases, socket servers etc). The module/patch (to unittest) which I am submitting helps to introspect calls to code whilst maintaining transparency and functionality with your code. I had previously written a similar module for my present employers, and my fellow XP partners and I agree that it has made the XP testing cycle considerably easier.
Having googled for alternatives on the web, I have not found a solution that provides the same level of flexibility. (hope that doesn't sound arrogant) The tests for this module should highlight usage, but I will supply dummy code if this idea is accepted. If unfamiliar with XP/MockObject ideas, please see: http://www.xprogramming.com/xpmag/virtualMockObjects.htm#N78 ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:41 Message: Logged In: YES user_id=21627 This is now in feature request #708125. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706590&group_id=5470 From noreply@sourceforge.net Sun Mar 30 18:04:24 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 10:04:24 -0800 Subject: [Patches] [ python-Patches-662464 ] 659188: no docs for HTMLParser Message-ID: Patches item #662464, was opened at 2003-01-04 23:10 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christopher Blunck (blunck2) Assigned to: Nobody/Anonymous (nobody) Summary: 659188: no docs for HTMLParser Initial Comment: Added some high level docs to explain how to use the class. Provided docstrings for the handle_* callback methods. ---------------------------------------------------------------------- >Comment By: Christopher Blunck (blunck2) Date: 2003-03-30 13:04 Message: Logged In: YES user_id=531881 Sure. I'll patch and post it later on today. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2003-03-30 12:25 Message: Logged In: YES user_id=21627 Christopher, can you please indicate whether you are going to provide a patch for the primary source of the documentation, i.e. the TeX files? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-15 07:20 Message: Logged In: YES user_id=21627 Can you please provide a patch for the TeX documentation (Doc/lib/libhtmlparser.tex) as well? I think this is where the submitter of bug 659188 was looking. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 From noreply@sourceforge.net Sun Mar 30 19:02:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 11:02:39 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBALS, this patch takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function:

    def f(x):
        return oct(x)
    print f(20)
    __builtins__.oct = hex
    print f(20)   # doesn't notice new def of oct()

This gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins.
The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 14:02 Message: Logged In: YES user_id=6380 That prohibition isn't agreed yet, and would be new. Since this *is* a change in existing semantics and rules, there would have to be a period where the old semantics were maintained but a warning was given about violating the new rule. Your patch doesn't do any of that. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 12:11 Message: Logged In: YES user_id=80475 Arghh, I don't see what the problem is. The co_names cache variable is private and not part of the public interface for code objects. The only way to see a change in behavior is for a program to violate the prohibition of sticking a name in another module's globals that affects a builtin (and, even then, it would have to occur between calls to the function). Normal shadowing (using globals) would continue to work just fine. While it gives only a minor timing gain, the big win would be removing the incentive to create Python code like this:

    def f(x, y, int=int, True=True, chr=chr):
        . . .

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. It changes semantics in an ad-hoc way.
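The semantics at stake in this thread can be sketched as follows. This is an illustration for a current Python 3 interpreter, not the 2.3 code under review: it uses the builtins module rather than __builtins__, the function names are made up, and oct() output is written in the Python 3 style. It shows that an ordinary builtin lookup notices a replacement made between calls, while the default-argument idiom binds the builtin once at definition time.

```python
import builtins


def late_bound(x):
    return oct(x)  # name resolved at call time: globals first, then builtins


def early_bound(x, oct=oct):
    return oct(x)  # builtin captured once, when the def statement ran


before = late_bound(20)

original_oct = builtins.oct
builtins.oct = hex  # poke a new value into the builtins module
try:
    after_late = late_bound(20)    # sees the replacement
    after_early = early_bound(20)  # still uses the captured oct
finally:
    builtins.oct = original_oct    # always restore the real builtin

print(before, after_late, after_early)
```

A cache that is never invalidated would make the first function behave like the second, which is the change in semantics objected to above.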
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sun Mar 30 19:42:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 11:42:31 -0800 Subject: [Patches] [ python-Patches-659834 ] Check for readline 2.2 features Message-ID: Patches item #659834, was opened at 2002-12-29 20:22 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 Category: Build Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 5 Submitted By: Magnus Lie Hetland (mlh) Assigned to: Neal Norwitz (nnorwitz) Summary: Check for readline 2.2 features Initial Comment: This patch adds a snippet to configure.in, to check whether rl_completion_append_character (which is used in Python 2.3) is available. rl_prep_terminal is assumed to co-exist with rl_completion_append_character. It is assumed that HAVE_RL_COMPLETION_APPEND_CHARACTER will be used in readline.c to make it compatible with older versions of the readline library. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 14:42 Message: Logged In: YES user_id=33168 Hmmm, I didn't realize I had to add HAVE_RL_COMPLETION_APPEND_CHARACTER manually. Checked in as: pyconfig.h.in 1.20.8.3 ---------------------------------------------------------------------- Comment By: Magnus Lie Hetland (mlh) Date: 2003-03-29 20:13 Message: Logged In: YES user_id=20535 I've now tested it with 2.2.3 (using the 2.2 maintenance branch, which had the revision numbers you cited) and it works nicely. That is, my old readline (readline 2.2, I think, although I couldn't find the version number this time around -- at least it doesn't have the completer character functionality) works. There is one thing I find a bit odd, though... 
With the 2.3 version of this check, the following ends up in pyconfig.h: /* Define if you have readline 2.2 */ /* #undef HAVE_RL_COMPLETION_APPEND_CHARACTER */ However, it isn't there when I use the 2.2 branch version. I guess it shouldn't matter either way (it's uncommented anyway), but it seems that the two versions behave differently, though... But since it all works, it's a bit hard to find out what's "wrong", if anything... Anyway, the (tentative) verdict from me is that it works. And just a final note: This check is really sort of a "band aid" solution, since the behaviour of the completer will differ, based on which readline version you have. Making the default the same for readline 2.2 and readline 4.* and making it configurable from Python for the newer versions might be better... Although possibly not important enough to warrant the work. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-29 17:40 Message: Logged In: YES user_id=33168 Magnus, it would be great if you could test 2.2.3 from CVS too. I have checked in a change that builds and works with newer versions of readline. I don't have readline v2.2. Checked in as: * configure 1.279.6.19 * configure.in 1.288.6.19 * Modules/readline.c 2.41.6.7 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-02-12 17:16 Message: Logged In: YES user_id=6380 I need a volunteer to backport this to 2.2 who can run an older version of autoconf; the autoconf that I have installed is too new to process the 2.2 configure.in file. (The 2.3 version is already checked in.) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-12-30 11:27 Message: Logged In: YES user_id=6380 Checked in. Thanks! Hm, this should be backported to 2.2.3 too! So I'll keep it open. 
---------------------------------------------------------------------- Comment By: Magnus Lie Hetland (mlh) Date: 2002-12-29 21:11 Message: Logged In: YES user_id=20535 New patch for configure.in (added a comment) and a patch for readline.c that uses HAVE_RL_COMPLETION_APPEND_CHARACTER. Tested on Gentoo Linux with new readline (the new completion behaviour was preserved) and on Solaris with old readline (now compiles, with old completion behaviour in place). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=659834&group_id=5470 From noreply@sourceforge.net Sun Mar 30 20:16:56 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 12:16:56 -0800 Subject: [Patches] [ python-Patches-712317 ] Bug fix 548176: urlparse('http://foo?blah') errs Message-ID: Patches item #712317, was opened at 2003-03-30 13:16 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712317&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Steven Taschuk (staschuk) Assigned to: Nobody/Anonymous (nobody) Summary: Bug fix 548176: urlparse('http://foo?blah') errs Initial Comment: For detailed description of the problem, see http://www.python.org/sf/548176 In summary, URLs such as http://www.example.com?query=spam are misparsed by urlparse.urlparse, which decides that everything after the '//' is the host name. This is contrary to RFC 2396 and probably contrary to the intent of RFC 1738. The patch corrects the problem, adds a test to expose it, and rearranges some of the tests to better exercise the code in question. 
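
A small check of the case described in this item, assuming a current Python 3 where the function lives in `urllib.parse` (the 2003 module was `urlparse`). It only shows the split the report asks for; it does not reproduce the patch.

```python
from urllib.parse import urlparse

# A URL with a query string but no path after the host name.
parts = urlparse("http://www.example.com?query=spam")

print(parts.netloc)  # host name only
print(parts.path)    # empty: there is no path component
print(parts.query)   # the text after '?'
```

With the query separated at the first '?', the host name no longer swallows everything after the '//'.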
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712317&group_id=5470 From noreply@sourceforge.net Sun Mar 30 20:45:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 12:45:04 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 03:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Geert Jansen (geertj) >Assigned to: Martin v. Löwis (loewis) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 15:45 Message: Logged In: YES user_id=6380 (c) is okay with me. Since PyInterpreterState is always allocated by the Python core, I can't see how this could possibly break something. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-29 09:16 Message: Logged In: YES user_id=21627 This patch breaks binary compatibility, as it changes the layout of PyInterpreterState. We could reduce the risk of breakage by moving the new members at the end of the struct. Assigning to Guido for pronouncement: Should this a) be rejected? b) be accepted as is? (arguing that nobody uses the interpreter state, anyway) c) accepted with the proposed change (i.e. sizeof(PyInterpreterState) still changes, but the offset of the existing members doesn't). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 03:40 Message: Logged In: YES user_id=38388 Looks ok. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 19:00 Message: Logged In: YES user_id=21627 Marc-Andre, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2003-03-27 03:25 Message: Logged In: YES user_id=537938 Here is the patch. It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. 
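
Option (c) from the comments above can be illustrated with `ctypes`, as a sketch only. The structures and field names below (`next`, `modules`, `codec_registry`) are hypothetical stand-ins and do not reproduce the real PyInterpreterState layout; the point is just that appending a member changes the size but leaves existing offsets alone, while inserting one in the middle shifts them.

```python
import ctypes

# Hypothetical "old" layout with two pointer-sized members.
OLD_FIELDS = [("next", ctypes.c_void_p), ("modules", ctypes.c_void_p)]


class OldState(ctypes.Structure):
    _fields_ = OLD_FIELDS


class AppendedState(ctypes.Structure):
    # New member placed at the end of the struct (option c).
    _fields_ = OLD_FIELDS + [("codec_registry", ctypes.c_void_p)]


class InsertedState(ctypes.Structure):
    # New member placed in the middle of the struct.
    _fields_ = [
        ("next", ctypes.c_void_p),
        ("codec_registry", ctypes.c_void_p),
        ("modules", ctypes.c_void_p),
    ]


print(OldState.modules.offset, AppendedState.modules.offset,
      InsertedState.modules.offset)
print(ctypes.sizeof(OldState), ctypes.sizeof(AppendedState))
```

Code compiled against the old layout that only reads existing members would still find them at the same offsets in the appended variant.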
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Sun Mar 30 20:47:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 12:47:17 -0800 Subject: [Patches] [ python-Patches-706338 ] Fix a few broken links in pydoc Message-ID: Patches item #706338, was opened at 2003-03-19 10:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706338&group_id=5470 Category: Documentation Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Greg Chapman (glchapman) >Assigned to: Neal Norwitz (nnorwitz) Summary: Fix a few broken links in pydoc Initial Comment: Patch to fix a few of the help files references in pydoc.Helper. I'm not sure what was originally in 'ref/execframe' (which does not exist in the 2.3 documentation set), but, since 'ref/naming' seems the best file for NAMESPACES, I converted both references to 'ref/execframe' to 'ref/naming'. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 15:47 Message: Logged In: YES user_id=33168 Thanks! 
Checked in as: Lib/pydoc.py 1.81 and 1.56.8.9 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=706338&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:12:10 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:12:10 -0800 Subject: [Patches] [ python-Patches-710576 ] Backport to 2.2.2 of codec registry fix Message-ID: Patches item #710576, was opened at 2003-03-27 09:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Martin v. Löwis (loewis) Summary: Backport to 2.2.2 of codec registry fix Initial Comment: Hi, attached is a backport to Python 2.2.2 of the patch that fixes bug: #663074: codec registry and Python embedding problem which is discussed here: http://sourceforge.net/tracker/index.php?func=detail&aid=663074&group_id=5470&atid=105470 If there will be a Python 2.2.3 release, I suggest this patch is applied. Currently, mod_python programs cannot use encodings, because mod_python is one of the (few?) programs that uses multiple subinterpreters. About the patch: it is a backport of Gustavo Niemeyer's patch for 2.3 CVS. I had to adapt it a little bit because in 2.2 there is no codec error registry. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:12 Message: Logged In: YES user_id=21627 Thanks for the patch. 
Committed (with changes) as pystate.h 2.18.16.3 NEWS 1.337.2.4.2.69 codecs.c 2.13.26.3 pystate.c 2.20.16.3 pythonrun.c 2.153.6.5 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 22:45 Message: Logged In: YES user_id=6380 (c) is okay with me. Since PyInterpreterState is always allocated by the Python core, I can't see how this could possibly break something. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:16 Message: Logged In: YES user_id=21627 This patch breaks binary compatibility, as it changes the layout of PyInterpreterState. We could reduce the risk of breakage by moving the new members at the end of the struct. Assigning to Guido for pronouncement: Should this a) be rejected? b) be accepted as is? (arguing that nobody uses the interpreter state, anyway) c) accepted with the proposed change (i.e. sizeof(PyInterpreterState) still changes, but the offset of the existing members doesn't). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2003-03-28 09:40 Message: Logged In: YES user_id=38388 Looks ok. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 01:00 Message: Logged In: YES user_id=21627 Marc-Andre, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- Comment By: Geert Jansen (geertj) Date: 2003-03-27 09:25 Message: Logged In: YES user_id=537938 Here is the patch. It is tested and verified to fix the problem by two people. I also verified that it passes the test suite. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710576&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:15:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:15:04 -0800 Subject: [Patches] [ python-Patches-658316 ] skips.txt for regrtest.py Message-ID: Patches item #658316, was opened at 2002-12-24 21:03 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658316&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Raymond Hettinger (rhettinger) Summary: skips.txt for regrtest.py Initial Comment: As I promised on python-dev here is the functionality to have a skips.txt file for regrtest.py. If the file is present in the current directory it is parsed (using the exact same code as used for the -f option for regrtest; good, old copy-n-paste) and all tests are added to the expected skip set. And as commented in the file, the name of the file is so named after Skip Montanaro because he is too shy. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:15 Message: Logged In: YES user_id=21627 Raymond, any further comments? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-12-30 02:37 Message: Logged In: YES user_id=357491 Oops. =) New diff includes a paragraph at the end of the module documentation that mentions how to use the new functionality. Please delete the old diff. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-12-29 06:09 Message: Logged In: YES user_id=80475 The patch looks good. Now, it needs documentation. 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-12-26 22:04 Message: Logged In: YES user_id=357491 Sorry about that! I could have sworn I checked the box. I have uploaded enough files here you would think it would be habitual by now. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-12-26 19:10 Message: Logged In: YES user_id=33168 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658316&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:16:42 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:16:42 -0800 Subject: [Patches] [ python-Patches-649997 ] Complementary patch for OpenVMS Message-ID: Patches item #649997, was opened at 2002-12-07 12:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=649997&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Piéronne Jean-François (pieronne) >Assigned to: Martin v. Löwis (loewis) Summary: Complementary patch for OpenVMS Initial Comment: Hi, I have attached the complementary patch for OpenVMS. As with the previous one, all the updates use conditional compilation for VMS, except in two places: in socketmodule.c there are two updates that test ENABLE_IPV6 but not __VMS, because I think there was a bug in the initial code; it uses "sockaddr_storage", which is, if I remember correctly, an IPv6 structure. Regards, Jean-François Piéronne ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2003-03-30 23:16 Message: Logged In: YES user_id=21627 Am I correct in assuming that this patch has now been superseded by 708495? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-08 11:34 Message: Logged In: YES user_id=21627 I have a number of questions: - What is RMS? (probably not Richard M Stallman :-) RMSError is not used, so it should not be included in the patch. - Why do you need to omit the argument for F_GETFD? - Why do you cast the ioctl argument to void*? In POSIX, this argument is of type int. - What is the third argument to getcwd? - Don't use nested #ifs, use #elif instead where appropriate. ---------------------------------------------------------------------- Comment By: Piéronne Jean-François (pieronne) Date: 2002-12-08 11:00 Message: Logged In: YES user_id=414701 Done Thanks ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-12-08 09:49 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=649997&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:18:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:18:38 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 10:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. Alex ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-30 23:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P. Krekel (dannu) Date: 2002-05-10 02:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour. Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens at commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes precedence). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes. You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly).
holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 22:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests. The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 20:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 18:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. 
Also, I wonder whether write-back should be attempted if the shelve was opened read-only. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:26:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:26:55 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 10:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. 
Alex ---------------------------------------------------------------------- >Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:26 Message: Logged In: YES user_id=60314 Yes, Martin, I'm still quite convinced shelve's behavior is generally surprising and often problematic, and even though the fixes suggested by both me and dannu are each imperfect (given the impossibility of finding out, in general, whether an object has been modified), I think one or the other would still be better than the current situation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P. Krekel (dannu) Date: 2002-05-10 02:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour. Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens at commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes precedence). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes.
You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly). holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 22:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests. The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 20:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup.
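
For comparison, a sketch of the surprising and non-surprising behaviours using the `writeback` argument that `shelve.open` accepts in current Python (the patch in this thread calls the option `smart`). The `demo` helper and the temporary file name are hypothetical, and the example is written for Python 3, not for the 2.2 code under review.

```python
import os
import shelve
import tempfile


def demo(writeback):
    """Store a list, mutate it in place, close, and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "she")

        s = shelve.open(path, "c", writeback=writeback)
        s["ciao"] = list(range(3))
        s["ciao"].append(4)  # mutates whatever object s["ciao"] returned
        s.close()            # with writeback, cached entries are saved here

        s = shelve.open(path, "r")
        result = s["ciao"]
        s.close()
        return result


print(demo(writeback=False))
print(demo(writeback=True))
```

Without the cache the append lands on a temporary copy and is lost; with it, every accessed entry is kept in memory and rewritten on close, which is the cost the reviewers point out.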
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 18:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. Also, I wonder whether write-back should be attempted if the shelve was opened read-only. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:43:27 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:43:27 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 10:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). 
Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. Alex ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:43 Message: Logged In: YES user_id=21627 Would you then be willing to provide a complete patch (documentation, NEWS entry, test case)? ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:26 Message: Logged In: YES user_id=60314 Yes, Martin, I'm still quite convinced shelve's behavior is generally surprising and often problematic, and even though the fixes suggested by both me and dannu are each imperfect (given the impossibility of finding out, in general, whether an object has been modified), I think one or the other would still be better than the current situation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P. Krekel (dannu) Date: 2002-05-10 02:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour.
Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens at commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes precedence). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes. You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly). holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 22:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests. The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation.
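
A rough sketch of the cache-commit wrapper Holger describes above may make the listed properties easier to follow. This is not the code from ccdict.py: the names CacheCommitDict and PickledStore are hypothetical, PickledStore is only a stand-in for a shelf (it pickles on write and unpickles on read), and the sketch is written for current Python 3, so it uses try/except KeyError where the original relies on has_key.

```python
import pickle


class PickledStore:
    """Stand-in for a shelf: every read returns a fresh unpickled copy."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return pickle.loads(self._data[key])

    def __setitem__(self, key, value):
        self._data[key] = pickle.dumps(value)

    def __delitem__(self, key):
        del self._data[key]


class CacheCommitDict:
    """Hypothetical cache-commit wrapper around any dict-like object."""

    _DELETED = object()  # marks a key whose deletion is deferred until commit()

    def __init__(self, backing):
        self.backing = backing
        self.cache = {}

    def __getitem__(self, key):
        if key not in self.cache:
            # First access: look the item up in the backing store and cache it.
            self.cache[key] = self.backing[key]
        value = self.cache[key]
        if value is self._DELETED:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        # Assigning after a deferred delete simply replaces the marker.
        self.cache[key] = value

    def __delitem__(self, key):
        self[key]  # raises KeyError if the key does not exist
        self.cache[key] = self._DELETED

    def commit(self):
        """Write every cached item back, apply deferred deletes, clear the cache."""
        for key, value in self.cache.items():
            if value is self._DELETED:
                try:
                    del self.backing[key]
                except KeyError:
                    pass  # key was only ever present in the cache
            else:
                self.backing[key] = value
        self.cache.clear()

    def clear(self):
        """Drop pending changes without touching the backing store."""
        self.cache.clear()


store = PickledStore()
store["ciao"] = [0, 1, 2]

d = CacheCommitDict(store)
d["ciao"].append(4)             # mutates the cached object
before_commit = store["ciao"]   # backing store is untouched so far
d.commit()
after_commit = store["ciao"]    # the change has been written back
```

As in the description, nothing reaches the backing store until commit() is called, and every cached item is written back whether or not it was modified; pruning d.cache before commit() is the manual way around that.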
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 20:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 18:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. Also, I wonder whether write-back should be attempted if the shelve was opened read-only. 
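
For readers on a current Python, the behaviour discussed in this item can be reproduced with the writeback keyword of shelve.open, which is the spelling Martin floats above; the patch as posted uses smart=1 instead. The sketch below assumes Python 3 (hence list(range(3)) and the with statement) and a throwaway temporary directory.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "she")

# Default behaviour: each s["ciao"] unpickles a fresh temporary list.
with shelve.open(path, "c") as s:
    s["ciao"] = list(range(3))
    s["ciao"].append(4)      # modifies the temporary only, so it doesn't "take"
    plain = s["ciao"]

# writeback=True: accessed entries are cached in memory and written back on close.
with shelve.open(path, "c", writeback=True) as s:
    s["ciao"].append(4)      # modifies the cached object

with shelve.open(path, "r") as s:
    cached = s["ciao"]
```

The cost raised in the comments applies here: with writeback enabled, every entry that was accessed is rewritten on close, including entries that were only read.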
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:56:07 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:56:07 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 03:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. 
Alex ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-30 16:56 Message: Logged In: YES user_id=80475 The issue has arisen a couple of times on comp.lang.python. I think this patch would be helpful. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:43 Message: Logged In: YES user_id=21627 Would you then be willing to provide a complete patch (documentation, NEWS entry, test case)? ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2003-03-30 16:26 Message: Logged In: YES user_id=60314 Yes, Martin, I'm still quite convinced shelve's behavior is generally surprising and often problematic, and even though the fixes suggested by both me and dannu are each imperfect (given the impossibility of finding out, in general, whether an object has been modified), I think one or the other would still be better than the current situation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P. Krekel (dannu) Date: 2002-05-09 19:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour.
Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens at commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes precedence). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes. You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly). holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 15:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests. The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation.
---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 13:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 11:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. Also, I wonder whether write-back should be attempted if the shelve was opened read-only. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:59:09 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:59:09 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBAL, this patch takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function: def f(x): return oct(x) print f(20) __builtins__.oct = hex print f(20) # doesn't notice new def of oct() This gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins. The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-30 16:59 Message: Logged In: YES user_id=80475 I see. Would this patch be acceptable as a -OO option or should I drop it?
Also, the same question applies to a tiny patch converting LOAD_GLOBAL "None" --> LOAD_CONST Py_None ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 14:02 Message: Logged In: YES user_id=6380 That prohibition isn't agreed yet, and would be new. Since this *is* a change in existing semantics and rule, there would have to be a period where the old semantics were maintained but a warning was given about violating the new rule. Your patch doesn't do any of that. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 12:11 Message: Logged In: YES user_id=80475 Arghh, I don't see what the problem is. The co_names cache variable is private and not part of the public interface for code objects. The only way to see a change in behavior is for a program to violate the prohibition of sticking a name in another module's globals that affects a builtin (and, even then, it would have to occur between calls to the function). Normal shadowing (using globals) would continue to work just fine. While it gives only a minor timing gain, the big win would be removing the incentive to create python code like this: def f(x, y, int=int, True=True, chr=chr): . . . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. It changes semantics in an ad-hoc way.
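
The semantics at stake in this item can be checked directly. The sketch below is a Python 3 rendering of the example from the initial comment, so it assumes the builtins module name and print-free assignments rather than the thread's Python 2 spelling (__builtins__, print statement). It shows the existing rule that a function notices a replacement poked into the builtins module between calls, which is the behaviour the cached lookup would have changed.

```python
import builtins


def f(x):
    # `oct` is found neither locally nor in module globals, so each call
    # falls through to the builtins namespace.
    return oct(x)


first = f(20)                 # uses the real oct()

original_oct = builtins.oct
builtins.oct = hex            # poke a new value into the builtins module
try:
    second = f(20)            # the very same function now calls hex()
finally:
    builtins.oct = original_oct   # always restore the real builtin

restored = f(20)
```

The try/finally is only there so that the experiment does not leave a patched oct behind for the rest of the process.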
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Sun Mar 30 21:59:41 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 13:59:41 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 10:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. 
Alex ---------------------------------------------------------------------- >Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:59 Message: Logged In: YES user_id=60314 sure, but along what lines -- my previous patch's, or dannu's? let me know, and I'll get to work on it as soon as I'm back from Python-UK & short following trip (i.e. around Apr 12) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-30 23:56 Message: Logged In: YES user_id=80475 The issue has arisen a couple of times on comp.lang.python. I think this patch would be helpful. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:43 Message: Logged In: YES user_id=21627 Would you then be willing to provide a complete patch (documentation, NEWS entry, test case)? ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:26 Message: Logged In: YES user_id=60314 Yes, Martin, I'm still quite convinced shelve's behavior is generally surprising and often problematic, and even though the fixes suggested by both me and dannu are each imperfect (given the impossibility of finding out, in general, whether an object has been modified), I think one or the other would still be better than the current situation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P.
Krekel (dannu) Date: 2002-05-10 02:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour. Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens at commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes precedence). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes. You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly). holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 22:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests.
The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 20:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 18:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. Also, I wonder whether write-back should be attempted if the shelve was opened read-only. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Sun Mar 30 22:04:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 14:04:04 -0800 Subject: [Patches] [ python-Patches-667548 ] Add missing constants for IRIX al module Message-ID: Patches item #667548, was opened at 2003-01-13 22:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667548&group_id=5470 Category: Modules >Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Michael Pruett (mpruett) >Assigned to: Neal Norwitz (nnorwitz) Summary: Add missing constants for IRIX al module Initial Comment: The following Audio Library constants are not defined by the IRIX al module as of Python 2.2.2: AL_LOCKED AL_NULL_INTERFACE AL_OPTICAL_IF_TYPE AL_SMPTE272M_IF_TYPE The attached patch adds these constants. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 17:04 Message: Logged In: YES user_id=33168 Thanks! 
Checked in as Modules/almodule.c 1.38 ---------------------------------------------------------------------- Comment By: Michael Pruett (mpruett) Date: 2003-01-13 22:33 Message: Logged In: YES user_id=250621 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=667548&group_id=5470 From noreply@sourceforge.net Sun Mar 30 22:06:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 14:06:31 -0800 Subject: [Patches] [ python-Patches-711002 ] new test_urllib and patch for found urllib bug Message-ID: Patches item #711002, was opened at 2003-03-27 13:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711002&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: new test_urllib and patch for found urllib bug Initial Comment: Free time at PyCon led to me writing a new test_urllib (happy, Raymond? =). Since I have no guarantee that there would be a net connection (and didn't want to use it without user permission since I view using the 'network' resource as using sockets and not the Net) I wrote all tests using temporary files. And do this found a bug, sort of. The docs and doc string for urlretrieve() says the second value from the returned tuple should be None when a local file is passed as an argument. Well, it wasn't; it was returning an rfc2822.Message object like it does for remote files. So I patched it to match the docs. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2003-03-30 14:06 Message: Logged In: YES user_id=357491 I just noticed that Skip uploaded test_urllibnet.py to test timeouts by connecting to python.org . 
Is it okay to write tests that connect to the Net when the `network' resource is enabled? If so then I can add network tests to test_urllib.py. Oh, and the beginning of the 2nd paragraph for my summary should have read "And I did find a bug, sort of" and not the mess of broken grammar rules as I initially typed it in. =) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711002&group_id=5470 From noreply@sourceforge.net Sun Mar 30 22:11:14 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 14:11:14 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) >Assigned to: Nobody/Anonymous (nobody) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-30 17:11 Message: Logged In: YES user_id=31435 Looks fine to me too. Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Sun Mar 30 22:19:36 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 14:19:36 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 17:19 Message: Logged In: YES user_id=33168 _POSIX_SEMAPHORES aren't used if HAVE_BROKEN_POSIX_SEMAPHORES is defined. This currently occurs on Solaris 8 (at least). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-30 17:11 Message: Logged In: YES user_id=31435 Looks fine to me too. Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Sun Mar 30 22:36:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 14:36:15 -0800 Subject: [Patches] [ python-Patches-712367 ] get build working on AIX Message-ID: Patches item #712367, was opened at 2003-03-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: get build working on AIX Initial Comment: Tested on AIX 4.3 and 5.1. I may have tested this on 4.2 a long time ago. Changes to configure and setup.py. The setup.py changes are to build curses. The configure changes create the export file differently. I was told by Gary Hooks at IBM that the export file must have a period for AIX 4.2 and beyond for dynamically imported modules to work properly (call back into the interpreter).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 From noreply@sourceforge.net Mon Mar 31 00:29:19 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 16:29:19 -0800 Subject: [Patches] [ python-Patches-711722 ] Cache lookup of __builtins__ Message-ID: Patches item #711722, was opened at 2003-03-29 01:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Nobody/Anonymous (nobody) Summary: Cache lookup of __builtins__ Initial Comment: Rather than perform a bytecode optimization of LOAD_GLOBAL, this patch takes an alternative approach of caching the lookup of builtins. To be safe, it checks the cache only after trying a lookup in globals(). I can think of only one way to break this approach: run the function accessing a builtin, then poke a new value into the builtins module, and then re-run the function: def f(x): return oct(x) print f(20) __builtins__.oct = hex print f(20) # doesn't notice new def of oct() This gives about a 2% speed-up to average programs, 0% to programs that don't use builtins, and higher percentages to those with heavier use of builtins. The speedup is limited by 1) having to still check globals and 2) the relative insignificance of builtin access time in most programs. Still, it pretty much solves the problem of access time for builtins. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 19:29 Message: Logged In: YES user_id=6380 Please drop it. OO also doesn't change semantics except for __doc__ (which is a different kind of change).
The LOAD_GLOBAL->LOAD_CONST patch is acceptable for 2.4 (though it will probably be done differently once None is a keyword). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-30 16:59 Message: Logged In: YES user_id=80475 I see. Would this patch be acceptable as a -OO option or should I drop it? Also, the same question applies to a tiny patch converting LOAD_GLOBAL "None" --> LOAD_CONST Py_None ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-30 14:02 Message: Logged In: YES user_id=6380 That prohibition isn't agreed yet, and would be new. Since this *is* a change in existing semantics and rule, there would have to be a period where the old semantics were maintained but a warning was given about violating the new rule. Your patch doesn't do any of that. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-29 12:11 Message: Logged In: YES user_id=80475 Arghh, I don't see what the problem is. The co_names cache variable is private and not part of the public interface for code objects. The only way to see a change in behavior is for a program to violate the prohibition of sticking a name in another module's globals that affects a builtin (and, even then, it would have to occur between calls to the function). Normal shadowing (using globals) would continue to work just fine. While it gives only a minor timing gain, the big win would be removing the incentive to create python code like this: def f(x, y, int=int, True=True, chr=chr): . . . ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2003-03-29 06:37 Message: Logged In: YES user_id=6380 -1. It changes semantics in an ad-hoc way.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711722&group_id=5470 From noreply@sourceforge.net Mon Mar 31 03:24:28 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 19:24:28 -0800 Subject: [Patches] [ python-Patches-662464 ] 659188: no docs for HTMLParser Message-ID: Patches item #662464, was opened at 2003-01-04 23:10 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christopher Blunck (blunck2) Assigned to: Nobody/Anonymous (nobody) Summary: 659188: no docs for HTMLParser Initial Comment: Added some high level docs to explain how to use the class. Provided docstrings for the handle_* callback methods. ---------------------------------------------------------------------- >Comment By: Christopher Blunck (blunck2) Date: 2003-03-30 22:24 Message: Logged In: YES user_id=531881 added documentation for handle_pi callback method in libhtmlparser.tex ---------------------------------------------------------------------- Comment By: Christopher Blunck (blunck2) Date: 2003-03-30 13:04 Message: Logged In: YES user_id=531881 Sure. I'll patch and post it later on today. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 12:25 Message: Logged In: YES user_id=21627 Christopher, can you please indicate whether you are going to provide a patch for the primary source of the documentation, i.e. the TeX files? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-15 07:20 Message: Logged In: YES user_id=21627 Can you please provide a patch for the Tex documentation (Doc/lib/libhtmlparser.tex) as well? 
I think this is where the submitter of bug 659188 was looking. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 From noreply@sourceforge.net Mon Mar 31 03:59:55 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 19:59:55 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2003-03-30 22:59 Message: Logged In: YES user_id=205865 Also, this happens in 2.2.2 as well (the patch in Red Hat's bugzilla is against 2.2.2 actually). Is there a plan to release a 2.2.3? Is there value in backporting the patch? (should apply cleanly on 2.2.2). ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 17:19 Message: Logged In: YES user_id=33168 _POSIX_SEMAPHORES aren't used if HAVE_BROKEN_POSIX_SEMAPHORES is defined. This currently occurs on Solaris 8 (at least). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-30 17:11 Message: Logged In: YES user_id=31435 Looks fine to me too. 
Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Mon Mar 31 04:01:30 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 20:01:30 -0800 Subject: [Patches] [ python-Patches-711838 ] urllib2 doesn't support non-anonymous ftp Message-ID: Patches item #711838, was opened at 2003-03-29 11:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2 doesn't support non-anonymous ftp Initial Comment: urllib2 doesn't support non-anonymous ftp. Added support based on how urllib did it. More details about this bug in Red Hat's bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=78168 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=80676 ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2003-03-30 23:01 Message: Logged In: YES user_id=205865 Argh. I forgot to check the checkbox. Here we go. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:43 Message: Logged In: YES user_id=21627 There's no uploaded file! 
You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. In addition, even if you *did* check this checkbox, a bug in SourceForge prevents attaching a file when *creating* an issue. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711838&group_id=5470 From noreply@sourceforge.net Mon Mar 31 05:15:17 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 21:15:17 -0800 Subject: [Patches] [ python-Patches-553171 ] optionally make shelve less surprising Message-ID: Patches item #553171, was opened at 2002-05-07 10:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: optionally make shelve less surprising Initial Comment: shelve has highly surprising behavior wrt modifiable values: s = shelve.open('she.dat','c') s['ciao'] = range(3) s['ciao'].append(4) # doesn't "TAKE"! Explaining to beginners that s['ciao'] is returning a temporary object and the modification is done on the temporary thus "silently ignored" is hard indeed. It also makes shelve far less convenient than it could be (whenever modifiable values must be shelved). Having s keep track of all values it has returned may perhaps break some existing program (due to extra memory consumption and/or to lack of "implicit copy"/"snapshot" behavior) so I've made the 'caching' change optional and by default off. 
However it's now at least possible to obtain nonsurprising behavior: s = shelve.open('she.dat','c',smart=1) s['ciao'] = range(3) s['ciao'].append(4) # no surprises any more I suspect the 'smart=1' should be made the default, but, if we at least put it in now, then perhaps we can migrate to having it as the default very slowly and gradually. Alex ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 07:15 Message: Logged In: YES user_id=21627 dannu's code is currently unavailable... I see no reason to add yet another layer of indirection, and no other application of such a wrapper within Python. The trickiest aspect of this is educational: If the default behaviour does not change (as it shouldn't), how can unsuspecting users avoid running into the trap? So this is much more a documentation problem than a code problem. ---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:59 Message: Logged In: YES user_id=60314 sure, but along what lines -- my previous patch's, or dannu's? let me know, and I'll get to work on it as soon as I'm back from Python-UK & short following trip (i.e. around Apr 12) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2003-03-30 23:56 Message: Logged In: YES user_id=80475 The issue has arisen a couple of times on comp.lang.python. I think this patch would be helpful. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:43 Message: Logged In: YES user_id=21627 Would you then be willing to provide a complete patch (documentation, NEWS entry, test case)?
---------------------------------------------------------------------- Comment By: Alex Martelli (aleax) Date: 2003-03-30 23:26 Message: Logged In: YES user_id=60314 Yes, Martin, I'm still quite convinced shelve's behavior is generally surprising and often problematic, and even though the fixes suggested by both me and dannu are each imperfect (given the impossibility to find out, in general, whether an object has been modified), I think one or the other would still be better than the current situation. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 23:18 Message: Logged In: YES user_id=21627 Alex, do you still think this should be implemented, in some form or other? ---------------------------------------------------------------------- Comment By: Holger P. Krekel (dannu) Date: 2002-05-10 02:47 Message: Logged In: YES user_id=83092 I'd suggest not changing shelve at all but providing a "cache-commit" dictionary (ccdict) which can wrap a shelf-instance (or any other simple dictish instance) and provides the 'non-surprising' behaviour. Some proof of concept code for the following properties is provided here http://home.trillke.net/~hpk/ccdict.py Current properties are: - ccdict wraps a dictionary-like object which in turn only needs to provide __getitem__, __setitem__, __delitem__, has_key - on first access of an element ccdict makes a lookup on the underlying dict and caches the item. - the next accesses work with the cached thing. Unsurprising dict-semantics are provided. - deleting an item is deferred and actually happens on commit() time. deleting an item and later on assigning to it works as expected (i.e. the assignment takes preference). - commit() transfers the items in the cache to the underlying dict and clears the cache. Prior to issuing commit no writeback to the underlying dict happens. - deleting a ccdict-instance does *not* commit any changes.
You have to explicitly call commit(). If you want to work readonly, don't call commit. - clear() only clears the cache and not the underlying dict - you can explicitly prune the cache (via cache.keys() etc.) before calling commit(). This lets you avoid writing back unmodified objects if this is an issue. It seems quite impossible to figure out automagically which objects have been modified and so the solution is to do it explicitly (or don't commit for readonly). holger ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 22:55 Message: Logged In: YES user_id=80475 A few more thoughts: Please change the "except:" lines to specify the exception being caught. Also, if GvR shows interest in the patch, we should update the library reference and add unittests. The docstring should also mention that the cache is kept in memory -- besides persistence, one of the forces for shelving is memory conservation. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-09 20:43 Message: Logged In: YES user_id=80475 Nicely done! The code is clean and runs in the smart mode without problems on my existing programs. I agree that the patch solves a real world problem. The solution is clean, but a little expensive. If there were a way to be able to tell if an entry had been altered, it would save the 100% writeback. Unfortunately, I can't think of a way. The docstring could read more smoothly and plainly. Also, it should be clear that the cost of setting smart=1 is that 100% of the entries get rewritten on close. Two microscopically minor thoughts on the coding (feel free to disregard). Can some of the try/except blocks be replaced by something akin to 'if self.smart:'? For the writeback loop, consider 'for k,v in cache.iteritems()' as it takes less memory and saves a lookup.
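The caching and write-back behaviour discussed in the comments above can be sketched roughly as follows. This is not Alex's patch and not dannu's ccdict; it is a small stand-in written in present-day Python 3, with an in-memory dict of pickles playing the role of the shelf file. The class names `PickledStore` and `CachingStore` are hypothetical.

```python
import pickle


class PickledStore:
    """Hypothetical stand-in for a shelf.

    Values are pickled on assignment and unpickled on every read, so
    each read returns a fresh temporary object.
    """

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return pickle.loads(self._data[key])

    def __setitem__(self, key, value):
        self._data[key] = pickle.dumps(value)


class CachingStore(PickledStore):
    """Same store, but remembers every value it has handed out."""

    def __init__(self):
        super().__init__()
        self._cache = {}

    def __getitem__(self, key):
        # The first access unpickles; later accesses reuse the same object.
        if key not in self._cache:
            self._cache[key] = super().__getitem__(key)
        return self._cache[key]

    def __setitem__(self, key, value):
        self._cache[key] = value
        super().__setitem__(key, value)

    def close(self):
        # Every cached entry is rewritten, modified or not. This is the
        # "100% writeback" cost raised in the review comments.
        for key, value in self._cache.items():
            super().__setitem__(key, value)
        self._cache.clear()


plain = PickledStore()
plain["ciao"] = list(range(3))
plain["ciao"].append(4)      # mutates a temporary; the store is unchanged

smart = CachingStore()
smart["ciao"] = list(range(3))
smart["ciao"].append(4)      # mutates the cached object
smart.close()                # the mutation is written back here
```

The sketch also shows why the cache costs memory and why read-only access still triggers rewrites on close: the store has no general way to tell which cached objects were modified.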
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-07 18:38 Message: Logged In: YES user_id=21627 Even more important than the backwards compatibility might be the issue that it writes back all accessed objects on close, which might be expensive if there have been many read-only accesses. So I think the option name could be also 'slow'; although 'writeback' might be more technical. Also, I wonder whether write-back should be attempted if the shelve was opened read-only. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470 From noreply@sourceforge.net Mon Mar 31 05:22:15 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 21:22:15 -0800 Subject: [Patches] [ python-Patches-712367 ] get build working on AIX Message-ID: Patches item #712367, was opened at 2003-03-31 00:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 Category: Build Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: get build working on AIX Initial Comment: Tested on AIX 4.3 and 5.1. I may have tested this on 4.2 a long time ago. Changes to configure and setup.py. The setup.py changes are build curses. The configure changes create the export file differently. I was told by Gary Hooks at IBM that the export file must have a period for AIX 4.2 and beyond for dynamically imported modules to work properly (call back into the interpreter). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 07:22 Message: Logged In: YES user_id=21627 The patch itself is fine. 
However, we should also formally establish a minimum supported AIX version, in PEP 11 (perhaps with a vision of warning users in 2.4, and actively removing code that belongs to older versions in 2.5). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 From noreply@sourceforge.net Mon Mar 31 05:25:02 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 21:25:02 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 17:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) >Assigned to: Martin v. Löwis (loewis) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 07:25 Message: Logged In: YES user_id=21627 There are plans to provide Python 2.2.3. I see no problem applying it to 2.2.2, as there shouldn't be any change in visible behaviour. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2003-03-31 05:59 Message: Logged In: YES user_id=205865 Also, this happens in 2.2.2 as well (the patch in Red Hat's bugzilla is against 2.2.2 actually). Is there a plan to release a 2.2.3? Is there value in backporting the patch? (should apply cleanly on 2.2.2). 
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-31 00:19 Message: Logged In: YES user_id=33168 _POSIX_SEMAPHORES aren't used if HAVE_BROKEN_POSIX_SEMAPHORES is defined. This currently occurs on Solaris 8 (at least). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-31 00:11 Message: Logged In: YES user_id=31435 Looks fine to me too. Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Mon Mar 31 05:33:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 21:33:48 -0800 Subject: [Patches] [ python-Patches-662464 ] 659188: no docs for HTMLParser Message-ID: Patches item #662464, was opened at 2003-01-05 05:10 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 Category: Documentation Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christopher Blunck (blunck2) >Assigned to: Martin v. Löwis (loewis) Summary: 659188: no docs for HTMLParser Initial Comment: Added some high level docs to explain how to use the class. Provided docstrings for the handle_* callback methods. 
---------------------------------------------------------------------- Comment By: Christopher Blunck (blunck2) Date: 2003-03-31 05:24 Message: Logged In: YES user_id=531881 added documentation for handle_pi callback method in libhtmlparser.tex ---------------------------------------------------------------------- Comment By: Christopher Blunck (blunck2) Date: 2003-03-30 20:04 Message: Logged In: YES user_id=531881 Sure. I'll patch and post it later on today. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 19:25 Message: Logged In: YES user_id=21627 Christopher, can you please indicate whether you are going to provide a patch for the primary source of the documentation, i.e. the TeX files? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-01-15 13:20 Message: Logged In: YES user_id=21627 Can you please provide a patch for the Tex documentation (Doc/lib/libhtmlparser.tex) as well? I think this is where the submitter of bug 659188 was looking. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=662464&group_id=5470 From noreply@sourceforge.net Mon Mar 31 06:56:05 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Sun, 30 Mar 2003 22:56:05 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 11:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Martin v. 
Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2003-03-30 22:56 Message: Logged In: YES user_id=108973 >I'm not sure how to place this. Is this an extension to >pydoc? No. This module provides subclasses for SimpleXMLRPCServer and CGIXMLRPCServer. These subclasses serve pydoc-style documentation when you point your browser at them - see the examples in the patch summary. > Should it go into Tools, or into Lib, or into some > existing module? The attached file should go into Lib. > If this goes into Lib somewhere, it lacks documentation. Fair enough. Conditional on me writing documentation, is this contribution acceptable as is? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 06:59 Message: Logged In: YES user_id=21627 I'm not sure how to place this. Is this an extension to pydoc? Should it go into Tools, or into Lib, or into some existing module? If this goes into Lib somewhere, it lacks documentation. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 12:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 11:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 09:55 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 09:31 Message: Logged In: YES user_id=6380 Looks cute to me. Fredrik, any problem if I just check this in? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Mon Mar 31 09:07:03 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 01:07:03 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 20:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Martin v. Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). 
Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 11:07 Message: Logged In: YES user_id=21627 I see. The code is fine, but it needs to come with a test function, to operate the module as a program. I suggest that the test server provides the get_source_code() operation just as your demo client does; the docstring of the class may provide an xmlrpclib fragment that retrieves the source code (AFAICT, the source code is not directly accessible through an URL, is it?) I also recommend that you reconsider renaming the classes: If the module is named, say, DocXMLRPCServer, there is no need to have the Doc prefix on the class names. Instead, they can be named just "XMLRPCServer" etc. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-03-31 08:56 Message: Logged In: YES user_id=108973 >I'm not sure how to place this. Is this an extension to >pydoc? No. This module provides subclasses for SimpleXMLRPCServer and CGIXMLRPCServer. These subclasses serve pydoc-style documentation when you point your browser at them - see the examples in the patch summary. > Should it go into Tools, or into Lib, or into some > existing module? The attached file should go into Lib. > If this goes into Lib somewhere, it lacks documentation. Fair enough. Conditional on me writing documentation, is this contribution acceptable as is? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:59 Message: Logged In: YES user_id=21627 I'm not sure how to place this. Is this an extension to pydoc? Should it go into Tools, or into Lib, or into some existing module? If this goes into Lib somewhere, it lacks documentation. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 21:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 21:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 19:55 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 19:31 Message: Logged In: YES user_id=6380 Looks cute to me. Fredrik, any problem if I just check this in? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Mon Mar 31 09:22:30 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 01:22:30 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 11:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Martin v. 
Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2003-03-31 01:22 Message: Logged In: YES user_id=108973 Write test function: ok Write documentation: ok >If the module is named, say, DocXMLRPCServer, there is >no need to have the Doc prefix on the class names. Hmmm. If you look at the core BaseHTTPRequestHandler derived classes, each one is prefixed to match the module that it is found in. The only two modules that I can think of with identical class names are cStringIO and StringIO, which theoretically provide identical semantics. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 01:07 Message: Logged In: YES user_id=21627 I see. The code is fine, but it needs to come with a test function, to operate the module as a program. I suggest that the test server provides the get_source_code() operation just as your demo client does; the docstring of the class may provide an xmlrpclib fragment that retrieves the source code (AFAICT, the source code is not directly accessible through an URL, is it?) I also recommend that you reconsider renaming the classes: If the module is named, say, DocXMLRPCServer, there is no need to have the Doc prefix on the class names. Instead, they can be named just "XMLRPCServer" etc. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-03-30 22:56 Message: Logged In: YES user_id=108973 >I'm not sure how to place this. Is this an extension to >pydoc? No. 
This module provides subclasses for SimpleXMLRPCServer and CGIXMLRPCServer. These subclasses serve pydoc-style documentation when you point your browser at them - see the examples in the patch summary. > Should it go into Tools, or into Lib, or into some > existing module? The attached file should go into Lib. > If this goes into Lib somewhere, it lacks documentation. Fair enough. Conditional on me writing documentation, is this contribution acceptable as is? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 06:59 Message: Logged In: YES user_id=21627 I'm not sure how to place this. Is this an extension to pydoc? Should it go into Tools, or into Lib, or into some existing module? If this goes into Lib somewhere, it lacks documentation. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 12:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 11:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 09:55 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependant on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 09:31 Message: Logged In: YES user_id=6380 Looks cute to me. 
Fredrik, any problem if I just check this in? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Mon Mar 31 16:10:39 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 08:10:39 -0800 Subject: [Patches] [ python-Patches-712367 ] get build working on AIX Message-ID: Patches item #712367, was opened at 2003-03-30 17:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neal Norwitz (nnorwitz) Summary: get build working on AIX Initial Comment: Tested on AIX 4.3 and 5.1. I may have tested this on 4.2 a long time ago. Changes to configure and setup.py. The setup.py changes are to build curses. The configure changes create the export file differently. I was told by Gary Hooks at IBM that the export file must have a period for AIX 4.2 and beyond for dynamically imported modules to work properly (call back into the interpreter). ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-31 11:10 Message: Logged In: YES user_id=33168 Ok, I'll ask about what the best minimum version should be. Right now, I suspect AIX 4.2, which is the oldest version I have access to in the snake-farm. Checked in as: setup.py 1.158 configure 1.389 configure.in 1.400 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 00:22 Message: Logged In: YES user_id=21627 The patch itself is fine. However, we should also formally establish a minimum supported AIX version, in PEP 11 (perhaps with a vision of warning users in 2.4, and actively removing code that belongs to older versions in 2.5).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712367&group_id=5470 From noreply@sourceforge.net Mon Mar 31 16:58:48 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 08:58:48 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 17:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have a .a suffix. If -static is passed, it will skip the .dll.a libraries. This is a pain if one has a tree with both static and dynamic libraries using this naming convention, and wishes to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options.
---------------------------------------------------------------------- >Comment By: Jason Tishler (jlt63) Date: 2003-03-31 07:58 Message: Logged In: YES user_id=86216 loewis> I'm in favour of applying this patch, and loewis> also of patches that mandate recent Cygwin loewis> releases; I would like to apply an enhanced version of this patch. By enhanced, I mean using "gcc -shared" (no more dllwrap and gcc -mdll) and removing redundant gcc options, etc. Additionally, I would like to fix get_versions() so it can deal with versions that only have two components (e.g., 3.2) as opposed to requiring three (e.g., 2.95.3). Are these changes acceptable? loewis> if such patches are implemented, the minimum loewis> required Cygwin version should be stated loewis> somewhere. I propose that the currently available Cygwin and Mingw tool chains be that above stated minimum. Is this acceptable? Unfortunately, I have no idea where the above stated "somewhere" should be. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 14:31 Message: Logged In: YES user_id=87160 I can help with testing. I have access to W2K and Win98 (ugh) boxen. I don't mind installing a few older toolchains if you think that's necessary. I think any C/C++ python extension using plain distutils (no fancy hacks added on) and has one or more DLL dependencies is a good test case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 13:15 Message: Logged In: YES user_id=21627 I'm in favour of applying this patch, and also of patches that mandate recent Cygwin releases; if such patches are implemented, the minimum required Cygwin version should be stated somewhere.
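The get_versions() concern raised in the 2003-03-31 comment above (two-component versions such as 3.2 versus three-component ones such as 2.95.3) can be sketched roughly as follows. This is only an illustration, not the distutils code: the helper name parse_tool_version, the regular expression, and the sample banner string are all assumptions made up for this example.

```python
import re

# Hypothetical helper, not the distutils implementation: accept a dotted
# version with two or more numeric components.
_VERSION_RE = re.compile(r"(\d+\.\d+(?:\.\d+)*)")


def parse_tool_version(banner):
    """Return the first dotted version found in banner as a tuple of ints, or None."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# The banner text is an invented sample, used only to exercise the helper.
two_part = parse_tool_version("gcc (GCC) 3.2 (sample banner)")
three_part = parse_tool_version("2.95.3")
no_version = parse_tool_version("no version here")
```

Returning tuples of integers keeps comparisons numeric, so a two-component 3.2 still sorts above 2.95.3.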
---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 12:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 11:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 09:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extension modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 15:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Mon Mar 31 17:02:23 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 09:02:23 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-24 17:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils is invoking the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have a .a suffix. If -static is passed, it will skip the .dll.a libraries. This is a pain if one has a tree with both static and dynamic libraries using this naming convention, and wishes to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: Jason Tishler (jlt63) Date: 2003-03-31 08:02 Message: Logged In: YES user_id=86216 jkluebs> I can help with testing. I have access to W2K jkluebs> and Win98 (ugh) boxen.
I don't mind jkluebs> installing a few older toolchains if you jkluebs> think that's necessary. Thanks for the offer. I'm set up for the current Cygwin and Mingw tool chains. Let's wait to see if testing with older ones is necessary. jkluebs> I think any C/C++ python extension using jkluebs> plain distutils (no fancy hacks added on) and jkluebs> has one or more DLL dependencies is a good jkluebs> test case. Can you point me to one that builds OOTB under Python 2.2.2? ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-31 07:58 Message: Logged In: YES user_id=86216 loewis> I'm in favour of applying this patch, and loewis> also of patches that mandate recent Cygwin loewis> releases; I would like to apply an enhanced version of this patch. By enhanced, I mean using "gcc -shared" (no more dllwrap and gcc -mdll) and removing redundant gcc options, etc. Additionally, I would like to fix get_versions() so it can deal with versions that only have two components (e.g., 3.2) as opposed to requiring three (e.g., 2.95.3). Are these changes acceptable? loewis> if such patches are implemented, the minimum loewis> required Cygwin version should be stated loewis> somewhere. I propose that the currently available Cygwin and Mingw tool chains be that above stated minimum. Is this acceptable? Unfortunately, I have no idea where the above stated "somewhere" should be. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 14:31 Message: Logged In: YES user_id=87160 I can help with testing. I have access to W2K and Win98 (ugh) boxen. I don't mind installing a few older toolchains if you think that's necessary. I think any C/C++ python extension using plain distutils (no fancy hacks added on) and has one or more DLL dependencies is a good test case. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2003-03-28 13:15 Message: Logged In: YES user_id=21627 I'm in favour of applying this patch, and also of patches that mandate recent Cygwin releases; if such patches are implemented, the minimum required Cygwin version should be stated somewhere. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 12:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 11:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 09:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extension modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-27 15:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Mon Mar 31 17:23:29 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 09:23:29 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Martin v. Löwis (loewis) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-03-31 12:23 Message: Logged In: YES user_id=31435 I marked this as Accepted, and also don't see any problem with backporting to the 2.2 line. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 00:25 Message: Logged In: YES user_id=21627 There are plans to provide Python 2.2.3. I see no problem applying it to 2.2.2, as there shouldn't be any change in visible behaviour. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2003-03-30 22:59 Message: Logged In: YES user_id=205865 Also, this happens in 2.2.2 as well (the patch in Red Hat's bugzilla is against 2.2.2 actually). Is there a plan to release a 2.2.3? Is there value in backporting the patch? (should apply cleanly on 2.2.2). 
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 17:19 Message: Logged In: YES user_id=33168 _POSIX_SEMAPHORES aren't used if HAVE_BROKEN_POSIX_SEMAPHORES is defined. This currently occurs on Solaris 8 (at least). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-30 17:11 Message: Logged In: YES user_id=31435 Looks fine to me too. Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Mon Mar 31 18:21:04 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 10:21:04 -0800 Subject: [Patches] [ python-Patches-711835 ] Removing unnecessary lock operations Message-ID: Patches item #711835, was opened at 2003-03-29 11:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: Accepted Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Martin v. Löwis (loewis) Summary: Removing unnecessary lock operations Initial Comment: PyThread_acquire_lock can be further optimized to do less locking on the global lock mutex. 
Original patch location: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86281 ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2003-03-31 13:21 Message: Logged In: YES user_id=205865 One of the glibc developers expressed some concern about the 2.3 implementation of the global lock using semaphores. I'd be glad to funnel any communication with the glibc community. (you) should do some timings on the current lock implementation vs the one using semaphores. POSIX semaphores have special requirements (e.g., sem_post must be callable in signal handlers) which make semaphores pretty expensive. In NPTL, for instance, sem_post always makes a syscall, there is no userlevel-only path. This makes using semaphores pretty expensive. The same restriction applies in one form or another to all POSIX compliant semaphore implementations. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-31 12:23 Message: Logged In: YES user_id=31435 I marked this as Accepted, and also don't see any problem with backporting to the 2.2 line. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 00:25 Message: Logged In: YES user_id=21627 There are plans to provide Python 2.2.3. I see no problem applying it to 2.2.2, as there shouldn't be any change in visible behaviour. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2003-03-30 22:59 Message: Logged In: YES user_id=205865 Also, this happens in 2.2.2 as well (the patch in Red Hat's bugzilla is against 2.2.2 actually). Is there a plan to release a 2.2.3? Is there value in backporting the patch? (should apply cleanly on 2.2.2).
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-03-30 17:19 Message: Logged In: YES user_id=33168 _POSIX_SEMAPHORES aren't used if HAVE_BROKEN_POSIX_SEMAPHORES is defined. This currently occurs on Solaris 8 (at least). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-03-30 17:11 Message: Logged In: YES user_id=31435 Looks fine to me too. Since Python switched to using semaphores on Linux for 2.3, it's unclear that there's a system that uses the condvar code anymore. How will this get tested? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 11:49 Message: Logged In: YES user_id=21627 This looks reasonable to me, but I may be missing something. Tim, can you see a problem with that code? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=711835&group_id=5470 From noreply@sourceforge.net Mon Mar 31 18:36:46 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 10:36:46 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Martin v. Löwis (loewis) >Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. 
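The equivalence the patch is after can be illustrated on a Python 3 interpreter, where every str is already a Unicode string, so the str/unicode split discussed here no longer exists. This is a sketch of the intended behaviour under that assumption, not a reproduction of the 2.3-era failure; the helper name format_char is made up for the example.

```python
def format_char(value):
    """Format value with %c; value is a one-character string or an integer code point."""
    return "%c" % value


from_text = format_char("a")   # a one-character string is passed through
from_code = format_char(97)    # an integer is treated as a code point

# A longer string is rejected rather than silently truncated.
try:
    format_char("ab")
    rejected = False
except TypeError:
    rejected = True
```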
---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-31 20:36 Message: Logged In: YES user_id=89016 Checked in as: Misc/NEWS 1.708 Objects/stringobject.c 2.206 Lib/test/test_unicode.py 1.80 Lib/test/test_str.py 1.2 Lib/test/string_tests.py 1.30 BTW "%c" % 256 still fails. Should this be fixed too? "%c" % 256 raises an OverflowError now, u"%c" % (sys.maxunicode+1) raises a ValueError. At least they should be changed to raise the same exception. If we fix "%c" % 256 what about chr()? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:18 Message: Logged In: YES user_id=21627 Looks fine, please apply it. Also add a test case that fails now but passes with the change, and add a NEWS entry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 From noreply@sourceforge.net Mon Mar 31 19:38:38 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 11:38:38 -0800 Subject: [Patches] [ python-Patches-701743 ] Reloading pseudo modules Message-ID: Patches item #701743, was opened at 2003-03-11 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Reloading pseudo modules Initial Comment: Python allows putting something that is not a module in sys.modules. Unfortunately reload() does not work with such a pseudo module ("TypeError: reload() argument must be module" is raised). This patch changes Python/import.c::PyImport_ReloadModule() so that it works with anything that has a __name__ attribute that can be found in sys.modules.keys().
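What a "pseudo module" means here can be shown with a minimal sketch: any object stored in sys.modules is handed back by the import statement. The class PseudoModule and the key pseudo_demo are made-up names for illustration. The sketch deliberately does not call reload(), since that is the behaviour under discussion and it differs between Python versions.

```python
import sys


class PseudoModule:
    """Stand-in for any non-module object (hypothetical, for illustration)."""

    def __init__(self, name):
        self.__name__ = name   # the attribute the patch looks up in sys.modules
        self.answer = 42


pseudo = PseudoModule("pseudo_demo")
sys.modules["pseudo_demo"] = pseudo

# The import system finds the entry in sys.modules and returns it unchanged.
import pseudo_demo

same_object = pseudo_demo is pseudo
found_by_name = sys.modules[pseudo_demo.__name__] is pseudo
```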
---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2003-03-31 21:38 Message: Logged In: YES user_id=89016 A use case can be found at http://www.livinglogic.de/viewcvs/index.cgi/LivingLogic/xist/_xist/xsc.py?rev=2.235 (Look for the classmethod makemod() in the class Namespace). This puts a class object into sys.modules instead of the module that defines this class. This makes it possible to derive from "modules". Of course the patch does not fully fix the problem, because reload() does not repopulate the class object. Unfortunately that's impossible to fix with Python code, as it's impossible for Python code to distinguish the first import from subsequent ones. If this was possible (and Python code had access to the old "module"), a real reload could be coded in pure Python for this specific case. But with the patch at least it's possible to use the return value of reload() afterwards to use the new "module". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 18:34 Message: Logged In: YES user_id=21627 The patch looks fine now as far as it goes. I'm unsure what the use case is, though: What object do you have in sys.modules for which reload() would be meaningful? Can you attach an example where reloading fails now but succeeds with your patch applied? As for reload modifying the module object: It needs to, or else all clients would have to run reload; this would include things like function default arguments. I guess it returns a result for historical reasons. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-17 15:25 Message: Logged In: YES user_id=89016 PyImport_ReloadModule() is only called by the implementation of the reload builtin, so it seems that m==NULL can only happen with broken extension modules. 
I've updated the patch accordingly (raising a SystemError) and changed the error case for a missing __name__ attribute to raise a TypeError when an AttributeError is detected. Unfortunately this might mask exceptions (e.g. when __name__ is implemented as a property.) Another problem is that reload() seems to repopulate the existing module object when reloading real modules. Example: Write a simple foo.py which contains "x = 1" and then:
>>> import foo
>>> foo.x
1
[ Now open your editor and change foo.py to "x = 2" ]
>>> foo2 = reload(foo)
>>> foo.x
2
>>> foo2.x
2
>>> print id(foo), id(foo2)
1077466884 1077466884
>>>
Of course this can't work with pseudo modules. I wonder why reload() has a return value at all, as it always modifies its parameter for real modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-15 14:51 Message: Logged In: YES user_id=21627 I think the exceptions need to be reworked: "must be a module" now only occurs if m is NULL. Under what circumstances could that happen? Failure to provide __name__ is passed through; shouldn't this get diagnosed in a better way?
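The foo.py session in the 2003-03-17 comment can be replayed without an editor. The sketch below is written for Python 3, where reload lives in importlib rather than being a builtin; the module name reload_demo_foo and the function reload_demo are invented for the example, and the temporary directory stands in for wherever foo.py would live.

```python
import importlib
import os
import sys
import tempfile

# Both versions of the file are written within the same second and have the
# same size, so keep Python from writing bytecode that could be reused stale.
sys.dont_write_bytecode = True


def reload_demo():
    """Import a throwaway module, rewrite its source, reload it, and report."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reload_demo_foo.py")  # hypothetical module
        with open(path, "w") as f:
            f.write("x = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            foo = importlib.import_module("reload_demo_foo")
            before = foo.x
            with open(path, "w") as f:
                f.write("x = 2\n")
            foo2 = importlib.reload(foo)
            # reload() re-executes the source inside the existing module
            # object, so the old reference sees the new value too.
            return before, foo.x, foo2 is foo
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_foo", None)


result = reload_demo()
```

The returned tuple records the value before the edit, the value seen through the original reference after the reload, and whether reload() handed back the very same object.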
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=701743&group_id=5470 From noreply@sourceforge.net Mon Mar 31 19:54:33 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 11:54:33 -0800 Subject: [Patches] [ python-Patches-712900 ] sre fixes for lastindex and minimizing repeats+assertions Message-ID: Patches item #712900, was opened at 2003-03-31 10:54 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712900&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: sre fixes for lastindex and minimizing repeats+assertions Initial Comment: The attached patch fixes two bugs in _sre.c; it also does a bit of reorganization. First the bugs. 672491 points out that lastindex is calculated differently in 2.3 than in previous versions. This patch restores the previous behavior. Since lastindex cannot be restored (when backtracking) from lastmark alone, it is now saved and restored independently (by the LASTMARK_SAVE and RESTORE macros). The second bug appears when minimizing repeats are combined with assertions: >>> re.match('([ab]*?)(?=(b)?)c', 'abc').groups() ('ab', 'b') The second group should be None, since the 'b' is consumed by the first group. To fix this, it is necessary to save lastmark before attempting to match the tail in OP_MIN_UNTIL and to restore it if the tail fails to match. The reorganization has to do with the handling of the SRE_STATE's lastmark and mark array. The mark array tracks the start and end of capturing groups; lastmark is the highest index in the array so far encountered. 
Previously, whenever lastmark was restored back to a lower value (in 2.3a2 this is done in the lastmark_restore function), the tail of the mark array was NULLed out (using memset). This patch adopts the rule that all indexes greater than lastmark are invalid, so restoring lastmark does not also require clearing the tail. To ensure that indexes <= lastmark have valid pointers, OP_MARK checks if lastmark is being increased by more than one; if so, it NULLs out the intervening pointers. This rule also required changes to the GROUPREF opcodes and the state_getslice function to ensure that they do not access indexes greater than lastmark. For consistency, lastmark is now initialized to -1, to indicate that no entries in the mark array are valid. Needless to say, the reorganization is not necessary to fix the bugs; it may be a bad idea. It seems to be marginally faster than a version that fixes the bugs but is similar to the current code (including a memset inside the LASTMARK_RESTORE macro). One other thing. I have removed a test for string == Py_None from state_getslice, since I can’t find any way for string to be Py_None at that point (string is always the object providing the text to be searched; if it were Py_None, an exception should be raised by the getstring function called by state_init). Perhaps I missed something?
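Both behaviours discussed in this report can be poked at from Python. The sketch below uses the lastindex examples given in the re module documentation, followed by the pattern quoted in the report; it shows observable results only and says nothing about how _sre.c tracks marks internally. The variable names are illustrative.

```python
import re

# lastindex is the index of the last matched capturing group. These four
# patterns are the ones used as examples in the re module documentation.
lastindex_cases = {
    pattern: re.match(pattern, "ab").lastindex
    for pattern in ("(a)b", "((a)(b))", "((ab))", "(a)(b)")
}

# The pattern from the report: a minimizing repeat followed by a lookahead
# assertion that contains an optional group. The report argues that the
# second group should end up as None once the fix is in.
groups = re.match(r"([ab]*?)(?=(b)?)c", "abc").groups()
first_group, second_group = groups
```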
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=712900&group_id=5470 From noreply@sourceforge.net Mon Mar 31 21:30:47 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 13:30:47 -0800 Subject: [Patches] [ python-Patches-536883 ] SimpleXMLRPCServer auto-docing subclass Message-ID: Patches item #536883, was opened at 2002-03-29 20:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Martin v. Löwis (loewis) Summary: SimpleXMLRPCServer auto-docing subclass Initial Comment: This SimpleXMLRPCServer subclass automatically serves HTML documentation, generated using pydoc, in response to an HTTP GET request (XML-RPC always uses POST). Here are some examples: http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc1.py http://www.sweetapp.com/cgi-bin/xmlrpc-test/rpc2.py ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 23:30 Message: Logged In: YES user_id=21627 Ok, leave the naming as-is, unless other reviewers comment in one direction or the other. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-03-31 11:22 Message: Logged In: YES user_id=108973 Write test function: ok Write documentation: ok >If the module is named, say, DocXMLRPCServer, there is >no need to have the Doc prefix on the class names. Hmmm. If you look at the core BaseHTTPRequestHandler derived classes, each one is prefixed to match the module that it is found in. The only two modules that I can think of with identical class names are cStringIO and StringIO, which theoretically provide identical semantics. 
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 11:07 Message: Logged In: YES user_id=21627 I see. The code is fine, but it needs to come with a test function, to operate the module as a program. I suggest that the test server provides the get_source_code() operation just as your demo client does; the docstring of the class may provide an xmlrpclib fragment that retrieves the source code (AFAICT, the source code is not directly accessible through an URL, is it?) I also recommend that you reconsider renaming the classes: If the module is named, say, DocXMLRPCServer, there is no need to have the Doc prefix on the class names. Instead, they can be named just "XMLRPCServer" etc. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-03-31 08:56 Message: Logged In: YES user_id=108973 >I'm not sure how to place this. Is this an extension to >pydoc? No. This module provides subclasses for SimpleXMLRPCServer and CGIXMLRPCServer. These subclasses serve pydoc-style documentation when you point your browser at them - see the examples in the patch summary. > Should it go into Tools, or into Lib, or into some > existing module? The attached file should go into Lib. > If this goes into Lib somewhere, it lacks documentation. Fair enough. Conditional on me writing documentation, is this contribution acceptable as is? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-30 16:59 Message: Logged In: YES user_id=21627 I'm not sure how to place this. Is this an extension to pydoc? Should it go into Tools, or into Lib, or into some existing module? If this goes into Lib somewhere, it lacks documentation. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2003-02-10 21:25 Message: Logged In: YES user_id=108973 Patch 473586 has been accepted so this patch can be accepted. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-04-04 21:26 Message: Logged In: YES user_id=108973 Sorry, I was sloppy about the description: This patch is dependent on patch 473586: [473586] SimpleXMLRPCServer - fixes and CGI So please don't check this in until that patch is accepted. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 19:31 Message: Logged In: YES user_id=6380 Looks cute to me. Fredrik, any problem if I just check this in? 
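As a rough modern illustration of the behaviour described in the initial comment (HTML documentation served in response to an HTTP GET, while XML-RPC calls use POST), the sketch below uses xmlrpc.server.DocXMLRPCServer from the Python 3 standard library. That class name and module location are today's and are not necessarily what the patch under review used; add and fetch_docs are hypothetical names chosen for the example.

```python
import threading
import urllib.request
from xmlrpc.server import DocXMLRPCServer


def add(x, y):
    """Return the sum of x and y."""
    return x + y


def fetch_docs():
    """Start a documenting XML-RPC server and return the HTML it serves on GET."""
    # Port 0 asks the OS for any free port; logRequests=False keeps stderr quiet.
    server = DocXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.set_server_title("Demo XML-RPC service")
    server.register_function(add)
    port = server.server_address[1]

    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # A plain GET (what a browser sends) returns the pydoc-style page.
        with urllib.request.urlopen("http://127.0.0.1:%d/" % port) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


html = fetch_docs()
```

Pointing a browser at the same address would show the same page; an xmlrpc.client.ServerProxy aimed at it would still reach add() through POST.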
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=536883&group_id=5470 From noreply@sourceforge.net Mon Mar 31 21:33:31 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 13:33:31 -0800 Subject: [Patches] [ python-Patches-709178 ] remove -static option from cygwinccompiler Message-ID: Patches item #709178, was opened at 2003-03-25 03:55 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: John Kabir Luebs (jkluebs) Assigned to: Jason Tishler (jlt63) Summary: remove -static option from cygwinccompiler Initial Comment: Currently, the cygwinccompiler.py compiler handling in distutils invokes the cygwin and mingw compilers with the -static option. Logically, this means that the linker should choose to link to static libraries instead of shared/dynamically linked libraries. Current win32 binutils expect import libraries to have a .dll.a suffix and static libraries to have a .a suffix. If -static is passed, it will skip the .dll.a libraries. This is a pain if one has a tree with both static and dynamic libraries using this naming convention and wishes to use the dynamic libraries. The -static option being passed in distutils is to get around a bug in old versions of binutils where it would get confused when it found the DLLs themselves. The decision to use static or shared libraries is site or package specific, and should be left to the setup script or to command line options. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-03-31 23:33 Message: Logged In: YES user_id=21627 jlt63: Your proposed changes all sound fine. 
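One of the changes proposed by jlt63 in this thread is to let get_versions() accept version numbers with only two components (e.g., 3.2) as well as three (e.g., 2.95.3). A minimal sketch of that idea follows. It is not the distutils code: parse_tool_version, the regular expression, and the sample banner strings are all hypothetical.

```python
import re

# Two dotted numeric components, with an optional third (hypothetical pattern).
_VERSION_RE = re.compile(r"(\d+\.\d+(?:\.\d+)?)")


def parse_tool_version(banner):
    """Return the first version found in banner as a tuple of ints, or None."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Made-up banner strings, used only to exercise both shapes.
two_part = parse_tool_version("gcc (GCC) 3.2")
three_part = parse_tool_version("gcc version 2.95.3")
missing = parse_tool_version("no version here")
```

Returning tuples of integers means a two-component and a three-component version still compare sensibly, which is what a minimum-version check would need.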
---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-31 19:02 Message: Logged In: YES user_id=86216 jkluebs> I can help with testing. I have access to W2K jkluebs> and Win98 (ugh) boxen. I don't mind jkluebs> installing a few older toolchains if you jkluebs> think that's necessary. Thanks for the offer. I'm set up for the current Cygwin and Mingw tool chains. Let's wait to see if testing with older ones is necessary. jkluebs> I think any C/C++ python extension using jkluebs> plain distutils (no fancy hacks added on) and jkluebs> has one or more DLL dependencies is a good jkluebs> test case. Can you point me to one that builds OOTB under Python 2.2.2? ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-31 18:58 Message: Logged In: YES user_id=86216 loewis> I'm in favour of applying this patch, and loewis> also of patches that mandate recent Cygwin loewis> releases; I would like to apply an enhanced version of this patch. By enhanced, I mean using "gcc -shared" (no more dllwrap and gcc -mdll) and removing redundant gcc options, etc. Additionally, I would like to fix get_versions() so it can deal with versions that only have two components (e.g., 3.2) as opposed to requiring three (e.g. 2.95.3). Are these changes acceptable? loewis> if such patches are implemented, the minimum loewis> required Cygwin version should be stated loewis> somewhere. I propose that the currently available Cygwin and Mingw tool chains be the above-stated minimum. Is this acceptable? Unfortunately, I have no idea where the above-stated "somewhere" should be. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-29 00:31 Message: Logged In: YES user_id=87160 I can help with testing. I have access to W2K and Win98 (ugh) boxen. 
I don't mind installing a few older toolchains if you think that's necessary. I think any C/C++ python extension using plain distutils (no fancy hacks added on) and has one or more DLL dependencies is a good test case. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-28 23:15 Message: Logged In: YES user_id=21627 I'm in favour of applying this patch, and also of patches that mandate recent Cygwin releases; if such patches are implemented, the minimum required Cygwin version should be stated somewhere. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 22:16 Message: Logged In: YES user_id=86216 John, would you be willing to help test or supply me with test cases? I have built exactly one Win32 extension. ---------------------------------------------------------------------- Comment By: John Kabir Luebs (jkluebs) Date: 2003-03-28 21:56 Message: Logged In: YES user_id=87160 The -mdll --entry DllMain@12 option is guarded for an old version of gcc that did not have the correct specs to accept -shared. I didn't touch it, even though it's crazy if anyone is using such an old and buggy toolchain. --shared and --dll are equivalent as far as ld is concerned. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2003-03-28 19:41 Message: Logged In: YES user_id=86216 Note that I only have minimal experience building Win32 extension modules... This patch works "fine" with my *very* limited testing. Specifically, I successfully rebuilt the Win32 readline module with it applied. BTW, this area of Distutils probably should be revisited to bring it up to date. For example, the "-mdll --entry _DllMain@12" options could be replaced by "-shared". ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2003-03-28 01:03 Message: Logged In: YES user_id=21627 Jason, can you take a look? If not, please unassign it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=709178&group_id=5470 From noreply@sourceforge.net Mon Mar 31 22:16:52 2003 From: noreply@sourceforge.net (SourceForge.net) Date: Mon, 31 Mar 2003 14:16:52 -0800 Subject: [Patches] [ python-Patches-710127 ] Make "%c" % u"a" work Message-ID: Patches item #710127, was opened at 2003-03-26 17:08 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Martin v. Löwis (loewis) >Summary: Make "%c" % u"a" work Initial Comment: Currently "%c" % u"a" fails, while "%s" % u"a" works. This patch fixes this problem. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2003-04-01 00:16 Message: Logged In: YES user_id=21627 I can't see why "%c" % 256 should pass; interpreting the 256 as a Unicode ordinal is stretching things too much (if 256 was a Unicode ordinal, then 255 should be a Unicode ordinal too, and you would have to take into account the system encoding). I would think it would be consistent if both gave OverflowError (Result too large to be represented); this deserves another NEWS entry. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2003-03-31 20:36 Message: Logged In: YES user_id=89016 Checked in as: Misc/NEWS 1.708 Objects/stringobject.c 2.206 Lib/test/test_unicode.py 1.80 Lib/test/test_str.py 1.2 Lib/test/string_tests.py 1.30 BTW "%c" % 256 still fails. Should this be fixed too? 
"%c" % 256 raises an OverflowError now, u"%c" % sys.maxunicode+1 raises a ValueError. At least they should be changed to raise the same exception. If we fix "%c" % 256 what about chr()? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2003-03-29 15:18 Message: Logged In: YES user_id=21627 Looks fine, please apply it. Also add a test case that fails now but passes with the change, and add a NEWS entry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=710127&group_id=5470
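The thread above turns on which exception an out-of-range "%c" argument raises. Note also that the expression u"%c" % sys.maxunicode+1, as written, binds as (u"%c" % sys.maxunicode) + 1. The sketch below probes both points. It assumes a current CPython 3 interpreter rather than the Python 2.3 under discussion, so its results need not match the 2003 behaviour; error_name is a hypothetical helper.

```python
import sys


def error_name(thunk):
    """Return the name of the exception raised by thunk(), or None if none."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


# % binds tighter than +, so this evaluates ("%c" % sys.maxunicode) + 1.
unparenthesised = error_name(lambda: "%c" % sys.maxunicode + 1)

# Parenthesised: the ordinal passed to %c really is out of range.
formatted = error_name(lambda: "%c" % (sys.maxunicode + 1))

# chr() with the same out-of-range ordinal, for comparison.
converted = error_name(lambda: chr(sys.maxunicode + 1))

print(unparenthesised, formatted, converted)
```

Comparing the second and third names shows whether the formatting operator and chr() agree on the exception type, which is the consistency question raised in the comments.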