From noreply@sourceforge.net Mon Jul 1 06:15:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 30 Jun 2002 22:15:10 -0700 Subject: [Patches] [ python-Patches-575827 ] SSL release GIL Message-ID: Patches item #575827, was opened at 2002-07-01 07:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gerhard Häring (ghaering) Assigned to: Nobody/Anonymous (nobody) Summary: SSL release GIL Initial Comment: This is more or less a rewrite of parts of patch #475045. It releases the GIL during the SSL operations for opening an SSL socket. Currently the GIL is only released during the read and write operations to an SSL socket. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470 From noreply@sourceforge.net Mon Jul 1 06:15:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 30 Jun 2002 22:15:51 -0700 Subject: [Patches] [ python-Patches-575827 ] SSL release GIL Message-ID: Patches item #575827, was opened at 2002-07-01 07:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gerhard Häring (ghaering) >Assigned to: Martin v. Löwis (loewis) Summary: SSL release GIL Initial Comment: This is more or less a rewrite of parts of patch #475045. It releases the GIL during the SSL operations for opening an SSL socket. Currently the GIL is only released during the read and write operations to an SSL socket.
---------------------------------------------------------------------- >Comment By: Gerhard Häring (ghaering) Date: 2002-07-01 07:15 Message: Logged In: YES user_id=163326 Randomly assigning to Martin, who proofread my previous patch. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470 From noreply@sourceforge.net Mon Jul 1 09:59:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 01:59:26 -0700 Subject: [Patches] [ python-Patches-574532 ] Update freeze to use zlib 1.1.4 Message-ID: Patches item #574532, was opened at 2002-06-27 11:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Lawrence Hudson (lhudson) Assigned to: Nobody/Anonymous (nobody) Summary: Update freeze to use zlib 1.1.4 Initial Comment: freeze currently looks for zlib 1.1.3. ---------------------------------------------------------------------- >Comment By: Lawrence Hudson (lhudson) Date: 2002-07-01 08:59 Message: Logged In: YES user_id=82888 D'Oh! Sorry about that. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-28 01:14 Message: Logged In: YES user_id=14198 there is no patch attached here that I can see! 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470 From noreply@sourceforge.net Mon Jul 1 19:03:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 11:03:21 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 07:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose-oriented macros to replace them. PyAPI_FUNC: public Python functions; PyAPI_DATA: public Python data; PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non-Windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux.
---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 20:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 05:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 07:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Mon Jul 1 19:08:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 11:08:23 -0700 Subject: [Patches] [ python-Patches-575224 ] dict(seqn, value) Message-ID: Patches item #575224, was opened at 2002-06-29 01:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575224&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Guido van Rossum (gvanrossum) Summary: dict(seqn, value) Initial Comment: Have the dict() constructor accept a pair of arguments, a sequence of keys and a constant value. Addresses the common task of initializing dictionary elements to a constant value.
Useful for building fast membership tests and for quickly (C-speed) eliminating duplicates in a sequence. Is faster, more flexible, and clearer than:

    d = {}
    map(d.__setitem__, seqn, [])

Examples:

    uniq = dict(seqn, True).keys()    # eliminate duplicates
    termwords = dict('End Quit Stop Abort'.split(), True)
    if lexeme in termwords:
        sys.exit(0)
    absences = dict('Tom Dick Harry'.split(), 0)

Patch includes source, docs, and unittest. Also includes a minor change to shlex.py showing how the builtin can cleanly update existing code to achieve an order of magnitude performance boost (classifying characters is the most common operation in shlex). Summary of discussion on py-dev: At Walter and Barry's suggestion, the value argument was allowed to take any value (I initially used None). At Tim's suggestion, I went to an explicit two-argument form to avoid ambiguity. If we ever get sets, Timbot thinks that they ought to be the tool of choice for two of the above use cases. Jack Jansen likes the tool and wants to go further and warn of inefficient searching when 'in' is used with sequences, giving O(n) search speed when they could have O(1). The F/bot and Steve Holden poked at me for proposing something (speed and clarity aside) that can already be handled using existing constructs, and Dave Abrahams disagreed with them. ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 20:08 Message: Logged In: YES user_id=38376 The "obvious" other way to use a 2-argument call to dict() would be dict(d.keys(), d.values()). Not sure what's more common, though... (and for the record, I'd prefer a separate "set" type/constructor, even if it's basically just a dict without some of the methods) ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-06-30 21:47 Message: Logged In: YES user_id=80475 I'm away from the computer for the next five weeks.
Oren Tirosh will champion this patch from here forward. He can lead the discussion and make any requested modifications. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575224&group_id=5470 From noreply@sourceforge.net Mon Jul 1 19:09:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 11:09:46 -0700 Subject: [Patches] [ python-Patches-572936 ] (?(id/name)yes|no) re implementation Message-ID: Patches item #572936, was opened at 2002-06-24 03:41 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572936&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) >Assigned to: Fredrik Lundh (effbot) Summary: (?(id/name)yes|no) re implementation Initial Comment: This patch implements a regular expression feature that allows some interesting patterns, working the same way as in Perl. For example, (?(1)yes|no) matches "yes" if group 1 has matched, and "no" if it hasn't. Without this feature, the regular expression must be duplicated to get the same results. In addition to Perl's feature, it will also accept a Python named group as the argument. Here's an example:

    (<)?\w+@\w+(\.\w+)+(?(1)>)

This is a poor email-matching regular expression, which will match an address with or without the "<>" symbols.
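The conditional-group syntax described above is supported by Python's `re` module today. A minimal sketch, assuming Python 3.4+ for `re.fullmatch`, that checks the example pattern from the initial comment against a few sample strings; the sample addresses and the group name `lt` in the named variant are hypothetical choices for illustration.

```python
import re

# Numbered form, taken from the initial comment: group 1 captures an
# optional "<", and (?(1)>) demands a closing ">" only if group 1 matched.
NUMBERED = re.compile(r'(<)?\w+@\w+(\.\w+)+(?(1)>)')

# Named form (the Python extension mentioned in the comment); the group
# name "lt" is a hypothetical choice.
NAMED = re.compile(r'(?P<lt><)?\w+@\w+(\.\w+)+(?(lt)>)')


def is_address(text, pattern=NUMBERED):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


for candidate in ['user@example.com', '<user@example.com>',
                  '<user@example.com', 'user@example.com>']:
    print(candidate, is_address(candidate), is_address(candidate, NAMED))
```

Without the conditional, the same check would need two alternatives, one with the brackets and one without, which is the duplication the initial comment mentions.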
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572936&group_id=5470 From noreply@sourceforge.net Mon Jul 1 19:15:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 11:15:22 -0700 Subject: [Patches] [ python-Patches-569328 ] names in types module Message-ID: Patches item #569328, was opened at 2002-06-15 11:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569328&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: names in types module Initial Comment: Adds names to the types module so types are accessible as 'types.spam' in addition to the existing longer version 'types.SpamType'. The short names match the type's __name__ attribute. ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 20:15 Message: Logged In: YES user_id=38376 "from types import *" is a rather common pydiom, and I'm pretty sure most people using that expect to get a bunch of [A-Z]\w+Type names, and nothing else. -0 from me. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-06-18 16:40 Message: Logged In: YES user_id=562624 Updated patch. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-06-15 12:58 Message: Logged In: YES user_id=562624 http://mail.python.org/pipermail/python-dev/2002-June/025410.html ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-15 12:05 Message: Logged In: YES user_id=21627 What is the purpose of this change?
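A minimal sketch of what the proposal would mean, built on a stand-in module so it runs on current Python; `short_names`, `fake_types`, and the selection of types are all hypothetical. It also shows why the star-import concern applies: a module without `__all__` exports every name that does not start with an underscore, so the short aliases would arrive alongside the familiar `IntType`-style names.

```python
import types


def short_names(type_objects):
    """Map each type's __name__ to the type (the proposed 'types.spam' spelling)."""
    return {t.__name__: t for t in type_objects}


# A hypothetical selection of built-in types to alias.
aliases = short_names([int, float, str, list, dict, type(None), types.FunctionType])

# A stand-in module holding both spellings.
fake_types = types.ModuleType('fake_types')
fake_types.IntType = int          # long-form style name
vars(fake_types).update(aliases)  # proposed short names

# Without __all__, "from fake_types import *" binds every name that does
# not start with an underscore, so the short names come along too.
exported = sorted(n for n in vars(fake_types) if not n.startswith('_'))
print(exported)
```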
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569328&group_id=5470 From noreply@sourceforge.net Mon Jul 1 20:23:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 12:23:00 -0700 Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning Message-ID: Patches item #576101, was opened at 2002-07-01 19:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: Alternative implementation of interning Initial Comment: An interned string has a flag set indicating that it is interned instead of a pointer to the interned string. This pointer was almost always either NULL or pointing to the same object. The other cases were rare and ineffective as an optimization. This saves an average of 3 bytes per string. Interned strings are no longer immortal. They are automatically destroyed when there are no more references to them except the global dictionary of interned strings. New function (actually a macro) PyString_CheckInterned to check whether a string is interned. There are no more references to ob_sinterned anywhere outside stringobject.c. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 From noreply@sourceforge.net Tue Jul 2 02:47:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 18:47:26 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 15:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose-oriented macros to replace them. PyAPI_FUNC: public Python functions; PyAPI_DATA: public Python data; PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non-Windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux.
---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-02 04:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 13:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 15:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work.
Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Tue Jul 2 05:21:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 01 Jul 2002 21:21:22 -0700 Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning Message-ID: Patches item #576101, was opened at 2002-07-01 14:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: Alternative implementation of interning Initial Comment: An interned string has a flag set indicating that it is interned instead of a pointer to the interned string. This pointer was almost always either NULL or pointing to the same object. The other cases were rare and ineffective as an optimization. This saves an average of 3 bytes per string. Interned strings are no longer immortal. They are automatically destroyed when there are no more references to them except the global dictionary of interned strings. New function (actually a macro) PyString_CheckInterned to check whether a string is interned. There are no more references to ob_sinterned anywhere outside stringobject.c. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2002-07-01 23:21 Message: Logged In: YES user_id=80475 I like the way you consolidated all of the knowledge about interning into one place. Consider adding an example to the docs of an effective use of interning for optimization.
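In the spirit of the suggestion above, a minimal sketch of interning used as an optimization, written for Python 3 where the function is `sys.intern` (Python 2 exposed it as the builtin `intern()`). The `tokenize` helper and the sample text are hypothetical. Each repeated word ends up as a single shared string object, which saves memory when a large input repeats the same words and lets equal tokens be compared by identity.

```python
import sys


def tokenize(text):
    """Split text into words, interning each one so repeats share one object."""
    return [sys.intern(word) for word in text.split()]


tokens = tokenize("spam eggs spam spam eggs")

# Equal tokens now refer to a single string object each.
print(len(tokens), len({id(t) for t in tokens}))  # 5 tokens, 2 distinct objects

# A string built at runtime maps onto the same canonical copy once interned.
built = "".join(["sp", "am"])
print(sys.intern(built) is tokens[0])  # True
```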
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 From noreply@sourceforge.net Tue Jul 2 11:16:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 03:16:29 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 05:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose-oriented macros to replace them. PyAPI_FUNC: public Python functions; PyAPI_DATA: public Python data; PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non-Windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux.
---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-07-02 10:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 01:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 18:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 03:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. 
:-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 05:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Tue Jul 2 12:11:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 04:11:54 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Nobody/Anonymous (nobody) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 15:44:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 07:44:42 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 07:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Nobody/Anonymous (nobody) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-02 10:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails.
If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 16:06:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 08:06:24 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Nobody/Anonymous (nobody) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:06 Message: Logged In: YES user_id=119770 Do you have access to a machine where sizeof (long) == 8? 
Here's what I'm getting:

    $ uname -a
    OSF1 duh V4.0 878 alpha
    $ python
    >>> import zipfile
    >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
    >>> zip.write ('/vmunix', 'vmunix')
    >>> zip.close ()
    >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
    >>> zip.testzip()
    2226205591 -2068761705

I added some debugging statements to ZipFile.read(). The first number is the output of binascii.crc32(), while the second is the value of zinfo.CRC (the CRC stored in the zipfile header for 'vmunix' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number, or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is a signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 16:42:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 08:42:57 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Nobody/Anonymous (nobody) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:42 Message: Logged In: YES user_id=119770 Bug #453208 indicates a similar problem. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:06 Message: Logged In: YES user_id=119770 Do you have access to a machine where sizeof (long) == 8? 
Here's what I'm getting: $ uname -a OSF1 duh V4.0 878 alpha $ python >>> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 16:47:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 08:47:28 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Nobody/Anonymous (nobody) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ...
> import zipfile">
self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
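> import zipfile">
The central-directory format quoted above reads the CRC field with the signed `l` code. This sketch assumes modern Python 3 `struct` semantics. It shows that with the `<` prefix the field is exactly 4 bytes on every platform, so the same on-disk bytes decode to either value depending only on `l` versus `L`.

```python
import struct

crc = 2226205591  # an unsigned CRC-32 value, as in the report above

# With '<' (standard sizes, no padding) the quoted header layout is 46 bytes,
# matching the fp.read(46) call, whatever sizeof(long) is on the host.
print(struct.calcsize("<4s4B4H3l5H2l"))  # 46

# The four little-endian bytes stored in the archive.
raw = struct.pack("<L", crc)

# Same bytes, two interpretations.
as_signed = struct.unpack("<l", raw)[0]
as_unsigned = struct.unpack("<L", raw)[0]

print(as_signed)    # -2068761705
print(as_unsigned)  # 2226205591
```

In Python 3, packing 2226205591 with `"<l"` raises `struct.error` instead of wrapping. A writer would therefore typically mask the CRC with `& 0xFFFFFFFF` and pack it with `"<L"`.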
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 17:02:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 09:02:18 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError. This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of the exception to raise. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Tue Jul 2 17:13:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 09:13:36 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError.
This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of the exception to raise. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-07-02 18:13 Message: Logged In: YES user_id=11105 Patch for the header file was missing... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Tue Jul 2 21:20:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 13:20:51 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 07:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) >Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-02 16:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms.
Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 10:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails.
If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 22:01:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 14:01:11 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 12:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. 
I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box.
So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 22:41:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 14:41:41 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better.
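The usual way to avoid the problem in Albert's report, whatever `binascii.crc32()` returns on a given box, is to reduce every CRC to the same unsigned 32-bit range before comparing. Here is a minimal Python 3 sketch of that common masking idiom. It uses the customary CRC-32 check string.

```python
import binascii
import zlib

data = b"123456789"  # the customary CRC-32 check string

# Masking keeps only the low 32 bits, read as unsigned. Python 3 already
# returns unsigned values here; the mask is what makes older interpreters,
# which could hand back a negative int, agree across platforms.
crc_binascii = binascii.crc32(data) & 0xFFFFFFFF
crc_zlib = zlib.crc32(data) & 0xFFFFFFFF

print(hex(crc_binascii))         # 0xcbf43926
print(crc_binascii == crc_zlib)  # True
```

Applying the mask a second time changes nothing, so it does no harm on interpreters that already return unsigned values.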
---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 12:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ...
> import zipfile">
self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
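> import zipfile">
Tim's question ("What did it get and what did it expect?") can be answered with a self-contained round trip in the spirit of test_zipfile.py. In this Python 3 sketch the member name and payload are made up. It prints the CRC stored in the archive header next to the CRC recomputed from the data, and then runs `testzip()`.

```python
import io
import zipfile
import zlib

payload = b"some test data\n" * 1000  # hypothetical file contents

# Write a stored (uncompressed) member into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("junk.tmp", payload)

# Read it back and report both CRCs before relying on testzip().
with zipfile.ZipFile(buf, "r") as zf:
    info = zf.getinfo("junk.tmp")
    expected = zlib.crc32(payload) & 0xFFFFFFFF
    print("header CRC: %#010x  computed CRC: %#010x" % (info.CRC, expected))
    bad = zf.testzip()  # None means every member's CRC checked out

print(bad)  # None
```

If the two printed values differ only by `2**32`, the mismatch is a signedness problem rather than corrupted data.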
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 22:52:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 14:52:16 -0700 Subject: [Patches] [ python-Patches-553108 ] Deprecate bsddb Message-ID: Patches item #553108, was opened at 2002-05-07 05:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470 Category: Modules Group: Python 2.3 >Status: Open Resolution: Accepted Priority: 5 Submitted By: Garth T Kidd (gtk) Assigned to: Skip Montanaro (montanaro) Summary: Deprecate bsddb Initial Comment: Large numbers of inserts break bsddb, as first discovered in Python 1.5 (bug 408271). According to Barry Warsaw, "trying to get the bsddb module that comes with Python to work is a hopeless cause." If it's broken, let's discourage people from using it. In particular, let's ensure that people importing shelve or anydbm don't end up using it by default. The submitted patch adds a DeprecationWarning to the bsddb module and removes bsddb from the list of db module candidates in anydbm. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-02 23:52 Message: Logged In: YES user_id=45365 Skip, I'm reopening this bug report: the fix breaks builds on Mac OS X, and I haven't a clue as to how to fix this so I hope you can help. MacOSX has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but it doesn't have any of the libraries (I assume everything needed is in libc). Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H. 
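The patch described in the initial comment does two things. It warns when bsddb is used, and it drops bsddb from anydbm's list of candidates. Both ideas can be sketched in a few lines of Python 3. The helper names and the candidate list below are illustrative assumptions, not the contents of the actual patch.

```python
import warnings

def warn_module_deprecated(name):
    """Emit a DeprecationWarning naming the deprecated module."""
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn("the %s module is deprecated" % name,
                  DeprecationWarning, stacklevel=2)

# Illustrative fallback order for a generic dbm opener, modelled loosely on
# anydbm; "dbhash" stands in for the bsddb-backed entry the patch would drop.
CANDIDATES = ["dbhash", "gdbm", "dbm", "dumbdbm"]

def usable_candidates(candidates, excluded):
    """Return the candidates in their original order, minus excluded ones."""
    return [name for name in candidates if name not in excluded]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    warn_module_deprecated("bsddb")

print(caught[0].category.__name__)                # DeprecationWarning
print(usable_candidates(CANDIDATES, {"dbhash"}))  # ['gdbm', 'dbm', 'dumbdbm']
```

Because the fallback order is preserved, existing databases keep opening with the same backend as before, unless that backend is the excluded one.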
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-14 22:32 Message: Logged In: YES user_id=44345 Implemented in setup.py 1.93 README 1.147 configure 1.315 configure.in 1.325 pyconfig.h.in 1.42 Modules/dbmmodule 2.30 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-14 09:16 Message: Logged In: YES user_id=21627 The patch looks good, please apply it. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-14 05:33 Message: Logged In: YES user_id=44345 a couple more tweaks... I forgot to include dbmmodule.c in previous patches. This version of the patch also includes a modified README file that adds a section about building the bsddb and dbm modules. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-13 09:35 Message: Logged In: YES user_id=44345 Here's an updated patch. It's different in a couple ways: * support for Berkeley DB 4.x was added. You will need to configure iBerkdb with the 1.85 compatibility stuff. * I cleaned up the dbm build code a bit. * I added a diff for the configure file for people who don't have autoconf handy. Skip ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-11 18:09 Message: Logged In: YES user_id=44345 I think deprecating bsddb is too drastic. In the first place, the problems you refer to are in the underlying Berkeley DB library, not in the bsddb code itself. In the second place, later versions of the library fix the problem. The attached patch attempts to modify setup.py and configure.in to solve the problem. It does a couple things differently than the current CVS version: 1. It only searches for versions 2 and 3 of the Berkeley DB library by default. 
People who know what they are doing can uncomment the information relevant to version 1. 2. It moves all the checking code into setup.py. The header file checks in configure.in were deleted. 3. The ndbm lookalike stuff for the dbm module is done differently. This has not really been tested yet. I anticipate further changes will be necessary with this code. I'm sure it's not perfect. Please give it a try and let me know how it works for you. All that said, I think a better migration path is to replace the current module with the bsddb3/pybsddb stuff. I think that would effectively restrict you to versions 3 or 4 of the underlying Berkeley DB library, so it probably couldn't be done with impunity. Skip ---------------------------------------------------------------------- Comment By: Martin D Katz, Ph.D. (drbits) Date: 2002-05-20 20:14 Message: Logged In: YES user_id=276840
#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.
# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.
# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to 90,909 records
# (an average of 44,000 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, os, math, time
LIMIT=4000000
DISPLAY_AT_END=1
USE_RANDOM=100 # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2
## The format of the value string is
## count|hash|hash...|b
## Where
## count is an 8 byte hexadecimal count of the number of times
## this record has been written.
## hash is the md5 hash of the random value that created this record.
## It is the key for this record. It is appended once for each
## time the record is written (that is, it occurs count times).
## b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2
class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129 # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count, ((key+'|') *old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db ["+repr(key)+"]="+ repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except:
                pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, " count = %6i db[% s]=%s" % ( count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()
    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)
    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()
    def tearDown(self):
        self.db.close()
        self.unlinkDB()
if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))
---------------------------------------------------------------------- Comment By: Martin D Katz, Ph.D. (drbits) Date: 2002-05-17 01:10 Message: Logged In: YES user_id=276840 I am not sure there is a reason to deprecate bsddb. The btopen format appears to be stable enough for normal work.
Maybe 2.3 should change dbhash to use btopen? ---------------------------------------------------------------------- Comment By: Garth T Kidd (gtk) Date: 2002-05-09 05:12 Message: Logged In: YES user_id=59803 Let's not turn a simple patch into something requiring a PEP, compulsory thrashing on comp.lang.python, SleepyCat being willing to change their distribution model, lawyers (to make sure the licences are compatible), and so on. I'd hate it if other people spent the kind of time I did trying to get shelve to work only to find that a known-broken bsddb was causing all the problems, and that a patch was there to gently guide them to gdbm, but it got jammed because of scope-creep. Let's get this one, very simple and necessary (bsddb IS broken) change out of the way, and THEN start negotiating, thrashing, and integrating. :) I firmly believe bsddb3 should be one of the included batteries. Let's do it, but let's guide people away from broken code first. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-08 11:01 Message: Logged In: YES user_id=21627 I'm in favour of this change, but I'd like to simultaneously incorporate bsddb3.
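Garth's concern is that shelve users silently end up on a broken backend. In Python 3, anydbm became `dbm` and bsddb is no longer shipped. There, `dbm.whichdb()` reports which backend actually created a file. Here is a sketch using a throwaway shelf; the path is hypothetical, and the backend you get depends on how your interpreter was built.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")  # hypothetical shelf location
    # shelve delegates file creation to dbm.open(), which picks the first
    # backend available in this Python build.
    with shelve.open(path, "c") as db:
        db["key"] = {"value": 42}
    # whichdb() inspects the files on disk and names the module that made them.
    backend = dbm.whichdb(path)

print(backend)
```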
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470 From noreply@sourceforge.net Tue Jul 2 22:54:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 14:54:41 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 07:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-02 17:54 Message: Logged In: YES user_id=31435 So what did it get, and what did it expect? I.e., same stuff all over again. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? 
zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 16:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read().
The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 10:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64- bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32- bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say. 
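The two numbers in Albert's debugging output are the same 32-bit pattern read two ways: 2226205591 is the unsigned value, and -2068761705 is that value minus 2**32, i.e. the sign-extended reading Tim suspects. A minimal Python 3 sketch follows; the helper names `to_unsigned32`, `to_signed32` and `crc_matches` are hypothetical and are not part of zipfile or binascii. It shows that comparing bit patterns (masking both sides with 0xFFFFFFFF) makes the check independent of signedness.

```python
# Hypothetical helpers illustrating the sign-extension mismatch in the report.
MASK32 = 0xFFFFFFFF

def to_unsigned32(value: int) -> int:
    """Reduce any int (possibly negative) to its unsigned 32-bit pattern."""
    return value & MASK32

def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def crc_matches(computed: int, stored: int) -> bool:
    """Compare two CRC-32 values by bit pattern, ignoring signedness."""
    return to_unsigned32(computed) == to_unsigned32(stored)

computed = 2226205591   # unsigned result, as held in a 64-bit C long
stored = -2068761705    # signed result read back from the zip header

print(computed == stored)                         # False: naive comparison
print(crc_matches(computed, stored))              # True: masked comparison
print(to_signed32(computed), to_unsigned32(stored))  # -2068761705 2226205591
```

Masking at comparison time is only a workaround on the caller's side; as Tim notes, the cleaner fix is for crc32() itself to return a consistent value on every platform.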
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 23:17:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 15:17:37 -0700 Subject: [Patches] [ python-Patches-553108 ] Deprecate bsddb Message-ID: Patches item #553108, was opened at 2002-05-06 22:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470 Category: Modules Group: Python 2.3 Status: Open Resolution: Accepted Priority: 5 Submitted By: Garth T Kidd (gtk) Assigned to: Skip Montanaro (montanaro) Summary: Deprecate bsddb Initial Comment: Large numbers of inserts break bsddb, as first discovered in Python 1.5 (bug 408271). According to Barry Warsaw, "trying to get the bsddb module that comes with Python to work is a hopeless cause." If it's broken, let's discourage people from using it. In particular, let's ensure that people importing shelve or anydbm don't end up using it by default. The submitted patch adds a DeprecationWarning to the bsddb module and removes bsddb from the list of db module candidates in anydbm. ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-02 17:17 Message: Logged In: YES user_id=44345 Jack, Sorry to hear you're having trouble. Alas, my MacOS X system is with my wife at the moment, so I can't dig into the problem much. Can you provide me with some background info? If you can send me your copy of ndbm.h (I doubt it's using Berkeley DB) and figure out which library dbm_open resides in, that would be great. Also, can you provide me with the output of the build process so I can see just what errors are being generated?
Skip ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-02 16:52 Message: Logged In: YES user_id=45365 Skip, I'm reopening this bug report: the fix breaks builds on Mac OS X, and I haven't a clue as to how to fix this so I hope you can help. MacOSX has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but it doesn't have any of the libraries (I assume everything needed is in libc). Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-14 15:32 Message: Logged In: YES user_id=44345 Implemented in setup.py 1.93 README 1.147 configure 1.315 configure.in 1.325 pyconfig.h.in 1.42 Modules/dbmmodule 2.30 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-14 02:16 Message: Logged In: YES user_id=21627 The patch looks good, please apply it. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-13 22:33 Message: Logged In: YES user_id=44345 a couple more tweaks... I forgot to include dbmmodule.c in previous patches. This version of the patch also includes a modified README file that adds a section about building the bsddb and dbm modules. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-13 02:35 Message: Logged In: YES user_id=44345 Here's an updated patch. It's different in a couple ways: * support for Berkeley DB 4.x was added. You will need to configure iBerkdb with the 1.85 compatibility stuff. * I cleaned up the dbm build code a bit. * I added a diff for the configure file for people who don't have autoconf handy. 
Skip ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-11 11:09 Message: Logged In: YES user_id=44345 I think deprecating bsddb is too drastic. In the first place, the problems you refer to are in the underlying Berkeley DB library, not in the bsddb code itself. In the second place, later versions of the library fix the problem. The attached patch attempts to modify setup.py and configure.in to solve the problem. It does a couple things differently than the current CVS version: 1. It only searches for versions 2 and 3 of the Berkeley DB library by default. People who know what they are doing can uncomment the information relevant to version 1. 2. It moves all the checking code into setup.py. The header file checks in configure.in were deleted. 3. The ndbm lookalike stuff for the dbm module is done differently. This has not really been tested yet. I anticipate further changes will be necessary with this code. I'm sure it's not perfect. Please give it a try and let me know how it works for you. All that said, I think a better migration path is to replace the current module with the bsddb3/pybsddb stuff. I think that would effectively restrict you to versions 3 or 4 of the underlying Berkeley DB library, so it probably couldn't be done with impunity. Skip ---------------------------------------------------------------------- Comment By: Martin D Katz, Ph.D. (drbits) Date: 2002-05-20 13:14 Message: Logged In: YES user_id=276840

#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.
# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.
# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to 90,909 records
# (an average of 44,000 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212

import unittest, sys, os, math, time

LIMIT=4000000
DISPLAY_AT_END=1
USE_RANDOM=100 # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2

## The format of the value string is
## count|hash|hash...|b
## Where
## count is an 8 byte hexadecimal count of the number of times
## this record has been written.
## hash is the md5 hash of the random value that created this record.
## It is the key for this record. It is appended once for each
## time the record is written (that is, it occurs count times).
## b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129 # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count, ((key+'|') *old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db ["+repr(key)+"]="+ repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
                try:
                    db.sync()
                except:
                    pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, " count = %6i db[%s]=%s" % ( count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))

---------------------------------------------------------------------- Comment By: Martin D Katz, Ph.D. (drbits) Date: 2002-05-16 18:10 Message: Logged In: YES user_id=276840 I am not sure there is a reason to deprecate bsddb. The btopen format appears to be stable enough for normal work. Maybe 2.3 should change dbhash to use btopen? ---------------------------------------------------------------------- Comment By: Garth T Kidd (gtk) Date: 2002-05-08 22:12 Message: Logged In: YES user_id=59803 Let's not turn a simple patch into something requiring a PEP, compulsory thrashing on comp.lang.python, SleepyCat being willing to change their distribution model, lawyers (to make sure the licences are compatible), and so on. I'd hate it if other people spent the kind of time I did trying to get shelve to work only to find that a known-broken bsddb was causing all the problems, and that a patch was there to gently guide them to gdbm, but it got jammed because of scope-creep. Let's get this one, very simple and necessary (bsddb IS broken) change out of the way, and THEN start negotiating, thrashing, and integrating. :) I firmly believe bsddb3 should be one of the included batteries. Let's do it, but let's guide people away from broken code first. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-08 04:01 Message: Logged In: YES user_id=21627 I'm in favour of this change, but I'd like to simultaneously incorporate bsddb3. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470 From noreply@sourceforge.net Tue Jul 2 23:25:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 15:25:09 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 07:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-02 18:25 Message: Logged In: YES user_id=31435 Please try again. New patch tries to force the entry conditions in crc32(), as well as the return value.
Modules/binascii.c; new revision: 2.36 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 17:54 Message: Logged In: YES user_id=31435 So what did it get, and what did it expect? I.e., same stuff all over again. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 16:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ...
centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is a signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 10:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
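The central-directory format string quoted from zipfile.py, "<4s4B4H3l5H2l", reads the CRC field with the signed code `l`. Because the string starts with `<`, every `l` is exactly 4 bytes on disk on every platform; what changes is only whether those 4 bytes come back as a signed or an unsigned number. A small Python 3 sketch, packing just a CRC field rather than a full header, makes that visible with the value from the report:

```python
import struct

crc = 2226205591  # unsigned CRC-32 value from the debugging output in the report

# "<" selects standard sizes, so "L" and "l" are both exactly 4 bytes here.
packed = struct.pack("<L", crc)
assert len(packed) == 4

# The same 4 bytes read back with the unsigned and with the signed code.
(as_unsigned,) = struct.unpack("<L", packed)
(as_signed,) = struct.unpack("<l", packed)
print(as_unsigned, as_signed)  # 2226205591 -2068761705

# In Python 3, packing this value with the signed code is rejected outright.
try:
    struct.pack("<l", crc)
except struct.error as exc:
    print("struct.error:", exc)
```

So a header read with `l` and a crc32() result that is not sign-extended can disagree even though the stored bytes are correct, which matches the symptom in the traceback.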
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Tue Jul 2 23:41:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 15:41:03 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 14:41 Message: Logged In: YES user_id=119770 Ok, hang on. I'm doing a clean build to make sure I wasn't using anything from an old install. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 14:25 Message: Logged In: YES user_id=31435 Please try again. New patch tries to force the entry conditions in crc32(), as well as the return value. 
Modules/binascii.c; new revision: 2.36 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 13:54 Message: Logged In: YES user_id=31435 So what did it get, and what did it expect? I.e., same stuff all over again. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 12:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ...
centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is a signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
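Revision 2.36 is described as forcing the entry conditions of crc32() as well as its return value. The C change itself is not shown in this thread, so the following is only a rough Python 3 illustration of the idea, not the actual binascii.c code: a bitwise CRC-32 that masks the incoming running value to 32 bits gives the same answer whether the caller passes the signed or the unsigned form of a previous result, and it always returns an unsigned value. It is checked against the standard library's `zlib.crc32`.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by zip and zlib
MASK32 = 0xFFFFFFFF

def crc32(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32 intended to match zlib.crc32(data, crc).

    The running value is masked on entry, so a negative (sign-extended)
    previous result is treated the same as its unsigned equivalent, and
    the result is always returned as an unsigned 32-bit value.
    """
    crc = (crc & MASK32) ^ MASK32      # normalize entry condition, then pre-invert
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return (crc ^ MASK32) & MASK32     # post-invert; result is always unsigned

first = crc32(b"hello ")
signed_first = first - (1 << 32) if first & 0x80000000 else first
print(crc32(b"world", first) == crc32(b"world", signed_first))
print(crc32(b"hello world") == zlib.crc32(b"hello world"))
```

The bit-at-a-time loop is chosen for readability; real implementations, including zlib, use lookup tables for speed.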
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Wed Jul 3 02:30:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 18:30:38 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 03:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:30 Message: Logged In: YES user_id=119770 Ok, Modules/binascii.c v2.36 works good! ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 14:41 Message: Logged In: YES user_id=119770 Ok, hang on. I'm doing a clean build to make sure I wasn't using anything from an old install. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 14:25 Message: Logged In: YES user_id=31435 Please try again. 
New patch tries to force the entry conditions in crc32(), as well as the return value. Modules/binascii.c; new revision: 2.36 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 13:54 Message: Logged In: YES user_id=31435 So what did it get, and what did it expect? I.e., same stuff all over again. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 13:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 12:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 07:47 Message: Logged In: YES user_id=119770 From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ...
centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I added some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is a signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 06:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say.
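Once crc32() is fixed, the end-to-end check Albert ran with testzip() should come back clean. A minimal Python 3 sketch using an in-memory archive follows; the member name `junk.tmp` and the payload are made up for illustration. `testzip()` returns None when every member's CRC matches, and the CRC recorded in the archive equals `zlib.crc32` of the data.

```python
import io
import zipfile
import zlib

payload = b"example payload\n" * 100  # arbitrary illustrative data
buf = io.BytesIO()

# Write one member into an in-memory archive.
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("junk.tmp", payload)

# Re-open the same buffer and check integrity.
with zipfile.ZipFile(buf, "r") as zf:
    bad = zf.testzip()            # None means every member's CRC matched
    info = zf.getinfo("junk.tmp")
    print(bad, info.CRC == zlib.crc32(payload))
```

Closing a ZipFile that was given a file object leaves that object open, which is why the same `BytesIO` can be read back afterwards.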
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Wed Jul 3 02:58:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 18:58:09 -0700 Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8 Message-ID: Patches item #576327, was opened at 2002-07-02 07:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: The Written Word (Albert Chin) (tww-china) Assigned to: Tim Peters (tim_one) Summary: zipfile when sizeof(long) == 8 Initial Comment: This bug also applies to Python 2.0.x and 2.1.x (most likely every version). When sizeof (long) == 8, like on Tru64 UNIX, zipfile.testzip () fails due to a CRC error. The problem is that in Lib/zipfile.py: crc = binascii.crc32(bytes) converts the 32-bit binascii.crc32() return value to a 64-bit value (crc). We need to force crc to remain a 32-bit value. Attached is a patch though maybe someone else can think of something better. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-02 21:58 Message: Logged In: YES user_id=31435 Thanks for your help, Albert! While I started my ill-spent computer career on 64-bit Crays, you're the only 64-bit platform I have anymore. This report is Closed. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 21:30 Message: Logged In: YES user_id=119770 Ok, Modules/binascii.c v2.36 works good!
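The report's title condition, sizeof(long) == 8, concerns the C `long` that Python used internally, not the on-disk header layout: zipfile's format strings begin with `<`, which selects the struct module's standard sizes. A quick Python 3 check of the difference follows. The native figure depends on the platform (8 bytes on LP64 systems such as Tru64, 4 on 32-bit systems and on 64-bit Windows); the standard size is always 4.

```python
import ctypes
import struct

# Native mode ("@", the default) follows the C compiler: sizeof(long).
native_long = struct.calcsize("@l")
# Standard mode ("<", ">", "=", "!") fixes 'l' at 4 bytes on every platform.
standard_long = struct.calcsize("<l")

print("native 'l':", native_long, "bytes; C long:", ctypes.sizeof(ctypes.c_long))
print("standard '<l':", standard_long, "bytes")
```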
---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 18:41 Message: Logged In: YES user_id=119770 Ok, hang on. I'm doing a clean build to make sure I wasn't using anything from an old install. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 18:25 Message: Logged In: YES user_id=31435 Please try again. New patch tries to force the entry conditions in crc32(), as well as the return value. Modules/binascii.c; new revision: 2.36 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 17:54 Message: Logged In: YES user_id=31435 So what did it get, and what did it expect? I.e., same stuff all over again. ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:41 Message: Logged In: YES user_id=119770 Ok, well, testing worked fine on the test file I created but running against Lib/test/test_zipfile.py gives: Traceback (most recent call last): File "test_zipfile.py", line 35, in ? zipTest(file, zipfile.ZIP_STORED, writtenData) File "test_zipfile.py", line 16, in zipTest readData2 = zip.read(srcname) File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read raise BadZipfile, "Bad CRC-32 for file %s" % name zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 17:01 Message: Logged In: YES user_id=119770 Tested the new Modules/binascii.c against 2.2.1 on Tru64 4.0D, 5.1, and HP-UX 11i and it works. Thanks! ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 16:20 Message: Logged In: YES user_id=31435 No, I don't have access to a 64-bit box. 
Do you have access to CVS Python? If so, please try again. I patched it to try to make binascii.crc32() return the same result across platforms. Modules/binascii.c; new revision: 2.35 ---------------------------------------------------------------------- Comment By: The Written Word (Albert Chin) (tww-china) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=119770 >From zipfile.py: ... structCentralDir = "<4s4B4H3l5H2l" ... def _RealGetContents(self): ... centdir = fp.read(46) total = total + 46 if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory" centdir = struct.unpack(structCentralDir, centdir) When a zipfile is created, the CRC is written with: def write(self, filename, arcname=None, compress_type=None): ... self.fp.write(struct.pack(">> import zipfile >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w') >>> zip.write ('/vmuniz', 'vmunix') >>> zip.close () >>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r') >>> zip.testzip() 2226205591 -2068761705 I addes some debugging statements to zipfile.read(). The first number is the output of binascii.crc32() while the second is the output of zinfo.CRC (the CRC value in the zipfile header for 'vmuniz' in /tmp/a.zip). Would binascii.crc32() *ever* return a negative number or does it return an unsigned type? Looking at the source to Modules/binascii.c, crc is an unsigned long but the value returned is signed long. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-02 10:44 Message: Logged In: YES user_id=31435 I believe you're having a problem, but I can't tell what it is. Exactly how does zipfile.testzip() fail? What did it get and what did it expect? It's not possible to "force crc to remain a 32-bit value" on a 64- bit box with sizeof(long)==8 -- Python doesn't have any 32-bit type on such a box. 
So it seems most likely that some 32-bit value either is or isn't getting sign-extended when this fails, but I can't tell from the report which of the disagreeing values that may be, or which it *should* be. IOW, we need more info about how this fails. If you're hacking the result of binascii.crc32() and calling that "a fix", chances seem high that the correct fix lies in changing what crc32() returns. But not yet enough info here to say. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470 From noreply@sourceforge.net Wed Jul 3 03:47:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 02 Jul 2002 19:47:17 -0700 Subject: [Patches] [ python-Patches-574532 ] Update freeze to use zlib 1.1.4 Message-ID: Patches item #574532, was opened at 2002-06-27 21:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470 Category: Demos and tools Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Lawrence Hudson (lhudson) Assigned to: Nobody/Anonymous (nobody) Summary: Update freeze to use zlib 1.1.4 Initial Comment: freeze currently looks for zlib 1.1.3. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-03 12:47 Message: Logged In: YES user_id=14198 Checked in. /cvsroot/python/python/dist/src/Tools/freeze/extensions_win32.ini,v <-- extensions_win32.ini new revision: 1.7; previous revision: 1.6 ---------------------------------------------------------------------- Comment By: Lawrence Hudson (lhudson) Date: 2002-07-01 18:59 Message: Logged In: YES user_id=82888 D'Oh! Sorry about that.
---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-28 11:14 Message: Logged In: YES user_id=14198 there is no patch attached here that I can see! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470 From noreply@sourceforge.net Wed Jul 3 16:57:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 03 Jul 2002 08:57:21 -0700 Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS Message-ID: Patches item #577031, was opened at 2002-07-03 11:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Remove PyArg_Parse() and METH_OLDARGS Initial Comment: This patch removes more PyArg_Parse() and METH_OLDARGS which are deprecated. I've tested in select and string, but want to make sure there's nothing else I'm missing. I also have a huge change to glmodule, but I can't test that. The diff is attached. Let me know if I should check in glmodule or leave it alone. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 From noreply@sourceforge.net Wed Jul 3 16:59:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 03 Jul 2002 08:59:15 -0700 Subject: [Patches] [ python-Patches-561244 ] Micro optimizations Message-ID: Patches item #561244, was opened at 2002-05-27 17:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561244&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neal Norwitz (nnorwitz) Summary: Micro optimizations Initial Comment: This is stuff I've had sitting around for a while. I was attempting to improve performance in some paths. * Most of the changes are from a loop -> memset. * intobject changes are to initialize small ints at startup, so smallints don't have to be checked for each new int * other misc very small clean-ups Please review and test to see if there are any problems. Also feedback whether this improves performance for various platforms (tested on Linux) or if this patch is even worth it. Files modified are: Include/intobject.h Python/{ceval,pythonrun}.c Objects/{tuple,list,int,frame,}object.c All changes are independent, except for the int changes which affect: Include/intobject.h, Python/pythonrun.c, and Objects/intobject.c. It may also be useful to define the small negative int (NSMALLNEGINTS) to be 5 or so instead of 1. There are several uses of -2, -3, ... in the standard library. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-03 11:59 Message: Logged In: YES user_id=33168 Checked in the memset()s in: {list,tuple}object.c and _sre.c. object.c 2.178 Still have to do int and frame.
I've cleaned up int so that if there is an init failure, a fatal error is raised similar to other initializations. I will get around to checking that in. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-05 17:43 Message: Logged In: YES user_id=6380 I like all of these, even PyInt_Init(). Go for it. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-05-31 18:11 Message: Logged In: YES user_id=80475 Wow, you plowed through a lot of code! Two sets of optimizations look worthwhile, the memsets() and the XINCREFs to INCREFS. Probably the fastlocals substitutions should be done also, but more for beauty and clarity than speed. I checked those three categories of changes on my machine. They compile fine, pass the standard regression tests and checkout okay on my personal, realcode testfarm. I don't think the PyInt_Init() addition buys us anything. The register and macro tweaks may cost more in review time and potential errors than they could ever save in cumulative computer time. Recommend you get these in before someone changes the codebase. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561244&group_id=5470 From noreply@sourceforge.net Thu Jul 4 02:35:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 03 Jul 2002 18:35:46 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 15:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. 
---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-04 11:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points that pyconfig.h.in does need modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 20:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. 
---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-02 04:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 13:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 15:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Thu Jul 4 13:28:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 05:28:14 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 05:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them.
PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-07-04 12:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. when you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. same with pyconfig.h.in, except that it is made by autoheader. try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-04 01:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. 
So, all the evidence points that pyconfig.h.in does need modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 10:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 01:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 18:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 03:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. 
But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 05:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Fri Jul 5 06:31:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 22:31:36 -0700 Subject: [Patches] [ python-Patches-553702 ] Cygwin make install patch Message-ID: Patches item #553702, was opened at 2002-05-08 14:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470 Category: Build Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Jason Tishler (jlt63) >Assigned to: Jason Tishler (jlt63) Summary: Cygwin make install patch Initial Comment: This patch fixes make install for Cygwin. Specifically, it reverts to the previous behavior: o install libpython$(VERSION)$(SO) in $(BINDIR) o install $(LDLIBRARY) in $(LIBPL) It also begins to remove Cygwin's dependency on $(DLLLIBRARY) which I hope to take advantage of when I attempt to make Cygwin as similar as possible to the other Unix platforms (in other patches). I tested this patch under Red Hat Linux 7.1 without any ill effects. BTW, I'm not the happiest using the following test for Cygwin: test "$(SO)" = .dll I'm willing to update the patch to use: case "$(MACHDEP)" in cygwin* instead, but IMO that will look uglier. ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-07-05 07:31 Message: Logged In: YES user_id=21627 I think I misinterpreted your patch. It is fine; please apply it. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-06-27 18:25 Message: Logged In: YES user_id=86216 Sorry for sluggish response time... Under Cygwin, my patch does the following: make altbininstall: /usr/bin/install -c -m 555 libpython2.3.dll /usr/bin make libainstall: /usr/bin/install -c -m 644 libpython2.3.dll.a /usr/lib/python2.3/config So, I am installing the shared library during altbininstall and the import library during libainstall. Isn't this what you were asking for in your previous message? Or, do you want me to install both files during altbininstall? I'm confused. Please clarify. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-06 11:49 Message: Logged In: YES user_id=21627 On Unix, if a shared libpython is created, it is installed as part of altbininstall, not as part of libainstall. I feel that pythonxy.dll is not really a library, but a binary - quite unlike libpythonxy.a (which is closer to the import library). So I feel that this patch would better be incorporated into altbininstall. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-06-04 17:17 Message: Logged In: YES user_id=86216 Please review when you get a chance, thanks. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-05-22 18:30 Message: Logged In: YES user_id=86216 Can I commit this one? Note that make install is busted under Cygwin without this patch.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470 From noreply@sourceforge.net Fri Jul 5 06:45:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 22:45:21 -0700 Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS Message-ID: Patches item #577031, was opened at 2002-07-03 17:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Remove PyArg_Parse() and METH_OLDARGS Initial Comment: This patch removes more PyArg_Parse() and METH_OLDARGS which are deprecated. I've tested in select and string, but want to make sure there's nothing else I'm missing. I also have a huge change to glmodule, but I can't test that. The diff is attached. Let me know if I should check in glmodule or leave it alone. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 07:45 Message: Logged In: YES user_id=21627 The changes look good, except for the ones that change parsing of "s" to PyString_Check: that means to lose support for Unicode. For some of these methods, that may be acceptable, but that would need documentation. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 From noreply@sourceforge.net Fri Jul 5 06:47:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 22:47:52 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError. This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of the exception to raise. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 07:47 Message: Logged In: YES user_id=21627 If this is meant to be used by extension modules, it should be documented. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-02 18:13 Message: Logged In: YES user_id=11105 Patch for the header file was missing...
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Fri Jul 5 07:45:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 23:45:22 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 15:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. 
---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-05 16:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader. I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 22:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. when you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. same with pyconfig.h.in, except that it is made by autoheader. try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-04 11:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points that pyconfig.h.in does need modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 20:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic.
It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-02 04:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 13:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 15:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work.
Just want to avoid wasting others' time ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Fri Jul 5 07:57:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 04 Jul 2002 23:57:50 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError. This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of the exception to raise. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-07-05 08:57 Message: Logged In: YES user_id=11105 Sure. Patch uploaded: docpatch.diff ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 07:47 Message: Logged In: YES user_id=21627 If this is meant to be used by extension modules, it should be documented. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-02 18:13 Message: Logged In: YES user_id=11105 Patch for the header file was missing...
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Fri Jul 5 15:36:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 05 Jul 2002 07:36:36 -0700 Subject: [Patches] [ python-Patches-553702 ] Cygwin make install patch Message-ID: Patches item #553702, was opened at 2002-05-08 04:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470 Category: Build Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Jason Tishler (jlt63) Assigned to: Jason Tishler (jlt63) Summary: Cygwin make install patch Initial Comment: This patch fixes make install for Cygwin. Specifically, it reverts to the previous behavior: o install libpython$(VERSION)$(SO) in $(BINDIR) o install $(LDLIBRARY) in $(LIBPL) It also begins to remove Cygwin's dependency on $(DLLLIBRARY) which I hope to take advantage of when I attempt to make Cygwin as similar as possible to the other Unix platforms (in other patches). I tested this patch under Red Hat Linux 7.1 without any ill effects. BTW, I'm not the happiest using the following test for Cygwin: test "$(SO)" = .dll I'm willing to update the patch to use: case "$(MACHDEP)" in cygwin* instead, but IMO that will look uglier. ---------------------------------------------------------------------- >Comment By: Jason Tishler (jlt63) Date: 2002-07-05 06:36 Message: Logged In: YES user_id=86216 Thanks. I'm on vacation now and will check it in when I return to work next week. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-04 21:31 Message: Logged In: YES user_id=21627 I think I misinterpreted your patch. It is fine; please apply it. 
---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-06-27 08:25 Message: Logged In: YES user_id=86216 Sorry for sluggish response time... Under Cygwin, my patch does the following: make altbininstall: /usr/bin/install -c -m 555 libpython2.3.dll /usr/bin make libainstall: /usr/bin/install -c -m 644 libpython2.3.dll.a /usr/lib/python2.3/config So, I am installing the shared library during altbininstall and the import library during libainstall. Isn't this what you were asking for in your previous message? Or, do you want me to install both files during altbininstall? I'm confused. Please clarify. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-06 01:49 Message: Logged In: YES user_id=21627 On Unix, if a shared libpython is created, it is installed as part of altbininstall, not as part of libainstall. I feel that pythonxy.dll is not really a library, but a binary - quite unlike libpythonxy.a (which is closer to the import library). So I feel that this patch would better be incorporated into altbininstall. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-06-04 07:17 Message: Logged In: YES user_id=86216 Please review when you get a chance, thanks. ---------------------------------------------------------------------- Comment By: Jason Tishler (jlt63) Date: 2002-05-22 08:30 Message: Logged In: YES user_id=86216 Can I commit this one? Note that make install is busted under Cygwin without this patch.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470 From noreply@sourceforge.net Fri Jul 5 19:25:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 05 Jul 2002 11:25:01 -0700 Subject: [Patches] [ python-Patches-577875 ] Merge xrange() into slice() Message-ID: Patches item #577875, was opened at 2002-07-05 18:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: Merge xrange() into slice() Initial Comment: Changes from Raymond Hettinger's last version of this patch: 1. Removed #include "rangeobject.h" from Python.h 2. Changed repr to suppress None arguments so it now looks like the old xrange repr. 3. Added .slice(len) method that exposes the functionality of PySlice_GetIndicesEx. 
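The third change exposes PySlice_GetIndicesEx, which turns a slice's start, stop and step (any of which may be None or negative) into concrete indices plus a slice length for a sequence of a given length. The following is a minimal pure-Python model of that computation, written only for illustration: the name get_indices_ex is hypothetical and this is not the C code from the patch. The sketch can be cross-checked against the built-in slice.indices() method of later Python versions, which returns the same (start, stop, step) triple without the length.

```python
def get_indices_ex(start, stop, step, length):
    """Model of PySlice_GetIndicesEx (hypothetical helper, illustration only).

    Returns (start, stop, step, slicelength) for slice(start, stop, step)
    applied to a sequence with `length` items.
    """
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")

    # Bounds that indices are clamped to; for a negative step the
    # "one before the beginning" position is -1, not 0.
    lower, upper = (-1, length - 1) if step < 0 else (0, length)

    def adjust(index, default):
        if index is None:
            return default
        if index < 0:
            index += length      # count from the end
            if index < 0:
                index = lower    # still before the start: clamp
        elif index >= length:
            index = upper        # past the end: clamp
        return index

    start = adjust(start, upper if step < 0 else lower)
    stop = adjust(stop, lower if step < 0 else upper)

    # Number of items the slice selects (0 if the range is empty).
    if step < 0:
        slicelength = (start - stop - 1) // (-step) + 1 if stop < start else 0
    else:
        slicelength = (stop - start - 1) // step + 1 if start < stop else 0
    return start, stop, step, slicelength


if __name__ == "__main__":
    print(get_indices_ex(None, None, -1, 5))  # (4, -1, -1, 5)
    print(get_indices_ex(-100, 100, 2, 5))    # (0, 5, 2, 3)
```

The asymmetric lower bound of -1 for negative steps is the detail that makes this "harder to get right than you might think": with a step of -1 the stop index must be able to sit just before element 0.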
Comment in PySlice_GetIndicesEx: /* this is harder to get right than you might think */ :-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470 From noreply@sourceforge.net Fri Jul 5 19:45:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 05 Jul 2002 11:45:05 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 01:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. 
* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-05 14:45 Message: Logged In: YES user_id=33168 I think Martin checked in the change to drop support for win16, so some of the macros may have changed (MS_WINDOWS, MS_WIN32). Won't all the files which use DL_*PORT (most headers in Include) have to change? Michael's explanation of autoconf is what I do. Make sure you have version 2.53 though. Let me know if you want me to test on linux. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 02:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader. I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 08:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. when you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. same with pyconfig.h.in, except that it is made by autoheader. try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-03 21:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points that pyconfig.h.in does need modification.
Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 06:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-01 21:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 14:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-23 23:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. 
:-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 01:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Sat Jul 6 01:41:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 05 Jul 2002 17:41:19 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 15:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly.
* Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-06 10:41 Message: Logged In: YES user_id=14198 My patch is after Martin's so hopefully I have the macros correct (or at least haven't regressed anything of his!) DL_*PORT still exists, but is deprecated. Eventually every header will change, but for now DL_*PORT still works as before. And yes, finding autoconf-2.53 for my cygwin and linux platforms is what took 1/2 the time of getting this patch together :) Another report of success on Linux would be great! To date, I have not heard of a single person trying this patch on any platform. Thanks. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-06 04:45 Message: Logged In: YES user_id=33168 I think Martin checked in the change to drop support for win16, so some of the macros may have changed (MS_WINDOWS, MS_WIN32). Won't all the files which use DL_*PORT (most headers in Include) have to change? Michael's explanation of autoconf is what I do. Make sure you have version 2.53 though. Let me know if you want me to test on linux. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 16:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader.
I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 22:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. when you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. same with pyconfig.h.in, except that it is made by autoheader. try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-04 11:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points that pyconfig.h.in does need modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 20:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. 
* Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (ie, not extensions etc) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-02 04:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 13:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 15:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work.
Just want to avoid wasting others' time ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Sat Jul 6 15:35:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 06 Jul 2002 07:35:29 -0700 Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning Message-ID: Patches item #576101, was opened at 2002-07-01 19:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: Alternative implementation of interning Initial Comment: An interned string has a flag set indicating that it is interned instead of a pointer to the interned string. This pointer was almost always either NULL or pointing to the same object. The other cases were rare and ineffective as an optimization. This saves an average of 3 bytes per string. Interned strings are no longer immortal. They are automatically destroyed when there are no more references to them except the global dictionary of interned strings. New function (actually a macro) PyString_CheckInterned to check whether a string is interned. There are no more references to ob_sinterned anywhere outside stringobject.c. ---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2002-07-06 14:35 Message: Logged In: YES user_id=562624 This implementation supports both mortal and immortal interned strings. PyString_InternInPlace creates an immortal interned string for backward compatibility with code that relies on this behavior. PyString_Intern creates a mortal interned string that is deallocated when its refcnt reaches 0.
Note that if the string value has been previously interned as immortal this will not make it mortal. Most places in the interpreter were changed to PyString_Intern except those that may be required for compatibility. This version of the patch, like the previous one, disables indirect interning. Is there any evidence that it is still an important optimization for some packages? Make sure you rebuild everything after applying this patch because it modifies the size of string object headers. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-07-02 04:21 Message: Logged In: YES user_id=80475 I like the way you consolidated all of the knowledge about interning into one place. Consider adding an example to the docs of an effective use of interning for optimization. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 From noreply@sourceforge.net Sat Jul 6 16:08:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 06 Jul 2002 08:08:14 -0700 Subject: [Patches] [ python-Patches-527518 ] urllib2.py: fix behavior with proxies Message-ID: Patches item #527518, was opened at 2002-03-08 13:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Chris Lawrence (lordsutch) Assigned to: Moshe Zadka (moshez) Summary: urllib2.py: fix behavior with proxies Initial Comment: The following patch against Python 2.1 fixes some problems with the urllib2 module when used with proxies; in particular, if $http_proxy="http://user:passwd@host:port/" is used. 
It also generates the correct Host header for proxy requests (some proxies, such as oops, get confused otherwise, despite RFC 2616 section 5.2 which says they are to ignore it in the case of a full URL on the request line). ---------------------------------------------------------------------- >Comment By: Chris Lawrence (lordsutch) Date: 2002-07-06 10:08 Message: Logged In: YES user_id=6757 Moshe: The updated patch seems to be A-OK and fixes the issue in urllib2.py. At some point I'll have to get back to urllib.py. Chris ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-18 02:40 Message: Logged In: YES user_id=11645 I've looked at the patch, and it mixes cleanup with fixes. I removed the cleanup parts, since I want an "obviously correct" patch. Attached is a new patch I generated which fixes the two problems: * incorrect quoting of the user/password in the proxy code * bad host headers when using proxies. I am also curious about the logic in the latter fix. Can "sel_host" ever be empty? When? Or can we just remove the "or host" stuff? Thanks. ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-13 13:04 Message: Logged In: YES user_id=11645 Nope, no reason, except I need to properly test it and check it in, and I won't have time for that until the weekend. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-06-13 12:51 Message: Logged In: YES user_id=31392 This patch vs. CVS HEAD looks good to me. Note that it would be better to get the Host header by upgrading urllib2 to use HTTPConnection instead of HTTP, but that's a much bigger project. Would it be a problem to always send HTTP/1.1 requests -- even to 1.0 servers? Any reason not to check it in, Moshe?
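Moshe's question about whether "sel_host" can be empty is easier to reason about with a small model. When a request goes through a proxy, the request line carries the full URL, and the Host header should name the origin server in that URL rather than the proxy. The following is a minimal sketch under that assumption, using Python 3's urllib.parse; the helpers host_header and request_head are hypothetical and are not urllib2's code. In this model, the host taken from the selector is empty exactly when the selector is a plain path (a direct, non-proxied request), which is when falling back to the connection host matters.

```python
from urllib.parse import urlsplit


def host_header(selector, connection_host):
    """Pick the Host header value for a request (hypothetical helper).

    Through a proxy the selector is an absolute URL such as
    "http://www.example.com/index.html", and Host must name that origin
    server, not the proxy. For a direct request the selector is only a
    path ("/index.html"), so the host we connected to is used instead.
    """
    parts = urlsplit(selector)
    # Drop any "user:password@" prefix; credentials never belong in Host.
    sel_host = parts.netloc.rpartition("@")[2]
    return sel_host or connection_host


def request_head(method, selector, connection_host):
    """Build the request line and Host header (HTTP/1.0-style sketch)."""
    return "%s %s HTTP/1.0\r\nHost: %s\r\n" % (
        method, selector, host_header(selector, connection_host))


if __name__ == "__main__":
    # Via a proxy at proxy.local:3128: Host names the origin server.
    print(request_head("GET", "http://www.example.com/a?b=1", "proxy.local:3128"))
    # Direct request: the selector is a path, so the connection host is used.
    print(request_head("GET", "/a?b=1", "www.example.com"))
```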
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-13 09:24 Message: Logged In: YES user_id=6757 I'll try to make these changes sometime over the next few days; of course, if someone else wants to do it sooner & check it in, they're more than welcome. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-06-13 04:45 Message: Logged In: YES user_id=9205 I tested the urllib.py patches for 2.1 and 2.2; they work. Some minor quibbles are left: a) the user and/or password may be empty, so your test "if proxypass and proxyuser" is not enough. You should test against "is None". b) in the urllib2 patches, you use unquote() for user and pass, but in the urllib patches you don't. You should use unquote in both modules. c) in the urllib2 patch, you use encodestring() without strip(). Here is an example that catches the corner cases:
    # http://@host.com       (empty user and password)
    # http://:@host.com      (empty user and password)
    # http://user@host.com   (empty password)
    # http://user:@host.com  (empty password)
    # http://:pass@host.com  (empty user)
    proxyuserpass, host = splituser(host)
    if proxyuserpass is not None:
        # unquote
        proxyuserpass = unquote(proxyuserpass)
        # add empty password if missing
        if ":" not in proxyuserpass:
            proxyuserpass += ":"
        # base64
        proxyuserpass = base64.encodestring(proxyuserpass).strip()
        req.add_header("Proxy-Authorization", "Basic " + proxyuserpass)
Greetings, Bastian ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-12 22:17 Message: Logged In: YES user_id=6757 Ok, here's the patch for urllib.py; again, one patch for each of 2.1, 2.2 and CVS HEAD. I also moved the Host header to right after the GET/PUT request line; this should help servers that have multiple virtual hosts handle requests more efficiently.
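Bastian's corner cases (empty user, empty password, missing ":") can be exercised with a self-contained port to Python 3. This is a minimal sketch, not the patch that was checked in: proxy_authorization is a hypothetical helper, credentials are taken from the proxy URL's netloc by splitting at the last "@", and encoding the credentials as UTF-8 is an assumption. base64.b64encode() never inserts newlines, so the strip() that the old base64.encodestring() required is unnecessary here.

```python
import base64
from urllib.parse import unquote, urlsplit


def proxy_authorization(proxy_url):
    """Return a "Basic ..." Proxy-Authorization value, or None.

    Hypothetical helper: accepts an empty user, an empty password and a
    missing ":" separator, mirroring the corner cases listed above.
    """
    netloc = urlsplit(proxy_url).netloc
    userinfo, at, _host = netloc.rpartition("@")
    if not at:
        return None                  # no credentials in the URL at all
    userpass = unquote(userinfo)     # undo %-escapes such as %40 for "@"
    if ":" not in userpass:
        userpass += ":"              # add an empty password if missing
    # Assumption: credentials are sent as UTF-8 bytes.
    token = base64.b64encode(userpass.encode("utf-8")).decode("ascii")
    return "Basic " + token


if __name__ == "__main__":
    print(proxy_authorization("http://user:pass@proxy:3128/"))  # Basic dXNlcjpwYXNz
    print(proxy_authorization("http://proxy:3128/"))            # None
```

Splitting at the last "@" (rather than the first) keeps a literal "@" inside an unescaped password from being mistaken for the host separator.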
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-12 21:39 Message: Logged In: YES user_id=6757 Ok, I've cleaned up the patch a bit. I've got versions for 2.1, 2.2 and current CVS HEAD; they're all the same substantively, but the 2.2 -> 2.3 jump changed things enough that the 2.2 patch won't apply cleanly to CVS. Note that the first big chunk fixes the proxy authentication problem, while the second chunk fixes the incorrect Host header problem. The changes to the import at the beginning are necessary for either part to work. I'll investigate urllib.py further. It looks like the underlying problem is fixed in CVS HEAD already, but I'll try to confirm after setting up some test code for urllib. ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-12 19:54 Message: Logged In: YES user_id=6757 Moshe, Calvin: I'll see about reworking the patch against current CVS and using splituser etc. I can break it up into two bits if you like, too; probably cleaner that way. (Have I mentioned how much I hate fooling with SF.net's BTS... give me debbugs any day :-) Chris ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-06-12 11:41 Message: Logged In: YES user_id=9205 Note that the proxy thing is also a bug in urllib.py. Chris, can you supply a patch for urllib.py too? And I don't like the attached patch because it does not use the splituser and splitpasswd functions already in urllib. I would suggest that you use something like:
    proxyuser, host = splituser(host)
    if proxyuser is not None:
        proxyuser, proxypass = splitpasswd(proxyuser)
        [base64 encode and add header]
Chris, if you are too busy, close this patch and I will open a new bug with a revised patch.
So long, Bastian ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-11 05:34 Message: Logged In: YES user_id=11645 I want to take a look at this....I'm not thrilled about the patch, especially solving two unrelated problems and all, but I do think there's a real problem, and I'll try to fix it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470 From noreply@sourceforge.net Sat Jul 6 17:08:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 06 Jul 2002 09:08:16 -0700 Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning Message-ID: Patches item #576101, was opened at 2002-07-01 19:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: Alternative implementation of interning Initial Comment: An interned string has a flag set indicating that it is interned instead of a pointer to the interned string. This pointer was almost always either NULL or pointing to the same object. The other cases were rare and ineffective as an optimization. This saves an average of 3 bytes per string. Interned strings are no longer immortal. They are automatically destroyed when there are no more references to them except the global dictionary of interned strings. New function (actually a macro) PyString_CheckInterned to check whether a string is interned. There are no more references to ob_sinterned anywhere outside stringobject.c. 
---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2002-07-06 16:08 Message: Logged In: YES user_id=562624 Oops, forgot to actually attach the patch. Here it is. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2002-07-06 14:35 Message: Logged In: YES user_id=562624 This implementation supports both mortal and immortal interned strings. PyString_InternInPlace creates an immortal interned string for backward compatibility with code that relies on this behavior. PyString_Intern creates a mortal interned string that is deallocated when its refcnt reaches 0. Note that if the string value has been previously interned as immortal this will not make it mortal. Most places in the interpreter were changed to PyString_Intern except those that may be required for compatibility. This version of the patch, like the previous one, disables indirect interning. Is there any evidence that it is still an important optimization for some packages? Make sure you rebuild everything after applying this patch because it modifies the size of string object headers. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-07-02 04:21 Message: Logged In: YES user_id=80475 I like the way you consolidated all of the knowledge about interning into one place. Consider adding an example to the docs of an effective use of interning for optimization. 
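Raymond's suggestion of a documented example of effective interning could look something like the following minimal sketch. It uses Python 3's sys.intern() (the builtin intern() in Python 2) rather than the C-level functions discussed in the patch, and the "key=value;key=value" record format and the parse_records helper are hypothetical. When the same field names recur in many records, interning them lets every record share one string object per distinct key instead of holding a freshly split copy per line; in CPython, dictionary lookups with such keys can also succeed on an identity check before falling back to a full string comparison.

```python
import sys


def parse_records(lines):
    """Parse "key=value;key=value" lines into dicts (hypothetical format).

    Field names repeat on every line, so interning them makes all records
    share a single string object per distinct key.
    """
    records = []
    for line in lines:
        record = {}
        for field in line.split(";"):
            key, _, value = field.partition("=")
            record[sys.intern(key)] = value   # canonical, shared key object
        records.append(record)
    return records


records = parse_records(["name=ann;city=oslo", "name=bob;city=rome"])
first_key_a = next(iter(records[0]))
first_key_b = next(iter(records[1]))
print(first_key_a is first_key_b)  # True: both are the interned "name"
```

Only the keys are interned here; the values are mostly unique, so interning them would add table overhead without saving memory.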
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470 From noreply@sourceforge.net Sun Jul 7 07:21:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 06 Jul 2002 23:21:17 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) Assigned to: Jack Jansen (jackjansen) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour. More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. The patch has been tested on OS/2 and FreeBSD: - on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better resourced machines, the whole regression test is negligibly slower (0-1%) to complete. [gcc-2.8.1 -O2] - on FreeBSD (4.4 tested), test_longexp gains nearly 10%, and completes the whole regression test with a gain of about 2% (test_longexp is good for about 25% of the improvement). [gcc-2.95.3 -O3] Both platforms are neutral, performance wise, running MAL's PyBench 1.0. 
The patch in its current form is for experimental evaluation, and not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Sun Jul 7 07:41:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 06 Jul 2002 23:41:14 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) Assigned to: Jack Jansen (jackjansen) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour. More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. 
The patch has been tested on OS/2 and FreeBSD: - on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better resourced machines, the whole regression test is negligibly slower (0-1%) to complete. [gcc-2.8.1 -O2] - on FreeBSD (4.4 tested), test_longexp gains nearly 10%, and completes the whole regression test with a gain of about 2% (test_longexp is good for about 25% of the improvement). [gcc-2.95.3 -O3] Both platforms are neutral, performance wise, running MAL's PyBench 1.0. The patch in its current form is for experimental evaluation, and not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-07 16:41 Message: Logged In: YES user_id=250749 Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied. Also, I would expect to make this a platform-specific change if it's integrated, rather than a general change (unless that is seen as more appropriate).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Sun Jul 7 17:58:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 07 Jul 2002 09:58:42 -0700 Subject: [Patches] [ python-Patches-527518 ] urllib2.py: fix behavior with proxies Message-ID: Patches item #527518, was opened at 2002-03-08 19:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Chris Lawrence (lordsutch) Assigned to: Moshe Zadka (moshez) Summary: urllib2.py: fix behavior with proxies Initial Comment: The following patch against Python 2.1 fixes some problems with the urllib2 module when used with proxies; in particular, if $http_proxy="http://user:passwd@host:port/" is used. It also generates the correct Host header for proxy requests (some proxies, such as oops, get confused otherwise, despite RFC 2616 section 5.2 which says they are to ignore it in the case of a full URL on the request line). ---------------------------------------------------------------------- >Comment By: Jeremy Hylton (jhylton) Date: 2002-07-07 16:58 Message: Logged In: YES user_id=31392 fixed in rev. 1.32 of urllib2.py ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-07-06 15:08 Message: Logged In: YES user_id=6757 Moshe: The updated patch seems to be A-OK and fixes the issue in urllib2.py. At some point I'll have to get back to urllib.py. Chris ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-18 07:40 Message: Logged In: YES user_id=11645 I've looked at the patch, and it mixes cleanup with fixes. 
I removed the cleanup parts, since I want an "obviously correct" patch. Attached is a new patch I generated which fixes the two problems: * incorrect quoting of the user/password in the proxy code * bad host headers when using proxies. I am also curious about the logic in the later fix. Can "sel_host" ever be empty? When? Or can we just remove the "or host" stuff? Thanks. ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-13 18:04 Message: Logged In: YES user_id=11645 Nope, no reason, except I need to properly test it and check it in, and I won't have time for that until the weekend. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-06-13 17:51 Message: Logged In: YES user_id=31392 This patch vs. CVS HEAD looks good to me. Note that it would be better to get the Host header by upgrading urllib2 to use HTTPConnection instead of HTTP, but that's a much bigger project. Would it be a problem to always send HTTP/1.1 requests -- even to 1.0 servers? Any reason not to check it in, Moshe? ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-13 14:24 Message: Logged In: YES user_id=6757 I'll try to make these changes sometime over the next few days; of course, if someone else wants to do it sooner & check it in, they're more than welcome. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-06-13 09:45 Message: Logged In: YES user_id=9205 I tested the urllib.py patches for 2.1 and 2.2; they work. Some minor quibbles are left: a) the user and/or password may be empty, so your test "if proxypass and proxyuser" is not enough. You should test against "is None". b) in the urllib2 patches, you use unquote() for user and pass, but in the urllib patches you don't. You should use unquote in both modules.
c) in the urllib2 patch, you use encodestring() without strip(). Here is an example that catches the corner cases:

    import base64
    from urllib import splituser, unquote

    # host and req come from the enclosing proxy handler.
    # http://@host.com       (empty user and password)
    # http://:@host.com      (empty user and password)
    # http://user@host.com   (empty password)
    # http://user:@host.com  (empty password)
    # http://:pass@host.com  (empty user)
    proxyuserpass, host = splituser(host)
    if proxyuserpass is not None:
        # unquote
        proxyuserpass = unquote(proxyuserpass)
        # add empty password if missing
        if ":" not in proxyuserpass:
            proxyuserpass += ":"
        # base64; strip() removes the trailing newline encodestring() adds
        proxyuserpass = base64.encodestring(proxyuserpass).strip()
        req.add_header("Proxy-Authorization", "Basic " + proxyuserpass)

Greetings, Bastian ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-13 03:17 Message: Logged In: YES user_id=6757 Ok, here's the patch for urllib.py; again, one patch for each of 2.1, 2.2 and CVS HEAD. I also moved the Host header to right after the GET/PUT request line; this should help servers that have multiple virtual hosts handle requests more efficiently. ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-13 02:39 Message: Logged In: YES user_id=6757 Ok, I've cleaned up the patch a bit. I've got versions for 2.1, 2.2 and current CVS HEAD; they're all the same substantively, but the 2.2 -> 2.3 jump changed things enough that the 2.2 patch won't apply cleanly to CVS. Note that the first big chunk fixes the proxy authentication problem, while the second chunk fixes the incorrect Host header problem. The changes to the import at the beginning are necessary for either part to work. I'll investigate urllib.py further. It looks like the underlying problem is fixed in CVS HEAD already, but I'll try to confirm after setting up some test code for urllib.
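Bastian's corner cases can be exercised against a present-day equivalent. The sketch below assumes Python 3, where `urllib.parse.urlsplit` exposes `username` and `password` attributes; `proxy_auth_header` is a hypothetical helper, not the code that was checked in, and encoding the credentials as Latin-1 is an assumption. It also shows why the `strip()` matters: `base64.encodebytes`, like the old `encodestring`, appends a newline, whereas `base64.b64encode` does not.

```python
import base64
from urllib.parse import unquote, urlsplit


def proxy_auth_header(proxy_url):
    """Return a Proxy-Authorization value for proxy_url, or None.

    Hypothetical helper mirroring the corner cases in the comment: a missing
    user or password is treated as an empty string.
    """
    parts = urlsplit(proxy_url)
    if "@" not in parts.netloc:
        return None  # no credentials in the URL at all
    # username/password are None when absent; normalise them to "".
    user = unquote(parts.username or "")
    password = unquote(parts.password or "")
    raw = f"{user}:{password}".encode("latin-1")  # assumed encoding
    # b64encode adds no trailing newline, unlike base64.encodebytes().
    return "Basic " + base64.b64encode(raw).decode("ascii")


for url in ("http://@host.com", "http://:@host.com", "http://user@host.com",
            "http://user:@host.com", "http://:pass@host.com"):
    print(url, proxy_auth_header(url))
```

For example, `proxy_auth_header("http://user:pass@proxy.example:3128")` returns `"Basic dXNlcjpwYXNz"`.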
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-06-13 00:54 Message: Logged In: YES user_id=6757 Moshe, Calvin: I'll see about reworking the patch against current CVS and using splituser etc. I can break it up into two bits if you like, too; probably cleaner that way. (Have I mentioned how much I hate fooling with SF.net's BTS... give me debbugs any day :-) Chris ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-06-12 16:41 Message: Logged In: YES user_id=9205 Note that the proxy thing is also a bug in urllib.py. Chris, can you supply a patch for urllib.py too? And I don't like the attached patch because it does not use the splituser and splitpasswd functions already in urllib. I would suggest that you use something like

    proxyuser, host = splituser(host)
    if proxyuser is not None:
        proxyuser, proxypass = splitpasswd(proxyuser)
        # [base64 encode and add header]

Chris, if you are too busy, close this patch and I will open a new bug with a revised patch. So long, Bastian ---------------------------------------------------------------------- Comment By: Moshe Zadka (moshez) Date: 2002-06-11 10:34 Message: Logged In: YES user_id=11645 I want to take a look at this... I'm not thrilled about the patch, especially solving two unrelated problems and all, but I do think there's a real problem, and I'll try to fix it.
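The other half of this item, the Host header on proxied requests, comes down to this: the request line carries the full URL, but Host should still name the origin server rather than the proxy, and neither should carry a `user:password@` prefix. A minimal sketch, assuming Python 3's `urllib.parse`; `proxied_request_target` is a hypothetical helper, not urllib2's code.

```python
from urllib.parse import urlsplit


def proxied_request_target(url):
    """Return (request_target, headers) for a request sent through a proxy.

    Hypothetical helper: the absolute URL goes on the request line, and the
    Host header names the origin server from that URL, not the proxy.
    Any userinfo ("user:pass@") is dropped from both.
    """
    parts = urlsplit(url)
    hostport = parts.netloc.rpartition("@")[2]  # strip userinfo if present
    target = parts._replace(netloc=hostport).geturl()
    return target, {"Host": hostport}


target, headers = proxied_request_target("http://user:pw@www.example.com:8080/a?b=1")
print(target)   # http://www.example.com:8080/a?b=1
print(headers)  # {'Host': 'www.example.com:8080'}
```

Keeping the port in the Host value matters for virtual hosts served on non-default ports.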
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470 From noreply@sourceforge.net Sun Jul 7 22:24:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 07 Jul 2002 14:24:54 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 08:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) Assigned to: Jack Jansen (jackjansen) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour. More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. The patch has been tested on OS/2 and FreeBSD: - on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better resourced machines, the whole regression test is negligibly slower (0-1%) to complete. [gcc-2.8.1 -O2] - on FreeBSD (4.4 tested), test_longexp gains nearly 10%, and completes the whole regression test with a gain of about 2% (test_longexp is good for about 25% of the improvement). [gcc-2.95.3 -O3] Both platforms are neutral, performance wise, running MAL's PyBench 1.0. 
The patch in its current form is for experimental evaluation, and not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-07 23:24 Message: Logged In: YES user_id=45365 Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem. The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD pymalloc will simply call the underlying system malloc/realloc. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-07 08:41 Message: Logged In: YES user_id=250749 Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied. Also, I would expect to make this a platform specific change if it's integrated, rather than a general change (unless that is seen as more appropriate).
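Jack's diagnosis, that addchild() keeps reallocating one block to a slightly larger size, describes growth in small fixed steps. If each realloc has to move the block, the total copying grows roughly with the square of the number of children. The toy model below assumes, pessimistically, that every resize copies everything, and compares small fixed steps with rounding the capacity up to the next power of two. Both policies are illustrative and are not claimed to match Tim's later change to PyNode_AddChild().

```python
def simulate(n_children, grow):
    """Return (resizes, elements_copied) after appending n_children items.

    Pessimistic model: every resize moves all existing elements.
    """
    capacity = resizes = copied = 0
    for size in range(1, n_children + 1):
        if size > capacity:
            copied += size - 1  # move the children already stored
            capacity = grow(size)
            resizes += 1
    return resizes, copied


def fixed_step(size, step=4):
    """Grow to the next multiple of a small constant."""
    return -(-size // step) * step


def power_of_two(size):
    """Grow to the next power of two."""
    return 1 << (size - 1).bit_length()


print(simulate(100_000, fixed_step))    # (25000, 1249950000)
print(simulate(100_000, power_of_two))  # (18, 131071)
```

Whether a given realloc really copies depends on the platform allocator, which is why the same parser code can be harmless on one system and pathological on another.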
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Mon Jul 8 01:50:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 07 Jul 2002 17:50:41 -0700 Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation Message-ID: Patches item #578494, was opened at 2002-07-08 00:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 282 Implementation Initial Comment: The attached file implements PEP 282. The file logging-0.4.6.tar.gz is the entire distribution including setup/install, test/example scripts, and TeX documentation. The file logging.py (within the .tar.gz) is all that is needed to implement the PEP. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 From noreply@sourceforge.net Mon Jul 8 01:56:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 07 Jul 2002 17:56:03 -0700 Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation Message-ID: Patches item #578494, was opened at 2002-07-08 00:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) >Assigned to: Mark Hammond (mhammond) Summary: PEP 282 Implementation Initial Comment: The attached file implements PEP 282. The file logging-0.4.6.tar.gz is the entire distribution including setup/install, test/example scripts, and TeX documentation.
The file logging.py (within the .tar.gz) is all that is needed to implement the PEP. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2002-07-08 00:56 Message: Logged In: YES user_id=308438 Added just the logging.py file to make it easier to review. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 From noreply@sourceforge.net Mon Jul 8 07:38:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 07 Jul 2002 23:38:55 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 02:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) Assigned to: Jack Jansen (jackjansen) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour. More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. 
The patch has been tested on OS/2 and FreeBSD: - on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better resourced machines, the whole regression test is negligibly slower (0-1%) to complete. [gcc-2.8.1 -O2] - on FreeBSD (4.4 tested), test_longexp gains nearly 10%, and completes the whole regression test with a gain of about 2% (test_longexp is good for about 25% of the improvement). [gcc-2.95.3 -O3] Both platforms are neutral, performance wise, running MAL's PyBench 1.0. The patch in its current form is for experimental evaluation, and not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-08 02:38 Message: Logged In: YES user_id=31435 Jack, please do a cvs update and try this again. I checked in changes to PyNode_AddChild() that I expect will cure your particular woes here. Andrew, PyMalloc was designed for oodles of small allocations. Feel encouraged to write a patch to change the compiler to use PyObject_{Malloc, Realloc, Free} instead. Then it will automatically exploit PyMalloc when the latter is enabled. Note that the regression test suite incorporates random numbers in several tests, and in ways that can affect runtime. Small differences in aggregate test suite runtime are meaningless because of this. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-07 17:24 Message: Logged In: YES user_id=45365 Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem. 
The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD pymalloc will simply call the underlying system malloc/realloc. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-07 02:41 Message: Logged In: YES user_id=250749 Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied. Also, I would expect to make this a platform specific change if it's integrated, rather than a general change (unless that is seen as more appropriate). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Mon Jul 8 11:09:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 03:09:50 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 08:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) >Assigned to: Andrew I MacIntyre (aimacintyre) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour.
More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. The patch has been tested on OS/2 and FreeBSD: - on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better resourced machines, the whole regression test is negligibly slower (0-1%) to complete. [gcc-2.8.1 -O2] - on FreeBSD (4.4 tested), test_longexp gains nearly 10%, and completes the whole regression test with a gain of about 2% (test_longexp is good for about 25% of the improvement). [gcc-2.95.3 -O3] Both platforms are neutral, performance wise, running MAL's PyBench 1.0. The patch in its current form is for experimental evaluation, and not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-08 12:09 Message: Logged In: YES user_id=45365 With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch. Andrew, I'm assigning this back to you; there's little more I can do with this patch. And you'll have to check if you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-08 08:38 Message: Logged In: YES user_id=31435 Jack, please do a cvs update and try this again. I checked in changes to PyNode_AddChild() that I expect will cure your particular woes here. Andrew, PyMalloc was designed for oodles of small allocations. Feel encouraged to write a patch to change the compiler to use PyObject_{Malloc, Realloc, Free} instead. Then it will automatically exploit PyMalloc when the latter is enabled. Note that the regression test suite incorporates random numbers in several tests, and in ways that can affect runtime. Small differences in aggregate test suite runtime are meaningless because of this. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-07 23:24 Message: Logged In: YES user_id=45365 Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem. The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD pymalloc will simply call the underlying system malloc/realloc. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-07 08:41 Message: Logged In: YES user_id=250749 Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied. Also, I would expect to make this a platform specific change if it's integrated, rather than a general change (unless that is seen as more appropriate).
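Tim's caution, that small differences in aggregate test-suite runtime mean little because several tests use random numbers, applies to any before/after comparison. One way to check, sketched here with the standard `timeit` module, is to repeat a measurement and compare a change against the spread between repeats. The statement being timed and the noise rule are hypothetical placeholders, not part of the regression suite.

```python
import timeit


def summarize(stmt, repeat=5, number=20):
    """Time `stmt` `repeat` times; return (best, spread) per execution.

    `best` is the fastest per-execution time in seconds; `spread` is how far
    the slowest repeat was above it, relative to `best`.
    """
    runs = timeit.repeat(stmt, repeat=repeat, number=number)
    per_run = [t / number for t in runs]
    best = min(per_run)
    spread = (max(per_run) - best) / best
    return best, spread


def looks_significant(old_best, new_best, noise):
    """Hypothetical rule of thumb: count a change only if it beats the noise."""
    return abs(new_best - old_best) / old_best > noise


# A long list display, loosely in the spirit of test_longexp.
stmt = "compile('[' + '2,' * 2000 + ']', '<longexp>', 'eval')"
best, spread = summarize(stmt)
print("best %.6f s, spread %.1f%%" % (best, spread * 100))
```

A 1-2% difference between two builds is only worth reporting if it clearly exceeds the spread measured on each build.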
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Mon Jul 8 14:40:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 06:40:59 -0700 Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library Message-ID: Patches item #578667, was opened at 2002-07-08 15:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 Category: Macintosh Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Just van Rossum (jvr) Summary: Put IDE scripts in ~/Library Initial Comment: Just, here's a patch that was part of a larger set but unrelated to the rest (unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX. This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice.) You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library.
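The two-folder suggestion could be sketched as a search path in which a per-user folder under ~/Library is consulted before a system-wide folder in the Python tree. Both folder names are hypothetical, `sys.prefix` stands in for "the Python tree", and `os.path.expanduser` stands in for however the IDE would locate the home folder (classic MacPython would need a different mechanism, as noted later in this item).

```python
import os
import sys


def scripts_folders(python_root=None, home=None):
    """Return candidate IDE scripts folders, most specific first.

    Hypothetical layout: a per-user folder under ~/Library and a
    system-wide folder inside the Python installation.
    """
    if python_root is None:
        python_root = sys.prefix
    if home is None:
        home = os.path.expanduser("~")
    user_dir = os.path.join(home, "Library", "Python", "IDE Scripts")
    system_dir = os.path.join(python_root, "Mac", "IDE Scripts")
    return [user_dir, system_dir]


def find_script(name, folders):
    """Return the first existing path for `name`; user scripts win."""
    for folder in folders:
        candidate = os.path.join(folder, name)
        if os.path.exists(candidate):
            return candidate
    return None


print(scripts_folders())
```

Listing the folders in priority order keeps system-wide scripts available while letting a user override any of them privately.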
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 From noreply@sourceforge.net Mon Jul 8 14:52:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 06:52:08 -0700 Subject: [Patches] [ python-Patches-560311 ] os.uname() on Darwin space in machine Message-ID: Patches item #560311, was opened at 2002-05-24 22:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560311&group_id=5470 Category: Distutils and setup.py Group: Python 2.2.x >Status: Closed >Resolution: Invalid Priority: 5 Submitted By: Tim Carlson (timcarlson) >Assigned to: Jack Jansen (jackjansen) Summary: os.uname() on Darwin space in machine Initial Comment: os.uname() on Darwin (Mac OS X) returns a string for "machine" of "Power Macintosh" which can cause problems. Getting rid of the space might be a good thing. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-08 15:52 Message: Logged In: YES user_id=45365 os.uname() is simply a wrapper around the C library function of the same name. It returns "Power Macintosh" as the machine type. For reasons I don't understand, the C interface doesn't allow you to get at the "generic processor type" that is returned by "uname -p". This would probably be more useful (as the value is "powerpc"). But then, Linux gets it wrong, and returns "i686" for machine name and "unknown" for processor type, exactly the wrong way around. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-27 15:31 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560311&group_id=5470 From noreply@sourceforge.net Mon Jul 8 14:57:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 06:57:04 -0700 Subject: [Patches] [ python-Patches-552161 ] Py_AddPendingCall doesn't unlock on fail Message-ID: Patches item #552161, was opened at 2002-05-04 05:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552161&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Daniel Dunbar (danieldunbar) >Assigned to: Guido van Rossum (gvanrossum) Summary: Py_AddPendingCall doesn't unlock on fail Initial Comment: ceval.c:Py_AddPendingCall doesn't unlock if it fails because the queue is full. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-08 15:57 Message: Logged In: YES user_id=45365 I came across this one when browsing through the patches; it seems to have caught no one's attention yet. Assigning it to Guido as he wrote the addpending stuff (the patch looks benign to me).
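The bug in #552161, an early exit on a full queue that skips the unlock, is a common pattern in any language with explicit locks. A Python analogue (not the C code in ceval.c; the class name and capacity are hypothetical) shows the usual cure: tie the lock to a `with` block so that every exit path, including the "queue full" failure, releases it.

```python
import threading

NPENDINGCALLS = 4  # hypothetical capacity, for illustration only


class PendingCalls:
    """A bounded queue of (func, arg) pairs guarded by a lock."""

    def __init__(self, capacity=NPENDINGCALLS):
        self._lock = threading.Lock()
        self._calls = []
        self._capacity = capacity

    def add(self, func, arg):
        """Queue func(arg); return 0 on success, -1 if the queue is full."""
        with self._lock:  # released on every exit path, including failure
            if len(self._calls) >= self._capacity:
                return -1  # per the report, the C code returned here still locked
            self._calls.append((func, arg))
            return 0

    def run_all(self):
        """Drain the queue under the lock, then run the calls outside it."""
        with self._lock:
            calls, self._calls = self._calls, []
        return [func(arg) for func, arg in calls]


q = PendingCalls(capacity=2)
print(q.add(abs, -1), q.add(abs, -2), q.add(abs, -3))  # 0 0 -1
print(q.run_all())                                     # [1, 2]
```

Running the queued calls after releasing the lock also avoids holding it while arbitrary callbacks execute.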
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552161&group_id=5470 From noreply@sourceforge.net Mon Jul 8 15:25:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 07:25:19 -0700 Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme Message-ID: Patches item #578688, was opened at 2002-07-08 18:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stepan Koltsov (yozh) Assigned to: Nobody/Anonymous (nobody) Summary: incompatible, but nice strings improveme Initial Comment: This patch changes interpretation of multiline strings (doesn't matter, single, double quoted (when NL escaped with backslash), triple quoted). After applying this patch, first: first character after opening quote is ignored, if it is NL, example: """ la-la-la """ will be equivalent of """la-la-la """ First variant looks better, isn't it? Second: all spaces after NL before first nonblank char but no more than current indentation are ignored, example: New: def f(): """ This is docstring, mama-mama, apple, banana """ is equivalent of old: def f(): """This is docstring, mama-mama, apple, banana """ Patch enabled if PyPARSE_STRIPPED_STRINGS defined. I suggest you to apply patch but undefine PyPARSE_STRIPPED_STRINGS until python-4 ;-) I am sure, that this semantics is right, as alternative, I suggest adding new modifier 'i' to strings, like 'u' and 'r', for inst. i'iddqd'. P. S. AFAIU, editing of parsermodule.c needed. P. P. S.
I am sorry, my English suck :-( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 From noreply@sourceforge.net Mon Jul 8 15:47:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 07:47:46 -0700 Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library Message-ID: Patches item #578667, was opened at 2002-07-08 15:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 Category: Macintosh Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Just van Rossum (jvr) Summary: Put IDE scripts in ~/Library Initial Comment: Just, here's a patch that was part of a larger set but unrelated to the rest (unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX. This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice.) You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2002-07-08 16:47 Message: Logged In: YES user_id=92689 It was Tony Lownds. I'm all for the intentions of the patch, but I see it will fail on MacPython, which doesn't support os.environ["HOME"]. But I guess that statement could simply be replaced by the appropriate FindFolder() call.
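On item #578688 (multiline string stripping): the library-level stripping Martin refers to is available in current Python 3. `inspect.cleandoc` drops a leading blank line, the indentation common to the following lines, and trailing blank lines; `textwrap.dedent` removes whitespace common to all lines of arbitrary text. The sketch also checks Stepan's point that a backslash does not escape the newline in a raw literal. Both helpers infer indentation from the text itself, not from where the literal appeared in the source, which is the limitation Stepan objects to in his reply.

```python
import inspect
import textwrap

doc = """
    This is a docstring,
      with an indented continuation
    """

# cleandoc: leading blank line, common indentation and trailing blank
# lines are removed; relative indentation survives.
cleaned = inspect.cleandoc(doc)
print(repr(cleaned))  # 'This is a docstring,\n  with an indented continuation'

# dedent plus a backslash after the opening quotes avoids the first newline.
text = textwrap.dedent("""\
    la-la-la
      indented
    """)
print(repr(text))  # 'la-la-la\n  indented\n'

# In a raw literal the backslash and the newline are both kept.
raw = r"""\
la-la-la"""
print(repr(raw))  # '\\\nla-la-la'
```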
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470 From noreply@sourceforge.net Mon Jul 8 16:48:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 08:48:00 -0700 Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme Message-ID: Patches item #578688, was opened at 2002-07-08 16:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stepan Koltsov (yozh) Assigned to: Nobody/Anonymous (nobody) Summary: incompatible, but nice strings improveme Initial Comment: This patch changes interpretation of multiline strings (doesn't matter, single, double quoted (when NL escaped with backslash), triple quoted). After applying this patch, first: first character after opening quote is ignored, if it is NL, example: """ la-la-la """ will be equivalent of """la-la-la """ First variant looks better, isn't it? Second: all spaces after NL before first nonblank char but no more than current indentation are ignored, example: New: def f(): """ This is docstring, mama-mama, apple, banana """ is equivalent of old: def f(): """This is docstring, mama-mama, apple, banana """ Patch enabled if PyPARSE_STRIPPED_STRINGS defined. I suggest you to apply patch but undefine PyPARSE_STRIPPED_STRINGS until python-4 ;-) I am sure, that this semantics is right, as alternative, I suggest adding new modifier 'i' to strings, like 'u' and 'r', for inst. i'iddqd'. P. S. AFAIU, editing of parsermodule.c needed. P. P. S. I am sorry, my English suck :-( ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-07-08 17:48 Message: Logged In: YES user_id=21627 The first part of your patch is not needed, you can just as fine write """\ la-la-la """ to escape the first newline. The second patch is probably not needed either, since you can easily write library routines that deal with that kind of stripping. In fact, pydoc already does that transformation. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 From noreply@sourceforge.net Mon Jul 8 17:06:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 09:06:39 -0700 Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme Message-ID: Patches item #578688, was opened at 2002-07-08 18:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stepan Koltsov (yozh) Assigned to: Nobody/Anonymous (nobody) Summary: incompatible, but nice strings improveme Initial Comment: This patch changes interpretation of multiline strings (doesn't matter, single, double quoted (when NL escaped with backslash), triple quoted). After applying this patch, first: first character after opening quote is ignored, if it is NL, example: """ la-la-la """ will be equivalent of """la-la-la """ First variant looks better, isn't it? Second: all spaces after NL before first nonblank char but no more than current indentation are ignored, example: New: def f(): """ This is docstring, mama-mama, apple, banana """ is equivalent of old: def f(): """This is docstring, mama-mama, apple, banana """ Patch enabled if PyPARSE_STRIPPED_STRINGS defined.
I suggest you to apply patch but undefine PyPARSE_STRIPPED_STRINGS until python-4 ;-) I am sure, that this semantics is right, as alternative, I suggest adding new modifier 'i' to strings, like 'u' and 'r', for inst. i'iddqd'. P. S. AFAIU, editing of parsermodule.c needed. P. P. S. I am sorry, my English suck :-( ---------------------------------------------------------------------- >Comment By: Stepan Koltsov (yozh) Date: 2002-07-08 20:06 Message: Logged In: YES user_id=247706 I think the first part is still needed since 1. In r"""\ lalala """ backslash doesn't escape NL 2. I think it looks better. About second part: 1. Additional library routines make program text less readable. 2. They cannot know what indentation in spaces was where string constant appeared. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-08 19:48 Message: Logged In: YES user_id=21627 The first part of your patch is not needed, you can just as well write """\ la-la-la """ to escape the first newline. The second part is probably not needed either, since you can easily write library routines that deal with that kind of stripping. In fact, pydoc already does that transformation. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 From noreply@sourceforge.net Tue Jul 9 02:45:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 18:45:28 -0700 Subject: [Patches] [ python-Patches-565183 ] email Parser non-strict mode Message-ID: Patches item #565183, was opened at 2002-06-06 03:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565183&group_id=5470 Category: Modules Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Barry A.
Warsaw (bwarsaw) Summary: email Parser non-strict mode Initial Comment: Here's my current state of the non-strict Parser mode. At the moment it handles most ugly stuff I see, with the exception of the multiple-nested-multiparts-with the same boundary tags grossness - but I think that this is actually a pretty savage violation of the RFC, so I'm not too fussed about it. There's still some work to be done in the area of digests, but I'll bring that up on mimelib-devel. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-07-08 21:45 Message: Logged In: YES user_id=12800 I've got this integrated with my copy now and will likely check it in. Any possibility you can send me some unit tests? ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2002-06-06 04:09 Message: Logged In: YES user_id=29957 Here's a newer version of the patch that gets digests right, as I talked about on mimelib-devel. The code that gets digests right should be split out of this in any case - I'd look into splitting it, but I've got too much on my plate right now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565183&group_id=5470 From noreply@sourceforge.net Tue Jul 9 03:06:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 08 Jul 2002 19:06:59 -0700 Subject: [Patches] [ python-Patches-490456 ] Unicode support in email.Utils.encode Message-ID: Patches item #490456, was opened at 2001-12-07 18:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=490456&group_id=5470 Category: Library (Lib) Group: None >Status: Pending Resolution: None Priority: 5 Submitted By: Mikhail Zabaluev (mzabaluev) Assigned to: Barry A. 
Warsaw (bwarsaw) Summary: Unicode support in email.Utils.encode Initial Comment: It's essentially an updated patch 486375, this time making a distinction of type for the passed string; if it's Unicode, the function encodes it to the character set specified as the charset parameter. The reasons: 1. The function in its current version doesn't support Unicode, throwing an exception if any non-ASCII characters are found within it. 2. With this patch, we reach a sort of operational symmetry on email.Utils.encode vs email.Utils.decode, as it can be seen in the tests. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-07-08 22:06 Message: Logged In: YES user_id=12800 I'm changing the status to Pending since I think this patch is no longer relevant given that email.Utils.encode() is deprecated. ---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-06-28 23:37 Message: Logged In: YES user_id=12800 Sigh, sorry for taking so long to get to this. email.Utils.encode() is deprecated now, and I'd actually like to remove it rather than patch it. ;) Shouldn't the Header class be used instead? ---------------------------------------------------------------------- Comment By: Mikhail Zabaluev (mzabaluev) Date: 2001-12-11 17:52 Message: Logged In: YES user_id=313104 2loewis: In a typical email application, it'd be better to display partially encoded text than to face a hard stop when trying to process a message, hence 'replace'. Actually, the encoding mode could be an optional parameter, but I don't feel like deciding on parameters for a function not developed by me. Barry? The isinstance part seems to be valid, I'm updating the patch here accordingly. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2001-12-11 13:28 Message: Logged In: YES user_id=21627 The patch looks good, except that I cannot really see the value in using "replace" for .encode. Wouldn't it be better to get an exception if the Unicode string contains an un-encodable character? Also, the Python 2.2 way to spell the type test is if isinstance(s, unicode) This makes use of the fact that the unicode builtin is a type now, and it supports unicode subtypes. This is a minor change, of course. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=490456&group_id=5470 From noreply@sourceforge.net Tue Jul 9 09:13:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 01:13:26 -0700 Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme Message-ID: Patches item #578688, was opened at 2002-07-08 16:25 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Stepan Koltsov (yozh) Assigned to: Nobody/Anonymous (nobody) Summary: incompatible, but nice strings improveme Initial Comment: This patch changes interpretation of multiline strings (doesn't matter, single, double quoted (when NL escaped with backslash), triple quoted). After applying this patch, first: first character after opening quote is ignored, if it is NL, example: """ la-la-la """ will be equivalent of """la-la-la """ First variant looks better, isn't it? Second: all spaces after NL before first nonblank char but no more than current indentation are ignored, example: New: def f(): """ This is docstring, mama-mama, apple, banana """ is equivalent of old: def f(): """This is docstring, mama-mama, apple, banana """ Patch enabled if PyPARSE_STRIPPED_STRINGS defined.
I suggest you to apply patch but undefine PyPARSE_STRIPPED_STRINGS until python-4 ;-) I am sure, that this semantics is right, as alternative, I suggest adding new modifier 'i' to strings, like 'u' and 'r', for inst. i'iddqd'. P. S. AFAIU, editing of parsermodule.c needed. P. P. S. I am sorry, my English suck :-( ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-09 10:13 Message: Logged In: YES user_id=21627 In that case, I think your proposed change will be highly debated. That means you will have to write a PEP first if you want to see it implemented (even if it is only an option). ---------------------------------------------------------------------- Comment By: Stepan Koltsov (yozh) Date: 2002-07-08 18:06 Message: Logged In: YES user_id=247706 I think the first part is still needed since 1. In r"""\ lalala """ backslash doesn't escape NL 2. I think it looks better. About second part: 1. Additional library routines make program text less readable. 2. They cannot know what indentation in spaces was where string constant appeared. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-08 17:48 Message: Logged In: YES user_id=21627 The first part of your patch is not needed, you can just as well write """\ la-la-la """ to escape the first newline. The second part is probably not needed either, since you can easily write library routines that deal with that kind of stripping. In fact, pydoc already does that transformation.
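The two points argued in this thread can be checked in modern Python. The sketch below (Python 3, not part of the patch) shows that `"""\` drops the leading newline from an ordinary literal but not from a raw one, which was Stepan's objection, and that `inspect.cleandoc()`, the helper that current pydoc uses for docstrings, performs the indentation stripping Martin refers to. The function `f` is simply the example from the initial comment.

```python
import inspect


def f():
    """
    This is docstring,
    mama-mama,
    apple,
    banana
    """


# A backslash right after the opening quotes escapes the first newline.
escaped = """\
la-la-la
"""

# In a raw string the backslash is kept, so the newline survives as well.
raw = r"""\
la-la-la
"""

# inspect.cleandoc() drops the blank first/last lines and the common
# indentation: the stripping proposed in the patch, done at run time.
cleaned = inspect.cleandoc(f.__doc__)

print(repr(escaped))  # 'la-la-la\n'
print(repr(raw))      # '\\\nla-la-la\n'
print(cleaned)
```

This handles the docstring case, but as Stepan notes, a run-time helper only sees the string value and cannot know the indentation of the source line it came from.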
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470 From noreply@sourceforge.net Tue Jul 9 23:43:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 15:43:53 -0700 Subject: [Patches] [ python-Patches-560379 ] Karatsuba multiplication Message-ID: Patches item #560379, was opened at 2002-05-24 21:07 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Christopher A. Craig (ccraig) Assigned to: Tim Peters (tim_one) Summary: Karatsuba multiplication Initial Comment: Adds Karatsuba multiplication to Python. Patches longobject.c to use Karatsuba multiplication in place of gradeschool math. ---------------------------------------------------------------------- >Comment By: Christopher A. Craig (ccraig) Date: 2002-07-09 18:43 Message: Logged In: YES user_id=135050 I've brought the code into compliance with the coding standards in PEP 7, and added some comments that I thought were in line with the rest of the file. If there is something else you would like me to do, please tell me. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-05 17:38 Message: Logged In: YES user_id=6380 Tim thinks this is cool, but the code can use cleanup and comments. Also, let's not add platform-specific hacks (Christian can sell those as an add-on :-). ---------------------------------------------------------------------- Comment By: Christopher A.
Craig (ccraig) Date: 2002-05-25 19:41 Message: Logged In: YES user_id=135050 I made the needed changes to split on the bigger number (basically changed to split on bigger number, and changed all of the places that need to check to see if there are no bits left), and the new one is a little bit faster, so I'm uploading it too. I had been thinking about fixed precision numbers when I wrote it, so I honestly didn't consider the fact that I could just shift the smaller number to 0 and throw it away... :-) ---------------------------------------------------------------------- Comment By: Christopher A. Craig (ccraig) Date: 2002-05-25 12:16 Message: Logged In: YES user_id=135050 I just uploaded a graph with some sample timings in it. Red is a fence of 20. Green is a fence of 40. Blue is a fence of 60. Black is done with unmodified Python 2.2.1. ---------------------------------------------------------------------- Comment By: Christopher A. Craig (ccraig) Date: 2002-05-25 01:53 Message: Logged In: YES user_id=135050 I got 40 from testing. Basically I generated 250 random numbers each for a series of sizes between 5 and 2990 bits long at 15 bit intervals (i.e. the word size), and stored it in a dictionary. Then timed 249 multiplies at each size for a bunch of fence values and used gdchart to make a pretty graph. It certainly could be optimized better per compiler/platform, but I don't know how much gain you'd see. I split on the smaller number because I guessed it would be better. My thought was that if I split on the smaller number I'm guaranteed to reach the fence, at which point I can use the gradeschool method at a near linear cost (since it's O(n*m) and one of those two is at most the fence size). If I split on the larger number, I may run into a condition where the smaller number is less than half the larger, but I haven't reached the fence yet, and then gradeschool could be much more expensive.
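For readers following the fence discussion, here is a minimal pure-Python sketch of Karatsuba multiplication that splits on the larger factor, as the later version of the patch does. It is not the C code from longobject.c. The `cutoff_bits` parameter is a hypothetical stand-in for the fence (600 bits mirrors 40 digits of 15 bits each), and below it the built-in `*` stands in for the gradeschool method. When the smaller factor has fewer bits than the split point, its high half is simply zero, which is the "shift the smaller number to 0 and throw it away" case.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 600) -> int:
    """Multiply x and y with Karatsuba's method (illustrative sketch).

    cutoff_bits is a hypothetical "fence": 600 bits ~ 40 digits of 15 bits.
    """
    # A fence of at least 3 bits guarantees the operands shrink each level.
    cutoff_bits = max(cutoff_bits, 3)
    negative = (x < 0) != (y < 0)
    x, y = abs(x), abs(y)

    # Below the fence, the built-in product stands in for gradeschool math.
    if min(x.bit_length(), y.bit_length()) <= cutoff_bits:
        product = x * y
    else:
        # Split both factors at half the bit length of the LARGER factor.
        half = max(x.bit_length(), y.bit_length()) // 2
        mask = (1 << half) - 1
        x_hi, x_lo = x >> half, x & mask
        y_hi, y_lo = y >> half, y & mask

        # Three recursive multiplications instead of four.
        hi = karatsuba(x_hi, y_hi, cutoff_bits)
        lo = karatsuba(x_lo, y_lo, cutoff_bits)
        cross = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - hi - lo
        product = (hi << (2 * half)) + (cross << half) + lo

    return -product if negative else product


a, b = 3 ** 2000, 7 ** 1500
print(karatsuba(a, b) == a * b)  # True
```

Splitting on the larger factor means a very lopsided product degenerates into zero high halves rather than stalling above the fence, which is the trade-off Christian and Christopher discuss.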
---------------------------------------------------------------------- Comment By: Christian Tismer (tismer) Date: 2002-05-24 23:23 Message: Logged In: YES user_id=105700 Hmm, not bad. Q: You set the split fence at 40. Where does this number come from? I think this could be optimized per compiler/platform. You say that you split based on the smaller number. Why this? My intuitive guess would certainly be to always split on the larger number. I just checked my Python implementation which does this. Open question: how to handle very small by very long the best way? Probably the highschool version is better here, and that might have led you to investigate the smaller one. I'd say both should be checked. good work! - cheers chris ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470 From noreply@sourceforge.net Wed Jul 10 00:44:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 16:44:16 -0700 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation.
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 18:44 Message: Logged In: YES user_id=44345 Closing since there seem to be no votes in favor, at least not by bots... S ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 20:25 Message: Logged In: YES user_id=31435 hasattr() is defined in terms of whether PyObject_GetAttr() raises an exception, and thanks to __getattr__ hooks can't be computed any faster than calling PyObject_GetAttr(). Which is what the code does: v = PyObject_GetAttr(v, name); if (v == NULL) { PyErr_Clear(); Py_INCREF(Py_False); return Py_False; } Py_DECREF(v); Py_INCREF(Py_True); return Py_True; It's simply not going to get faster than that.
I'm not saying you can't have a "better" message here (although since an object's __name__ field doesn't bear any necessary relationship to the variable name(s) through which the object is referenced, it's unclear that the message won't actually be worse in real non-trivial cases: the type name is an object invariant, but the name can be misleading). I am saying the tradeoff is real and needs to be addressed. That's part of "good design", Dale; doing what feels good in the last case you remember is arguably not. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 19:50 Message: Logged In: YES user_id=44345 In theory. Python's getattr capability is so dynamic though I suspect there's little hasattr() can do but call getattr() and react to the result. ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-20 18:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. 
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 15:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
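Tim's "lazy" alternative might look roughly like the following hypothetical subclass (Python 3, not an existing API). The object and attribute name are stored when the exception is raised, and the friendlier message, in the format Skip proposed, is only built when `str()` is called, so code that catches and ignores the error never pays for the `__name__` lookup. As Tim cautions, the name is read at formatting time, so it reflects the object's state then rather than at the moment of failure.

```python
class LazyAttributeError(AttributeError):
    """Hypothetical AttributeError that formats its message on demand."""

    def __init__(self, target, attr_name):
        super().__init__(attr_name)
        # Store the raw ingredients; no formatting work happens here.
        self.target = target
        self.attr_name = attr_name

    def __str__(self):
        # The __name__ lookup (which could run arbitrary Python code)
        # is deferred until somebody actually asks for the message.
        type_name = type(self.target).__name__
        obj_name = getattr(self.target, "__name__", None)
        if isinstance(obj_name, str):
            return f"{type_name} object {obj_name!r} has no attribute {self.attr_name!r}"
        return f"{type_name} object has no attribute {self.attr_name!r}"


# Callers that ignore the error never trigger __str__.
try:
    raise LazyAttributeError("", "foo")
except AttributeError:
    pass

print(LazyAttributeError("", "foo"))   # str object has no attribute 'foo'
print(LazyAttributeError(len, "foo"))  # builtin_function_or_method object 'len' has no attribute 'foo'
```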
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Jul 10 00:45:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 16:45:15 -0700 Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup Message-ID: Patches item #506436, was opened at 2002-01-21 07:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Tim Peters (tim_one) Summary: GETCONST/GETNAME/GETNAMEV speedup Initial Comment: The attached patch redefines the GETCONST, GETNAME & GETNAMEV macros to do the following: * access the code object's consts and names through local variables instead of the long chain from f * use access macros to index the tuples and get the C string names The code appears correct, and I've had no trouble with it. It only provides the most trivial of improvement on pystone (around 1% when I see anything), but it's all those little things that add up, right? Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 18:45 Message: Logged In: YES user_id=44345 Looking for a vote up or down on this one... ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-01-21 07:47 Message: Logged In: YES user_id=44345 Whoops... Make the "observed" speedup 0.1%... 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 From noreply@sourceforge.net Wed Jul 10 00:46:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 16:46:25 -0700 Subject: [Patches] [ python-Patches-534862 ] help asyncore recover from repr() probs Message-ID: Patches item #534862, was opened at 2002-03-25 15:12 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Jeremy Hylton (jhylton) Summary: help asyncore recover from repr() probs Initial Comment: I've had this patch in my copy of asyncore.py for quite a while. It works for me as a way to recover from repr() bogosities, though I'm not familiar enough with repr/str issues and asyncore to know if this is the right way to make it more bulletproof (or if it should even be made more bulletproof). Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 18:46 Message: Logged In: YES user_id=44345 Looking for a vote up or down so I can get rid of the "M" when I execute "cvs up"... S ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-04-04 11:12 Message: Logged In: YES user_id=6380 Jeremy, what do you think of this? Looks harmless to me...
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 From noreply@sourceforge.net Wed Jul 10 00:48:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 16:48:01 -0700 Subject: [Patches] [ python-Patches-569574 ] plain text enhancement for cgitb Message-ID: Patches item #569574, was opened at 2002-06-15 23:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) >Assigned to: Ka-Ping Yee (ping) Summary: plain text enhancement for cgitb Initial Comment: Here's a patch to cgitb that allows you to enable plain text output. It adds an extra variable to the cgitb.enable function and corresponding underlying functions. To get plain text invoke it as import cgitb cgitb.enable(format="text") (actually, any value for format other than "html" will enable plain text output). The default value is "html", so existing usage of cgitb should be unaffected. I realize this isn't quite what you suggested, but it seemed to me worthwhile to keep such similar code together. I'm not entirely certain I haven't fouled up the html formatting. It needs to be checked still. Also still to come is a doc change. Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 18:48 Message: Logged In: YES user_id=44345 Ping How about you? As the author I think you're in the best position to decide on the merits of the patch... Skip ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-19 22:36 Message: Logged In: YES user_id=6380 Unassigning -- I won't get to this before my vacation. 
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-16 00:09 Message: Logged In: YES user_id=44345 Okay, here's a correction to the first patch. It fixes the logic bug that corrupted the HTML output. It also adds a little bit of extra documentation. Writing the documentation made me think that perhaps this should be added to the traceback module as Guido suggested with just a stub cgitb module that provides an enable function that calls the enable function in the traceback module with format="html". The cgitb module could then be deprecated. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470 From noreply@sourceforge.net Wed Jul 10 03:22:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 19:22:51 -0700 Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation Message-ID: Patches item #578494, was opened at 2002-07-08 10:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Pending Resolution: None Priority: 5 Submitted By: Vinay Sajip (vsajip) Assigned to: Mark Hammond (mhammond) Summary: PEP 282 Implementation Initial Comment: The attached file implements PEP282. The file logging- 0.4.6.tar.gz is the entire distribution including setup/install, test/example scripts, and TeX documentation. The file logging.py (within the .tar.gz) is all that is needed to implement the PEP. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-10 12:22 Message: Logged In: YES user_id=14198 The code seems high quality and well documented. I have no concerns with logging.py as such. 
I have two main issues: * Design decisions: looking over python-dev, I cannot see a consensus on the design decisions. I believe that *some* type of official acceptance of the design should be decreed by someone. * Source structure: while this seems quite suitable for an extension module, the format of the patch is probably not quite correct for a core module. For example, the test code should probably be integrated with the standard Python test suite (even if in a sub-directory), the TeX docs integrated with Python's docs, etc. So while I think the patch is high quality I believe these issues need to be addressed before I can do much more. Setting to "pending" - but good stuff tho! Please drive this through! ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2002-07-08 10:56 Message: Logged In: YES user_id=308438 Added just the logging.py file to make it easier to review. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470 From noreply@sourceforge.net Wed Jul 10 04:10:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 20:10:17 -0700 Subject: [Patches] [ python-Patches-579433 ] Solaris openpty() and forkpty() addition Message-ID: Patches item #579433, was opened at 2002-07-09 22:10 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470 Category: Modules Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Lance Ellinghaus (ellinghaus) Assigned to: Nobody/Anonymous (nobody) Summary: Solaris openpty() and forkpty() addition Initial Comment: This patch provides a Solaris 2.8 version of openpty() and forkpty() since they are not provided for in the distribution of Solaris. This has only been tested on Solaris 2.8.
This was posted to Python-DEV and I was told to post it here, so I am. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470 From noreply@sourceforge.net Wed Jul 10 04:13:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 20:13:17 -0700 Subject: [Patches] [ python-Patches-579435 ] Shadow Password Support Module Message-ID: Patches item #579435, was opened at 2002-07-09 22:13 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579435&group_id=5470 Category: Modules Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Lance Ellinghaus (ellinghaus) Assigned to: Nobody/Anonymous (nobody) Summary: Shadow Password Support Module Initial Comment: Attached is the spwd module. This module provides support for Shadow Passwords on Solaris 2.8. This complements the nis and pwd modules. This is the only way to gain access to the encrypted passwords when using shadow passwords on Solaris. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579435&group_id=5470 From noreply@sourceforge.net Wed Jul 10 04:33:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 09 Jul 2002 20:33:11 -0700 Subject: [Patches] [ python-Patches-569574 ] plain text enhancement for cgitb Message-ID: Patches item #569574, was opened at 2002-06-15 21:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Ka-Ping Yee (ping) Summary: plain text enhancement for cgitb Initial Comment: Here's a patch to cgitb that allows you to enable plain text output.
It adds an extra variable to the cgitb.enable function and corresponding underlying functions. To get plain text invoke it as import cgitb cgitb.enable(format="text") (actually, any value for format other than "html" will enable plain text output). The default value is "html", so existing usage of cgitb should be unaffected. I realize this isn't quite what you suggested, but it seemed to me worthwhile to keep such similar code together. I'm not entirely certain I haven't fouled up the html formatting. It needs to be checked still. Also still to come is a doc change. Skip ---------------------------------------------------------------------- >Comment By: Ka-Ping Yee (ping) Date: 2002-07-09 20:33 Message: Logged In: YES user_id=45338 I think enhanced text tracebacks would be great. (I even have my own hacked-up one lying around here somewhere -- it colourized the output. I think a part of me was waiting for an opportunity to make enhanced tracebacks standard. The most important enhancement IMHO is to show argument values.) I don't think the functionality belongs in cgitb, though. The main routine probably should go in traceback; the common routines (scanvars and lookup) can go there too. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 16:48 Message: Logged In: YES user_id=44345 Ping How about you? As the author I think you're in the best position to decide on the merits of the patch... Skip ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-19 20:36 Message: Logged In: YES user_id=6380 Unassigning -- I won't get to this before my vacation. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-15 22:09 Message: Logged In: YES user_id=44345 Okay, here's a correction to the first patch. It fixes the logic bug that corrupted the HTML output. 
It also adds a little bit of extra documentation. Writing the documentation made me think that perhaps this should be added to the traceback module as Guido suggested, with just a stub cgitb module that provides an enable function that calls the enable function in the traceback module with format="html". The cgitb module could then be deprecated. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470 From noreply@sourceforge.net Wed Jul 10 21:51:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 10 Jul 2002 13:51:14 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Skip Montanaro (montanaro) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as by some other people who happened to catch the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved.
It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is explained in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-07-10 13:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-26 21:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups are non-capturing. This was mainly done for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 18:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded.
This is almost a purely syntactical change. From discussions on python-dev, I renamed the helper fxns so they are all lowercase-style. I also changed them so that their names state what the fxn returns. I also put each of the imports on its own line, as per PEP 8. The only semantic change I made was to import re.compile directly, since it is the only thing I am using from the re module. These changes required tweaking my exhaustive testing suite, so that got uploaded, too. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-20 21:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 14:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of strptime()'s use, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is that I forgot that default arguments are created at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object on every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime().
That forced the removal of the parameter, since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale's. And if they did, they should change the locale for the interpreter and not just for strptime(). The second change was what triggers strptime() to return an re object that it can use. Initially it was any value that would be considered false, but I realized that an empty string could trigger that, and it would be better to raise a TypeError than to let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 12:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar.
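Brett's 2.1.0 note about dropping the LocaleTime parameter turns on a standard Python rule: a default argument value is evaluated once, when the def statement runs, not on every call. The sketch below shows the pitfall and the usual None-sentinel alternative. The names current_locale_name(), parse_stale(), parse_fresh() and the module-level _locale_name are hypothetical stand-ins for building a LocaleTime object, not code from the patch.

```python
# Hypothetical module-level setting standing in for the process locale.
_locale_name = "en_US"


def current_locale_name():
    """Stand-in for constructing a LocaleTime object from the current locale."""
    return _locale_name


def parse_stale(data_string, locale=current_locale_name()):
    # The default was computed once, when this def ran, and is reused forever.
    return locale


def parse_fresh(data_string, locale=None):
    # A None sentinel defers the locale lookup to call time.
    if locale is None:
        locale = current_locale_name()
    return locale


# Switch "locales" after both functions have been defined.
_locale_name = "sv_SE"
stale = parse_stale("Tue Jun  4 00:53:39 2002")  # still "en_US"
fresh = parse_fresh("Tue Jun  4 00:53:39 2002")  # "sv_SE"
```

The sentinel form keeps a default for convenience while still picking up a locale change, which is the trade-off the comment weighs.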
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 13:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes), I did a few more things. For one, I added a special " \d" to the numeric day-of-the-month regex. I discovered that ANSI C's ctime displays the day of the month with a leading space if it is a single digit. So to deal with that, since at least Skip's C library likes to use that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property() it wouldn't work; I didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. It does require 2.2.1 now, since I used True and False without defining them. Obviously, just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 17:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception, as seen in Skip's run of the unit test). The other change is to tack a \d onto all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be in v2.0.3, which I will post before June 17. If anyone thinks I should hold back on this for any reason, please let me know! I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 23:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that the params are either None or have the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that can do front or back concatenation. It is not much different from before, but it lets me specify front or back concatenation easily. I explained why the various magic dates were used. I in no way have to worry about leap years. Since the fxn does not check the data string for validity, it just takes the data and uses it. I have no reason to calc for leap years. date_time[offset] has been replaced with current_format, and I added the requisite two lines to assign between it and the list. You are only supposed to use __new__ when the type is immutable. Since dict is obviously mutable, I don't need to worry about it.
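The single-digit day-of-month failure discussed in this thread (Skip's %c output had the day as '4', which the two-digit %d regex rejects) is easy to reproduce. The sketch below is illustrative and is not the patch's regex. STRICT_DAY is modelled on the simplified pattern Brett quotes, '(3[0-1])|([0-2]\d)'. LENIENT_DAY adds a final alternative for a single digit, optionally space-padded as in ctime() output. It uses ` ?[1-9]`, an assumption that is slightly tighter than the bare '\d' suggested in the thread.

```python
import re

# Simplified %d pattern quoted in the thread: two digits only.
STRICT_DAY = r"(?P<d>3[01]|[0-2]\d)"
# Tolerant variant: the single-digit form goes last, so two-digit days are
# still tried first when this fragment is embedded in a larger pattern.
LENIENT_DAY = r"(?P<d>3[01]|[0-2]\d| ?[1-9])"


def parse_day(pattern, text):
    """Return the day of the month, or None if the pattern does not match."""
    match = re.fullmatch(pattern, text)
    # int() ignores the leading space of ctime-style " 4".
    return int(match.group("d")) if match else None


print(parse_day(STRICT_DAY, "4"), parse_day(LENIENT_DAY, " 4"))
```

Keeping the looser alternative last matters because regex alternation is tried left to right.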
I used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. I didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate. There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if it is run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (I am going to be out of town until June 16). I will still be checking email periodically, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 16:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me. Although 'yY' is faster than ('y', 'Y'). To reduce the length of the lines where you raise an error (in __init__), you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing: newlist = list(orig); newlist.append(''); x = tuple(newlist) you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function?
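Neal's question about appending '' to a tuple has a short answer: tuples support + directly, so no list round-trip is needed. Here is a minimal sketch of a front-or-back helper like the one Brett says he added. The name pad_with_empty and its signature are hypothetical, not taken from the patch.

```python
def pad_with_empty(items, front=False):
    """Return a new tuple with '' added at the front or the back of items."""
    items = tuple(items)  # accept any sequence, e.g. a list of names
    return ("",) + items if front else items + ("",)


print(pad_with_empty(("am", "pm")))
print(pad_with_empty(["Jan", "Feb"], front=True))
```

Because tuple concatenation builds a new object, the caller's original sequence is left untouched.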
In __init__, do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more.) __calc_month() doesn't seem to take leap years into account? (Not sure if this is a problem or not.) In __calc_date_time(), you use date_time[offset] repetitively; couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name, I'm just making an example.) Are you supposed to use __init__ or __new__ when deriving from built-ins (TimeRE(dict))? (Sorry, I don't remember the answer.) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1 It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 15:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c.
If you look at your output, you will notice that the day of the month is '4', but if you look at the docs for time.strftime(), you will notice that they specify the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; because the day was not represented by 2 digits, the regex failed. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possibly improper output from strftime()? Changing the regexes should not be a big issue, since I could just tack on '\d' as the last option for all numerical regexes. As for the test error from time.strptime(), I don't know what is causing it. If you look at the test, you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). How that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is one such case. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Or should I ask now to get the ball rolling? Since I just joined python-dev literally this morning, I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 22:55 Message: Logged In: YES user_id=44345 Here ya go... % ./python Python 2.3a0 (#185, Jun 1 2002, 23:19:40) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
>>> import time >>> now = time.localtime(time.time()) >>> now (2002, 6, 4, 0, 53, 39, 1, 155, 1) >>> time.strftime("%c", now) 'Tue Jun 4 00:53:39 2002' >>> time.tzname ('CST', 'CDT') >>> time.strftime("%Z", now) 'CDT' ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-03 22:35 Message: Logged In: YES user_id=357491 I have uploaded version 2.0.1, which fixes the %b format bug (a stupid typo in a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you get for strftime('%Z') and time.tzname? I don't know how much %Z should be worried about, since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it? -Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 21:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out: parsetime = time.strptime #parsetime = strptime.strptime Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ERROR: *** Test %c directive.
*** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 00:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. The file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that, since the time module is a C extension, getting this accepted might require BDFL approval to add this as a separate module to the standard library. Would it? Would the time module have to have a Python interface module where this is put, with all other methods in the module just passing directly to the extension file?
As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1, I didn't think about it. I will go through the code and look for places where True and False should be used. -Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 06:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything: raise an exception, otherwise it's ok.) * In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency. * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc.) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a Python strptime.
So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 14:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This lets it behave exactly like time.strptime(). ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 15:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait for the cleaned-up one, go ahead. Speaking of which, should I just reply to this item when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 14:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html.
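Brett's version 2 description says that all locale data is gleaned from time.strftime(). The sketch below shows that idea for month and weekday names only; it is not the patch's code. The fixed dates are arbitrary because %B, %b, and %A read only the month or weekday field, which is one reason the thread asks for comments explaining such "magic dates". Under Python's default C locale for LC_TIME, the results are the English names.

```python
import time


def locale_names():
    """Collect month and weekday names for the current LC_TIME locale."""
    full_months, abbr_months = [], []
    for month in range(1, 13):
        # Only tm_mon matters for %B/%b; the other fields just need to be valid.
        stamp = (1999, month, 1, 0, 0, 0, 0, 1, 0)
        full_months.append(time.strftime("%B", stamp))
        abbr_months.append(time.strftime("%b", stamp))
    full_days = []
    for weekday in range(7):
        # Python's struct_time convention: tm_wday 0 is Monday.
        stamp = (1999, 3, 1, 0, 0, 0, weekday, 60, 0)
        full_days.append(time.strftime("%A", stamp))
    return full_months, abbr_months, full_days


months, abbr_months, weekdays = locale_names()
```

The calendar module's month_name and day_name, which Brett later switched to, expose the same kind of locale-dependent lists.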
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Wed Jul 10 22:09:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 10 Jul 2002 14:09:39 -0700 Subject: [Patches] [ python-Patches-579841 ] Build MachoPython with 2level namespace Message-ID: Patches item #579841, was opened at 2002-07-10 23:09 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579841&group_id=5470 Category: Macintosh Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Jack Jansen (jackjansen) Summary: Build MachoPython with 2level namespace Initial Comment: This patch builds a framework-based Python on OSX without --flat_namespace. In addition, the Makefile.pre.in logic for building the temporary framework is slightly reordered to make it more error-proof. The main reason for putting this patch up here is that it was supposed to prevent extension modules built for a framework Python from being imported into a non-framework Python. Unfortunately, it does this with a core dump instead of the expected "Python not initialized (wrong version?)" error message. I would like feedback as to why this is (as other people do get the error message in similar situations).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579841&group_id=5470 From noreply@sourceforge.net Thu Jul 11 22:45:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 11 Jul 2002 14:45:57 -0700 Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator Message-ID: Patches item #580331, was opened at 2002-07-11 21:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: xreadlines caching, file iterator Initial Comment: Calling f.xreadlines() multiple times returns the same xreadlines object. A file is an iterator: __iter__() returns self, and next() calls the cached xreadlines object's next method. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470 From noreply@sourceforge.net Fri Jul 12 03:11:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 11 Jul 2002 19:11:20 -0700 Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre Message-ID: Patches item #580386, was opened at 2002-07-11 22:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Fredrik Lundh (effbot) Summary: uncaught TypeError exception in sre Initial Comment: From c.l.p on 9 July, Kevin Altis reported that re.compile('([a-') produces an uncaught TypeError from compilation. This patch catches the TypeError in _compile().
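For comparison, current CPython 3 rejects the pattern from Kevin Altis's report with re.error (a malformed pattern), not with an uncaught TypeError. The small check below is not part of the patch. try_compile is a hypothetical helper that returns the exception instead of raising it, so that the result can be inspected.

```python
import re


def try_compile(pattern):
    """Compile pattern; return the compiled object or the re.error it raised."""
    try:
        return re.compile(pattern)
    except re.error as exc:  # malformed patterns surface as re.error
        return exc


result = try_compile("([a-")
print(type(result).__name__, result)
```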
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 From noreply@sourceforge.net Fri Jul 12 04:05:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 11 Jul 2002 20:05:41 -0700 Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre Message-ID: Patches item #580386, was opened at 2002-07-11 22:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Fredrik Lundh (effbot) Summary: uncaught TypeError exception in sre Initial Comment: From c.l.p on 9 July, Kevin Altis reported that re.compile('([a-') produces an uncaught TypeError from compilation. This patch catches the TypeError in _compile(). ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-11 23:05 Message: Logged In: YES user_id=33168 I wonder if the same change must also be made in _compile_repl(). I don't see the benefit of the try/except clause as it is: try: p = parse... except error, v: raise error, v Isn't that just: p = parse...
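The try/except clause Neal asks about is not quite a no-op. Re-raising from the outer function starts a fresh traceback in that frame, which hides the parser internals from the user; Fredrik gives this as the rationale in his reply in this thread. The sketch below shows the same idea in Python 3 with `raise ... from None`. PatternError, _parse_internals, and compile_pattern are hypothetical names, not sre's.

```python
import traceback


class PatternError(Exception):
    """Hypothetical error type for malformed patterns."""


def _parse_internals(pattern):
    # Stand-in for deep parser machinery that detects the mistake.
    if pattern.count("[") != pattern.count("]"):
        raise PatternError("unterminated character set")
    return pattern


def compile_pattern(pattern):
    try:
        return _parse_internals(pattern)
    except PatternError as exc:
        # A fresh exception's traceback starts in this frame; "from None"
        # also keeps the original exception out of the printed report.
        raise PatternError(str(exc)) from None


try:
    compile_pattern("([a-")
except PatternError as exc:
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
```

The user still gets the error message, but the traceback stops at compile_pattern instead of listing every internal parser frame.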
This should probably also be backported to 2.2. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 From noreply@sourceforge.net Fri Jul 12 05:04:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 11 Jul 2002 21:04:01 -0700 Subject: [Patches] [ python-Patches-580411 ] move frame macros into ceval Message-ID: Patches item #580411, was opened at 2002-07-12 00:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: move frame macros into ceval Initial Comment: There are some old macros in frameobject.h which are only used in ceval.c. These macros are not prefixed with Py, and some aren't used at all. This patch: * removes all of the macros from frameobject.h * moves the necessary macros into ceval.c * gets rid of an extra level of macros * uses the co alias instead of f->f_code ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470 From noreply@sourceforge.net Fri Jul 12 05:16:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 11 Jul 2002 21:16:56 -0700 Subject: [Patches] [ python-Patches-580411 ] move frame macros into ceval Message-ID: Patches item #580411, was opened at 2002-07-12 00:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: move frame macros into ceval Initial Comment: There are some old macros in
frameobject.h which are only used in ceval.c. These macros are not prefixed with Py, and some aren't used at all. This patch: * removes all of the macros from frameobject.h * moves the necessary macros into ceval.c * gets rid of an extra level of macros * uses the co alias instead of f->f_code ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-12 00:16 Message: Logged In: YES user_id=31435 Nice! Accepted and back to Neal. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470 From noreply@sourceforge.net Fri Jul 12 12:07:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 04:07:58 -0700 Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre Message-ID: Patches item #580386, was opened at 2002-07-12 04:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open >Resolution: Duplicate Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Fredrik Lundh (effbot) Summary: uncaught TypeError exception in sre Initial Comment: From c.l.p on 9 July, Kevin Altis reported that re.compile('([a-') produces an uncaught TypeError from compilation. This patch catches the TypeError in _compile(). ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-07-12 13:07 Message: Logged In: YES user_id=38376 this is the same as bug #545855, and should be fixed inside the SRE parser (afaik, it has been, in the SLAB master repository). as for the extra try/except: this is to shield ordinary users from 20-level tracebacks exposing irrelevant implementation details.
if you make a mistake in an RE, you want to know that, but you probably don't care about exactly where in the parser or compiler internals the interpreter happens to be when that mistake was discovered... (this pattern, along with the "add a comment on the raise line, to provide extra hints for a human reader" idiom, are pretty common in Python libraries). /F ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-12 05:05 Message: Logged In: YES user_id=33168 I wonder if the same change must be made in _compile_repl(). I don't see the benefit of the try/except clause as it is: try: p = parse... except error, v: raise error, v Isn't that just: p = parse... This probably also should be backported to 2.2 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470 From noreply@sourceforge.net Fri Jul 12 12:11:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 04:11:52 -0700 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 14:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 >Category: Modules Group: None Status: Open Resolution: Accepted >Priority: 8 Submitted By: Greg Chapman (glchapman) Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after
"((\d)\:)" fails) without NULLing out the now- invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-07-12 13:11 Message: Logged In: YES user_id=38376 (bumped priority as a reminder to self) /F ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-08 19:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 16:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here: >>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123') >>> m.groups() (None, None) >>> m.lastindex 2 In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value: void lastmark_restore(SRE_STATE *state, int lastmark) { assert(lastmark >= 0); if (state->lastmark > lastmark) { int lastvalidindex = (lastmark == 0) ? 
-1 : (lastmark-1)/2+1; if (state->lastindex > lastvalidindex) state->lastindex = lastvalidindex; memset( state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*) ); } state->lastmark = lastmark; } ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 14:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 14:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Jul 12 16:46:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 08:46:44 -0700 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) >Assigned to: Jeremy Hylton (jhylton) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this. It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 11:46 Message: Logged In: YES user_id=6380 Assigning to Jeremy in the hope that he can provide a review. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-06-23 23:03 Message: Logged In: YES user_id=205865 The newer patch is generated against the latest CVS tree, and it provides additional documentation. ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-06-11 14:47 Message: Logged In: YES user_id=205865 Sorry, been caught with a zillion of other things to do. I'll try to reorganize it somehow and ask for opinions. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-06-11 14:42 Message: Logged In: YES user_id=31392 misa-- any progress on this patch? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:12 Message: Logged In: YES user_id=6380 OK, thanks; I'll wait! ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. 
Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Fri Jul 12 17:58:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 09:58:32 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 19:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. 
Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. It is very straight-forward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 12:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet. I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime: >>> time.strptime("7/12/02", "%m/%d/%y") (2002, 7, 12, 0, 0, 0, 4, 193, 0) >>> With yours: >>> time.strptime("7/12/02", "%m/%d/%y") Traceback (most recent call last): File "", line 1, in ? File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime raise ValueError("time data did not match format") ValueError: time data did not match format >>> Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 16:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. 
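Guido's failing case, together with his suggestion of a regression test built on strftime() output, can be sketched against today's time.strptime(), which in CPython 3 delegates to the pure-Python _strptime module. The single numeric format below is an illustrative choice; it covers only locale-independent directives. The last field of the result is -1 (DST unknown), not the 0 that glibc returned in Guido's session.

```python
import time

# Locale-independent numeric directives only.
NUMERIC_FORMAT = "%Y-%m-%d %H:%M:%S"

def roundtrip(when):
    """Format a struct_time with numeric directives and parse it back."""
    return time.strptime(time.strftime(NUMERIC_FORMAT, when), NUMERIC_FORMAT)

# Guido's example: a month without a leading zero must still parse.
parsed = time.strptime("7/12/02", "%m/%d/%y")
print(tuple(parsed))  # (2002, 7, 12, 0, 0, 0, 4, 193, -1)

# Weekday and day of year are recomputed from the date, so they survive too.
epoch = time.gmtime(0)
print(tuple(roundtrip(epoch))[:8] == tuple(epoch)[:8])  # True
```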
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-27 00:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. Biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing. Mainly done for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 21:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactic change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. Also changed them so that they state what the fxn returns. I also put each import on its own line as per PEP 8. The only semantic change I made was to directly import re.compile since it is the only thing I am using from the re module. These changes required tweaking of my exhaustive testing suite, so that got uploaded, too. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-21 00:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 17:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use.
I only changed two things in the code, but since they change the semantics of how strptime() is used, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is because I forgot about how default arguments are created at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object for every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they do, they should change the locale for the interpreter and not just for strptime(). The second change was what triggers strptime() to return an re object that it can use. Initially it was any false value, but I realized that an empty string could trigger that and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck.
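The two concerns above (a default LocaleTime argument is evaluated once, when the function is defined, so it goes stale after a locale switch; and building the regex dominates the run time) can both be addressed by caching compiled patterns keyed on the format string *and* the current LC_TIME setting. This is a minimal sketch, not the patch's code: DIRECTIVES, build_pattern(), compiled_for() and parse_fields() are hypothetical names, and only %d, %m and %y are handled.

```python
import functools
import locale
import re

# Hypothetical stand-in for TimeRE's directive table (three directives only).
DIRECTIVES = {
    "%d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "%m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "%y": r"(?P<y>\d\d)",
}

def build_pattern(fmt):
    """The expensive step: turn a format string into regex source."""
    pattern = re.escape(fmt)
    for directive, regex in DIRECTIVES.items():
        pattern = pattern.replace(re.escape(directive), regex)
    return pattern

@functools.lru_cache(maxsize=64)
def compiled_for(fmt, lc_time):
    # lc_time is only used as part of the cache key: after a locale
    # switch the key changes, so a stale regex is never reused.
    return re.compile(build_pattern(fmt), re.IGNORECASE)

def parse_fields(data, fmt):
    current = locale.setlocale(locale.LC_TIME)  # query only; changes nothing
    match = compiled_for(fmt, current).fullmatch(data)
    if match is None:
        raise ValueError("time data %r does not match format %r" % (data, fmt))
    return {name: int(value) for name, value in match.groupdict().items()}
```

Repeated calls with the same format (and locale) reuse the compiled object, so only the first call pays for pattern construction.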
Using a precompiled regex instead of constructing a new one everytime took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 15:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks goes out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 16:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes) I did a few more things. For one, I added a special " \d" to the numeric month regex. I discovered that ANSI C for ctime displays the month with a leading space if it is a single digit. So to deal with that since at least Skip's C library likes to use that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to less conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property() it wouldn't work; didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. 
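In current Python, where every class is new-style, a property defined without a setter already refuses assignment (raising AttributeError rather than TypeError), so a separate set method is only needed when a different exception is wanted. A minimal sketch; this LocaleTime is a hypothetical stand-in with a single attribute, filled from the calendar module as the 2.0.4 comment describes:

```python
import calendar

class LocaleTime:
    """Hypothetical read-only holder for locale-dependent month names."""

    def __init__(self):
        # calendar.month_name[0] is '', which keeps list indices aligned
        # with month numbers (1 = January).
        self._f_month = [name.lower() for name in calendar.month_name]

    @property
    def f_month(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._f_month

lt = LocaleTime()
try:
    lt.f_month = []
except AttributeError:
    print("assignment rejected")
```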
It does require 2.2.1 now since I used True and False without defining them. Obviously just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 20:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack on a \d on all numeric formats where it might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3 which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know! I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-05 02:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read? Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that they are either None or have the proper length, otherwise they raise a TypeError.
I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that could specify front or back concatenation. Not much different from before, but it allows me to specify front or back concatenation easily. I explained why the various magic dates were used. I in no way have to worry about leap years. Since it does not validate the data string, the fxn just takes the data and uses it. I have no reason to calc for leap year. date_time[offset] has been replaced with current_format and added the requisite two lines to assign between it and the list. You are only supposed to use __new__ when it is immutable. Since dict is obviously mutable, I don't need to worry about it. Used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate. There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if being run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal and so the memory of seeing beginning programmers' code is still fresh in my mind . And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds.
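Two of the review points discussed here map onto short modern idioms. Neal's `tuple(orig[:])` suggestion would drop the extra element, so the padding step is written with tuple concatenation instead. The cmp-based sorter becomes a `key=len` sort, which puts the longest alternative first so that a regex alternation cannot stop early at a shorter prefix (for example "jun" when the text says "june"). A minimal sketch; pad() and seq_to_regex() are hypothetical names, not the patch's `__tupleToRE`.

```python
import re

def pad(items, front=False):
    """Return items as a tuple with an empty string added at the front or back."""
    return ("",) + tuple(items) if front else tuple(items) + ("",)

def seq_to_regex(names, group):
    """Build a named alternation, longest names first.

    Empty names are skipped: an empty alternative would match anywhere.
    """
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(name) for name in ordered if name)
    return "(?P<%s>%s)" % (group, body)

print(pad(("am", "pm")))                   # ('am', 'pm', '')
print(pad(("am", "pm"), front=True))       # ('', 'am', 'pm')
print(seq_to_regex(["jun", "june"], "b"))  # (?P<b>june|jun)
```

Without the sort, `re.match("(?P<b>jun|june)", "june")` captures only "jun", because the regex engine takes the first alternative that succeeds.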
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 19:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not big to me. Although 'yY' is faster than ('y', 'Y'). In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing: newlist = list(orig) newlist.append('') x = tuple(newlist) you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function? In __init__ do you want to check the params against 'is None' If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) is this significant, should there be a comment? (magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more) __calc_month() doesn't seem to take leap year into account? (not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repetatively, couldn't you start the loop with something like dto = date_time[offset] and then use dto (dto is not a good name, I'm just making an example) Are you supposed to use __init__ when deriving from built-ins (TimeRE(dict)) or __new__? (sorry, I don't remember the answer) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1 It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. 
This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 18:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that it specifies the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; the day not being represented by 2 digits caused the regex to fail. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes. As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). Now how that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is one such case. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Should I ask now to get the ball rolling? Since I just joined python-dev literally this morning I don't know what the protocol is.
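The fix Brett proposes here (accept a day of the month that is not zero-padded) can be sketched as a single named group. The pattern below is an illustrative assumption, not the patch's regex. It accepts 01 to 31, a bare 1 to 9, and the space-padded form C's ctime() produces, and unlike the simplified '(3[0-1])|([0-2]\d)' it also rejects 00.

```python
import re

# Illustrative %d pattern: two-digit, unpadded and space-padded days.
DAY = re.compile(r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])")

def parse_day(text):
    """Return the day of the month as an int, or None if text is not a valid day."""
    match = DAY.fullmatch(text)
    return int(match.group("d")) if match else None

for sample in ["04", "4", " 4", "31", "32", "00"]:
    print(repr(sample), parse_day(sample))
```

int() ignores the leading space of the ctime-style " 4", so all three spellings of the 4th yield 4, while "32" and "00" yield None.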
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 01:55 Message: Logged In: YES user_id=44345 Here ya go... % ./python Python 2.3a0 (#185, Jun 1 2002, 23:19:40) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> now = time.localtime(time.time()) >>> now (2002, 6, 4, 0, 53, 39, 1, 155, 1) >>> time.strftime("%c", now) 'Tue Jun 4 00:53:39 2002' >>> time.tzname ('CST', 'CDT') >>> time.strftime("%Z", now) 'CDT' ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 01:35 Message: Logged In: YES user_id=357491 I have uploaded a verision 2.0.1 which fixes the %b format bug (stupid typo on a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z'), and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it? -Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 00:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out: parsetime = time.strptime #parsetime = strptime.strptime Regardless which version of parsetime I get, I get some errors. 
If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ERROR: *** Test %c directive. *** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 03:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault since I accidentally forgot to add a description for my patch. So the file with no description is the newest one and completely supercedes the older file. I am very sorry about that. 
Trust me, the new version is much better. I realized the other day that since the time module is a C extension file, would getting this accepted require getting BDFL approval to add this as a separate module into the standard library? Would the time module have to have a Python interface module where this is put and all other methods in the module just pass directly to the extension file? As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1 I didn't think about it. I will go through the code and look for places where True and False should be used. -Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 09:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need success flag in CheckIntegrity, you raise an exception? (You don't need to return anything, raise an exception, else it's ok) * In return_time(), could you change xrange(9) to range(len(temp_time)) this removes a dependancy. 
* strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up), the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a python strptime. So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 17:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite includes the removal of the need to input locale-specific time info. All need locale info is gleaned from time.strftime(). This makes it able to behave exactly like time.strptime(). ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 18:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? 
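Version 2 gleans locale-specific names from time.strftime() instead of hard-coding them, and version 2.0.4 later switched to the calendar module for weekday and month names. A minimal sketch showing that the two sources agree under whatever LC_TIME locale is active. The fixed year 2002 and the placeholder weekday and day-of-year fields in the time tuple are arbitrary choices that only need to be in range.

```python
import calendar
import time

def months_from_strftime():
    """Full month names, lower-cased, read from strftime's %B."""
    names = []
    for month in range(1, 13):
        # Only the month field matters for %B; the rest just has to be valid.
        stamp = (2002, month, 1, 0, 0, 0, 0, 1, -1)
        names.append(time.strftime("%B", stamp).lower())
    return names

def months_from_calendar():
    """The same names via the calendar module (whose index 0 is '')."""
    return [name.lower() for name in calendar.month_name[1:]]

print(months_from_strftime()[:3])
```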
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 17:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Fri Jul 12 18:08:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 10:08:02 -0700 Subject: [Patches] [ python-Patches-575515 ] Merge xrange() into slice() Message-ID: Patches item #575515, was opened at 2002-06-29 18:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575515&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Raymond Hettinger (rhettinger) Assigned to: Guido van Rossum (gvanrossum) Summary: Merge xrange() into slice() Initial Comment: Xrange() and Slice() have evolved to be very similar. Merging the code for xrange() into slice() will complete the transformation, put all the capability into one object, eliminate an object type, eliminate two source files, and shrink the Python concept space by a modest amount. Discussion on py-dev (see thread Xrange and Slices starting on 6/26/2002) was generally favorable. All of the design suggestions received have been incorporated in this patch. Slice is left intact as a mutable container of arbitrary Python objects. Its sq_item, sq_len, and tp_iter slots are filled in to give it the same sequence behavior as xrange(). The tp_iter slot creates an immutable iterator based on the state of the slice at the time the iterator is created.
The iterator uses C longs instead of PyObjects to protect its immutability and to keep the super-fast speed that it had in xrange(). To keep the old xrange interface intact, 'xrange' is made synonymous with 'slice'. Also, slice.h is given macros and a PyRange_New() wrapper so that the xrange C API is left intact. Two minor open issues: 1. Should repr() say 'slice' or 'xrange'? 2. What should the return value be for slice_length() when step is zero or None? Patch passes all regression tests. A news item should be added even though the APIs are unchanged. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 13:08 Message: Logged In: YES user_id=6380 Rejecting. It's better to let these two be different, so that it's clear what the intended use is. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2002-06-30 15:35 Message: Logged In: YES user_id=80475 New patch attached. Incorporates three ideas from Oren Tirosh's code review (int-->long, xrange as public interface, return -1 on len error). I'm away from the computer for the next five weeks. Oren has agreed to champion my patches (not necessarily advocate, just make sure they get a fair trial and that requested changes get made).
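Raymond's proposal was rejected, but the overlap he describes is easy to see in later Pythons, where xrange() became range(). A small sketch in Python 3 terms, not the C code from the patch (slice_positions and zero_step_rejected are hypothetical names): slice.indices() fills in None values and clamps the bounds for a given length, and range() over the result yields exactly the positions the slice selects. Both range() and slice.indices() reject a zero step with ValueError, which is one possible answer to open issue 2.

```python
def slice_positions(s, length):
    """Return the indices that seq[s] would select from a sequence of `length`."""
    # slice.indices() replaces None with defaults and clamps out-of-range
    # or negative bounds, giving arguments that range() accepts directly.
    return range(*s.indices(length))

def zero_step_rejected():
    """Check that both range() and slice.indices() refuse a zero step."""
    attempts = (lambda: range(0, 10, 0), lambda: slice(0, 10, 0).indices(10))
    for attempt in attempts:
        try:
            attempt()
        except ValueError:
            continue  # rejected, as expected
        return False
    return True
```

For example, slice_positions(slice(2, None, 3), 10) covers positions 2, 5 and 8, the same elements seq[2::3] returns for a 10-item sequence.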
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575515&group_id=5470 From noreply@sourceforge.net Fri Jul 12 18:21:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 12 Jul 2002 10:21:17 -0700 Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments Message-ID: Patches item #580670, was opened at 2002-07-12 13:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Bill Bell (wbell539) Assigned to: Nobody/Anonymous (nobody) Summary: less restrictive HTML comments Initial Comment: Current code enforces requirement that HTML comments open with ' not in 2.2.x ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 16:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of Python.) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 17:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py! (Linux does not support an event named ) Running idle on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ???
Nevertheless, CallTips works fine on Linux. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Thu Jul 18 16:29:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 08:29:38 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 19:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as by some other people who happened to catch the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes.
All of it is self-explanatory in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 11:29 Message: Logged In: YES user_id=6380 - Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.) - There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find. - The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-). - When I run test_strptime.py, I get one failure:
======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P3[0-1]|[0-2]\d|\d| \d)'
----------------------------------------------------------------------
I haven't looked into this deeper. Back to you... ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-16 17:34 Message: Logged In: YES user_id=357491 Two things have been uploaded. First, test_time.py with a strptime test. It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail, strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p. This was from me being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format strings. Seems like a ridiculous requirement, since the requirement that whitespace be able to compress down to no whitespace negates it.
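Replacing whitespace in the format string with \s*, as described for 2.1.5, can be sketched as follows. This is a simplified, hypothetical translator (format_to_regex and the three-entry DIRECTIVES table are illustrations, not the module's TimeRE): directives become named groups, other literal characters are escaped, and every run of whitespace becomes \s*, so a single space in the format matches any amount of whitespace, including none.

```python
import re

# Hypothetical, much-reduced directive table (the real module has many more).
DIRECTIVES = {
    'H': r'2[0-3]|[0-1]\d|\d',
    'M': r'[0-5]\d|\d',
    'p': r'AM|PM',
}

def format_to_regex(fmt, directives=DIRECTIVES):
    """Translate a strptime-style format into a compiled, case-insensitive regex."""
    pieces = []
    i = 0
    while i < len(fmt):
        char = fmt[i]
        if char == '%' and i + 1 < len(fmt):
            letter = fmt[i + 1]
            if letter == '%':
                pieces.append('%')  # literal percent sign
            else:
                pieces.append('(?P<%s>%s)' % (letter, directives[letter]))
            i += 2
        elif char.isspace():
            # Collapse the whole run of whitespace into a single \s*.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            pieces.append(r'\s*')
        else:
            pieces.append(re.escape(char))
            i += 1
    return re.compile(''.join(pieces), re.IGNORECASE)
```

Because the substitution works on the format string before the directive patterns are spliced in, whitespace inside a directive's own regex (such as the " \d" day form) is left alone.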
Oh well, we are more than compliant now. I decided not to write a patch for the docs to make them read more leniently about what the format directives accept. Figured I would just let people who think like me do it the more "proper" way with leading zeros, and let those who don't read it like that still be okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free, it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime, or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 18:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with numbers without leading zeros. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file. Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here. I will do a patch for the docs since they are not consistent with the explanation of struct_time (at least in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The differences between that implementation and mine are that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires whitespace or a non-alphanumeric character between directives, while my implementation does not. Personally, I don't like either difference.
If they were used, though, it might be possible to rewrite strptime to use just a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since they add checks of the input. That and I just like regexes, period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some, and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 17:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 17:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one 'time_diff' but in my head I meant to overwrite 'diff_time'. I have uploaded the new one. (b) See (a). (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so, I will stop uploading tests here and start a patch with my strptime tests for the stdlib tests. (d) The reason this test failed is that your input is not compliant with the Python docs.
Read what %m accepts: "Month as a decimal number [01,12]". Notice the leading 0 for single-digit months. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add \d as an option to all the regexes and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too, since no one writes numbers that way; strftime does, though. Now, if the docs did not mean to require a leading 0, I think they should be rewritten since they are misleading. In other words, either strptime stays as it is and follows the docs, or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 12:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet. I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime:
>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>
With yours:
>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>>
Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it?
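Guido's failing example can be reproduced with a tiny parser that accepts an optional leading zero, the direction Brett takes in 2.1.4 by adding \d. A minimal sketch, assuming the POSIX rule that %y values 69-99 mean 19xx and 00-68 mean 20xx; parse_mdy is a hypothetical helper that handles only "%m/%d/%y". It derives the weekday and day of year with datetime and returns -1 for the DST flag, as Brett does for unknown values.

```python
import datetime
import re

# Month and day accept an optional leading zero; %y is always two digits.
MDY = re.compile(r'(?P<m>1[0-2]|0[1-9]|[1-9])/'
                 r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])/'
                 r'(?P<y>\d\d)')

def parse_mdy(text):
    """Parse "%m/%d/%y" text into a 9-item time tuple (hypothetical helper)."""
    found = MDY.fullmatch(text)
    if found is None:
        raise ValueError("time data did not match format")
    yy = int(found.group('y'))
    year = 2000 + yy if yy < 69 else 1900 + yy  # POSIX %y pivot
    date = datetime.date(year, int(found.group('m')), int(found.group('d')))
    # Weekday (Monday == 0) and day of year are computed, not parsed.
    return (date.year, date.month, date.day, 0, 0, 0,
            date.weekday(), date.timetuple().tm_yday, -1)
```

parse_mdy("7/12/02") gives (2002, 7, 12, 0, 0, 0, 4, 193, -1), matching the first eight fields of the Linux result above. Current Python's time.strptime("7/12/02", "%m/%d/%y") also accepts the missing leading zero.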
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 16:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-27 00:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing, mainly for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 21:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactic change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. Also changed them so that they state what the fxn returns. I also put each import on its own line as per PEP 8. The only semantic change I made was to import re.compile directly, since it is the only thing I am using from the re module. These changes required tweaking of my exhaustive testing suite, so that got uploaded, too.
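The switch to non-capturing groups (2.1.2) and the case-insensitive flag (2.1.4) can be illustrated with the weekday-abbreviation piece of the pattern shown in the failing test. A sketch with hard-coded English names (an assumption; the module gleans them from the locale), and named_alternation is a hypothetical helper: with (?:...) the compiled pattern has only the one named group, so there is less group bookkeeping and groupdict() stays small. The re_compile alias is one way to import compile directly without shadowing the compile() builtin.

```python
from re import IGNORECASE, compile as re_compile

ABBREVIATIONS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def named_alternation(name, choices, capture_each):
    """Build (?P<name>...) over choices, optionally capturing each alternative."""
    template = '(%s)' if capture_each else '(?:%s)'
    body = '|'.join(template % choice for choice in choices)
    return re_compile('(?P<%s>%s)' % (name, body), IGNORECASE)

capturing = named_alternation('a', ABBREVIATIONS, capture_each=True)
non_capturing = named_alternation('a', ABBREVIATIONS, capture_each=False)
```

Both patterns match "wed"; the capturing version just carries seven extra groups that nothing reads.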
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-21 00:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 17:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of strptime()'s use, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is that I forgot that default arguments are created at function-definition time and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object on every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they did, they should change the locale for the interpreter and not just for strptime(). The second change was to what triggers strptime() to return an re object that it can use.
Initially it was any "nothing" value (i.e., anything that would be considered false), but I realized that an empty string could trigger that, and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 15:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 16:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes) I did a few more things. For one, I added a special " \d" to the numeric month regex.
I discovered that ANSI C for ctime displays the month with a leading space if it is a single digit. So, since at least Skip's C library likes to use that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property() assignment wouldn't work; didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. It does require 2.2.1 now since I used True and False without defining them. Obviously just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 20:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack \d on to all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3, which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know!
I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-05 02:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that the arguments are either None or have the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that can do front or back concatenation. Not much different from before, but it lets me choose front or back concatenation easily. I explained why the various magic dates were used. I in no way have to worry about leap years. Since it is not validating the data string, the fxn just takes the data and uses it. I have no reason to calculate for leap years. date_time[offset] has been replaced with current_format, and I added the requisite two lines to assign between it and the list. You only need __new__ when the base type is immutable. Since dict is obviously mutable, I don't need to worry about it. Used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate.
There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 19:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me, although 'yY' is faster than ('y', 'Y'). In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)
you could do:
    x = tuple(orig[:])
or something like that. Perhaps a helper function? In __init__ do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more.) __calc_month() doesn't seem to take leap year into account?
(not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repetitively; couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name, I'm just making an example.) Are you supposed to use __init__ when deriving from built-ins (TimeRE(dict)) or __new__? (Sorry, I don't remember the answer.) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1 instead of x = -1; y = -1; z = -1. It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 18:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that they specify the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; the day not being represented by 2 digits caused the regex to fail. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes.
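The %c failure comes down to the %d pattern. A sketch comparing a strict two-digit pattern with a lenient one that also takes a bare digit and the space-padded form that C's asctime()/ctime() produce for single-digit days. The patterns and parse_day are simplified, hypothetical stand-ins, not the module's exact regexes; they use [1-9] where the comment above suggests tacking on \d, so that a lone 0 is rejected.

```python
import re
import time

STRICT_DAY = r'3[01]|[0-2]\d'                # two digits only, as in [01,31]
LENIENT_DAY = r'3[01]|[0-2]\d|[1-9]| [1-9]'  # also "4" and " 4"

def parse_day(text, day_pattern):
    """Return the day of month as an int, or None if text does not match."""
    found = re.fullmatch('(?P<d>%s)' % day_pattern, text)
    # int() ignores the leading space of the padded form.
    return None if found is None else int(found.group('d'))

# asctime() pads a single-digit day with a space: 'Tue Jun  4 00:53:39 2002'.
PADDED_DAY = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))[8:10]
```

Using fullmatch() makes the regex engine try every alternative against the whole field, so "39" is rejected instead of matching just its first digit.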
As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). How that can fail, I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is a case of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Or should I ask now to get the ball rolling? Since I just joined python-dev literally this morning, I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 01:55 Message: Logged In: YES user_id=44345 Here ya go...
% ./python
Python 2.3a0 (#185, Jun 1 2002, 23:19:40)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun 4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 01:35 Message: Logged In: YES user_id=357491 I have uploaded version 2.0.1, which fixes the %b format bug (stupid typo on a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z') and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?
-Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 00:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out:
parsetime = time.strptime
#parsetime = strptime.strptime
Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get
======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'
If parsetime == strptime.strptime I get
ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'
======================================================================
ERROR: Test for month directives.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 03:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. So the file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension file. Would getting this accepted require getting BDFL approval to add this as a separate module into the standard library? Would the time module have to have a Python interface module where this is put and all other methods in the module just pass directly to the extension file? As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1 I didn't think about it. I will go through the code and look for places where True and False should be used.
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 09:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything: raise an exception, else it's ok.) * In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency. * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up), the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a python strptime. So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 17:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This makes it able to behave exactly like time.strptime().
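Version 2 is described as gleaning all needed locale information from time.strftime(). A minimal sketch of that idea for month names follows; the helper name month_names_from_strftime and the fixed year 1999 are assumptions for illustration, not taken from the module, and the result noted in the comment holds under the default "C" locale.

```python
import time


def month_names_from_strftime():
    """Return (full, abbreviated) lists of month names for the current
    LC_TIME locale, gleaned by formatting one date per month with
    time.strftime(). Index 0 is a placeholder so index == month number."""
    full, abbrev = [''], ['']
    for month in range(1, 13):
        # 9-field time tuple: (year, month, day, hour, minute, second,
        # weekday, yearday, isdst). Only the month affects %B and %b.
        tt = (1999, month, 1, 12, 0, 0, 0, 1, 0)
        full.append(time.strftime('%B', tt).lower())
        abbrev.append(time.strftime('%b', tt).lower())
    return full, abbrev


full, abbrev = month_names_from_strftime()
# Reverse lookup: a parsed month name maps back to its number.
month_number = full.index('july')  # 7 under the default "C" locale
```

Lower-casing the names makes a later case-insensitive comparison against parsed input straightforward.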
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 18:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 17:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Thu Jul 18 04:29:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 20:29:45 -0700 Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions Message-ID: Patches item #583188, was opened at 2002-07-18 13:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Barry A. 
Warsaw (bwarsaw) Summary: expose Parser.strict flag to functions Initial Comment: The following trivial patch exposes the 'strict' flag in the email.message_from_file and email.message_from_string functions. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 From noreply@sourceforge.net Thu Jul 18 03:34:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 19:34:36 -0700 Subject: [Patches] [ python-Patches-583180 ] smtplib.py patch for macmail esmtp auth Message-ID: Patches item #583180, was opened at 2002-07-18 02:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: bob kuehne (mysticbob) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib.py patch for macmail esmtp auth Initial Comment: i ran into a problem that i've seen several other people describe where they can't authenticate to their particular mail server. i dug into this (my mail server is smtp.mac.com) and discovered that smtplib.py didn't support the specific type of auth that this server required. so, this patch allows authentication to these specific server types. i also reworked one token to make it a bit more modular. the patch is attached, generated in the form: diff smtplib.py_orig smtplib.py_new i'm new to python, and new to the whole patch process on sourceforge, so please let me know what i can do to test, or how else i can work to get this into the next python version. thank you!
bob ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470 From noreply@sourceforge.net Thu Jul 18 04:37:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 20:37:30 -0700 Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust Message-ID: Patches item #583190, was opened at 2002-07-18 13:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 >Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: Patch to make email parser more robust Initial Comment: the following patch against current CVS of the email package, as of 2002/07/18 fixes the following problems: in non-strict mode, messages don't require a blank line at the end with a missing end-terminator. A single newline is sufficient now. The remaining fixes apply in strict or non-strict mode: Handle trailing whitespace at the end of a boundary. Had to switch from using string.split() to re.split(). Handle whitespace on the end of a parameter list for Content-type. Handle whitespace on the end of a plain content-type header. 
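The boundary fix above replaces string.split() with re.split() so that trailing whitespace after a boundary no longer breaks parsing. A rough sketch of that approach follows, assuming a hypothetical helper split_on_boundary. It is not the email package's code, and for simplicity it leaves in each part the newline that RFC 2046 assigns to the following delimiter.

```python
import re


def split_on_boundary(payload, boundary):
    """Split a multipart payload into its parts, tolerating spaces or tabs
    after each boundary delimiter line (including the closing one)."""
    delimiter = re.escape('--' + boundary)
    # A delimiter line: "--boundary", an optional closing "--", optional
    # trailing whitespace, an optional CR, then the end of the line.
    boundary_line = re.compile(
        r'^' + delimiter + r'(?:--)?[ \t]*\r?$\n?', re.MULTILINE)
    parts = boundary_line.split(payload)
    # parts[0] is the preamble and parts[-1] the epilogue; keep the rest.
    return parts[1:-1]


body = ('preamble\n'
        '--XYZ  \n'      # trailing spaces after the boundary
        'part one\n'
        '--XYZ\t\n'      # trailing tab
        'part two\n'
        '--XYZ-- \n')    # closing delimiter with a trailing space
parts = split_on_boundary(body, 'XYZ')
```

The group around the closing "--" is non-capturing, so re.split() returns only the text between delimiter lines.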
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 From noreply@sourceforge.net Thu Jul 18 07:50:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 23:50:34 -0700 Subject: [Patches] [ python-Patches-583235 ] make file object an iterator Message-ID: Patches item #583235, was opened at 2002-07-18 08:50 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583235&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: make file object an iterator Initial Comment: As per python-dev discussion July 17 2002 & earlier, I reworked Oren's patch to remove a reference loop between file object and xreadlines object (making the reference xreadl.->fileob non-addref'd when and only when the xreadlines object is being internally held by the fileob), make f.readline interoperate with f.next (the former delegating to the latter iff f is holding an xreadl. obj), make f.seek remove the xreadl. obj that f is holding (if any), and remove the optimization of caching xreadlines function pointers as static variables in functions of fileobject.c. Also added tests for this functionality to test_file.py.
Alex ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583235&group_id=5470 From noreply@sourceforge.net Thu Jul 18 04:34:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 20:34:08 -0700 Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions Message-ID: Patches item #583188, was opened at 2002-07-18 13:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 >Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Barry A. Warsaw (bwarsaw) Summary: expose Parser.strict flag to functions Initial Comment: The following trivial patch exposes the 'strict' flag in the email.message_from_file and email.message_from_string functions. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 From noreply@sourceforge.net Thu Jul 18 04:36:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 17 Jul 2002 20:36:43 -0700 Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust Message-ID: Patches item #583190, was opened at 2002-07-18 13:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to make email parser more robust Initial Comment: the following patch against current CVS of the email package, as of 2002/07/18 fixes the following problems: in non-strict mode, messages don't require a blank line at the end with a missing end-terminator. A single newline is sufficient now. 
The remaining fixes apply in strict or non-strict mode: Handle trailing whitespace at the end of a boundary. Had to switch from using string.split() to re.split(). Handle whitespace on the end of a parameter list for Content-type. Handle whitespace on the end of a plain content-type header. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 From noreply@sourceforge.net Thu Jul 18 20:41:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 12:41:37 -0700 Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes Message-ID: Patches item #552438, was opened at 2002-05-05 00:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Out of Date Priority: 5 Submitted By: Scott Gilbert (xscott) >Assigned to: Tim Peters (tim_one) Summary: PyBufferObject fixes Initial Comment: This patch fixes these problems: 1) Dangling pointer problem 2) buffer allocated by PyBuffer_New not aligned The PyBufferObject acts differently depending on whether it allocated the memory or is borrowing the memory from a PyBufferProcs-supporting object. In the case of allocating its own memory, I made a slight addition that adds some padding so that the ptr is on a sizeof(double) boundary. In the case of borrowing another object's PyBufferProcs memory, PyBufferObject no longer caches the pointer. This might slow things down (probably not by much), but it keeps PyBufferObject from working with a stale pointer. Normally I wouldn't do this, but since this patch touches pretty much every function anyway, I fixed many deviations from the Python coding style.
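The PyBuffer_New change pads the allocation so the data pointer lands on a sizeof(double) boundary. The patch itself is C; the sketch below shows only the round-up arithmetic in Python. It assumes the data area follows a header inside one allocation, and both aligned_offset and the 20-byte header are hypothetical.

```python
import struct


def aligned_offset(header_size, alignment=None):
    """Round header_size up to the next multiple of alignment.

    By default the alignment is sizeof(double) as reported by the struct
    module (8 bytes on common platforms), mirroring the patch's stated goal
    of starting the data on a sizeof(double) boundary."""
    if alignment is None:
        alignment = struct.calcsize('d')
    # Adding alignment - 1 and then truncating with // rounds up.
    return (header_size + alignment - 1) // alignment * alignment


# A hypothetical 20-byte header followed by the data area:
data_start = aligned_offset(20, 8)  # 24
padding = data_start - 20           # 4 bytes of padding
```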
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 15:41 Message: Logged In: YES user_id=6380 Note, the patch is out of date since somebody fixed some nits with slicing, so I'm marking this as Out Of Date. You might as well upload the new version of the file. :-) Why do you think you need to fix the allocation? Since allocation is done via malloc(), and malloc() guarantees alignment suitable for a double ("for all types"), shouldn't that be enough??? (If it's obmalloc that you're worried about, it's easy to force this to use the real malloc() and free().) I hope Tim will make some time to review this (the "not this week" comment is several months old now). Superficially it looks like a big improvement. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-05-07 14:51 Message: Logged In: YES user_id=31435 Na, assigning a bug is fine by me -- it helps to have *someone* feel guilty. Assigning it doesn't mean it goes to the top of the assignee's heap, though. I can't make time to look at it this week, so it's just as well that it got unassigned. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-07 08:55 Message: Logged In: YES user_id=38318 Apparently assigning a patch is poor form. My bad. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-05 00:27 Message: Logged In: YES user_id=38318 Can I assign this to you or does it take admin privs?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 From noreply@sourceforge.net Thu Jul 18 21:27:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 13:27:45 -0700 Subject: [Patches] [ python-Patches-580995 ] new version of Set class Message-ID: Patches item #580995, was opened at 2002-07-13 17:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Alex Martelli (aleax) Assigned to: Nobody/Anonymous (nobody) Summary: new version of Set class Initial Comment: As per python-dev discussion on Sat 13 July 2002, subject "Dict constructor". A version of Greg Wilson's sandbox Set class that avoids the trickiness of implicitly freezing a set when __hash__ is called on it. Rather, it uses several classes: Set itself has no __hash__ and represents a general, mutable set; BaseSet, its superclass, has all functionality common to mutable and immutable sets; ImmutableSet also subclasses BaseSet and adds __hash__; a wrapper _TemporarilyImmutableSet wraps a Set exposing only __hash__ (identical to the one an ImmutableSet built from the Set would have) and __eq__ and __ne__ (delegated to the Set instance). Set.add(self, x) attempts to call x=x._asImmutable() (on AttributeError, it leaves x alone); Set._asImmutable(self) returns ImmutableSet(self). The membership test BaseSet.__contains__(self, x) attempts to call x = x._asTemporarilyImmutable() (on AttributeError, it leaves x alone); Set._asTemporarilyImmutable(self) returns TemporarilyImmutableSet(self). I've left Greg's code mostly alone otherwise except for fixing bugs/obsolescent usage (e.g.
dictionary rather than dict) and making what were ValueError into TypeError (ValueError was doubtful earlier, is untenable now that mutable and immutable sets are different types). The change in exceptions forced me to change the unit tests in test_set.py, too, but I made no other changes nor additions. ---------------------------------------------------------------------- >Comment By: Alex Martelli (aleax) Date: 2002-07-18 22:27 Message: Logged In: YES user_id=60314 Changed as per GvR comments so now sets have-a dict rather than being-a dict. Made code more direct in some places (using list comprehensions rather than loops where appropriate). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470 From noreply@sourceforge.net Thu Jul 18 22:29:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 14:29:31 -0700 Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions Message-ID: Patches item #583188, was opened at 2002-07-17 23:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Barry A. Warsaw (bwarsaw) Summary: expose Parser.strict flag to functions Initial Comment: The following trivial patch exposes the 'strict' flag in the email.message_from_file and email.message_from_string functions. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-07-18 17:29 Message: Logged In: YES user_id=12800 Accepted and applied. 
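The Set/ImmutableSet split described in patch #580995 can be sketched roughly as follows. This is a simplified reconstruction from the prose, written for Python 3 and including the later has-a-dict change; the XOR hash and the _freeze and _compute_hash helpers are assumptions, not code from the patch.

```python
class BaseSet(object):
    """Behaviour shared by mutable and immutable sets. Elements are kept as
    keys of an internal dict (the set has-a dict rather than is-a dict)."""

    def __init__(self, iterable=()):
        self._data = {}
        for element in iterable:
            self._data[self._freeze(element)] = True

    @staticmethod
    def _freeze(element):
        # A mutable Set is stored as an ImmutableSet; anything without
        # _asImmutable() (AttributeError) is stored unchanged.
        try:
            return element._asImmutable()
        except AttributeError:
            return element

    def _compute_hash(self):
        # Order-independent: XOR of the element hashes.
        result = 0
        for element in self._data:
            result ^= hash(element)
        return result

    def __contains__(self, element):
        # A mutable Set is wrapped so it can be hashed for this lookup only.
        try:
            element = element._asTemporarilyImmutable()
        except AttributeError:
            pass
        return element in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        return isinstance(other, BaseSet) and self._data == other._data

    def __ne__(self, other):
        return not self == other


class ImmutableSet(BaseSet):
    def __hash__(self):
        return self._compute_hash()


class Set(BaseSet):
    # Under Python 3, BaseSet defines __eq__ without __hash__, so Set
    # instances are unhashable: hash(Set()) raises TypeError.

    def add(self, element):
        self._data[self._freeze(element)] = True

    def _asImmutable(self):
        return ImmutableSet(self)

    def _asTemporarilyImmutable(self):
        return _TemporarilyImmutableSet(self)


class _TemporarilyImmutableSet(BaseSet):
    """Shares (does not copy) a Set's dict and hashes like the equivalent
    ImmutableSet would."""

    def __init__(self, wrapped):
        self._data = wrapped._data

    def __hash__(self):
        return self._compute_hash()


# Usage: a mutable Set stored inside another Set.
inner = Set([3])
outer = Set([1, 2])
outer.add(inner)
```

Because the wrapper shares the Set's dict instead of copying it, a membership test costs no more than hashing the elements once.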
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470 From noreply@sourceforge.net Thu Jul 18 22:39:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 14:39:12 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes.
All of it is self-explanatory in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-07-18 14:39 Message: Logged In: YES user_id=357491 God, I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make sure it was fit for inclusion in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I re-uploaded everything now. All files that specify they were uploaded 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test whichever version time will export. I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =) As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately the version you ran still had that test in there. SF should really let patch creators delete old files. That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python.
=) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 08:29 Message: Logged In: YES user_id=6380 - Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.) - There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find. - The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-). - When I run test_strptime.py, I get one failure: ====================================================================== FAIL: Test TimeRE.pattern. ---------------------------------------------------------------------- Traceback (most recent call last): File "../Lib/test/test_strptime.py", line 124, in test_pattern self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string) File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless if not expr: raise self.failureException, msg AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)' ---------------------------------------------------------------------- I haven't looked into this deeper. Back to you...
It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p. This was from me being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format directives. That seems like a ridiculous requirement anyway, since allowing whitespace to compress down to no whitespace negates it. Oh well, we are more than compliant now. I decided not to write a patch for the docs to make them more lenient about what the format directives accept. I figured I would just let people who think like me do it in a more "proper" way with leading zeros and let those who don't read it like that still be okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 15:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4.
I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with numbers that lack a leading zero. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file . Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here. I will do a patch for the docs since they are not consistent with the explanation of struct_time (or at least they aren't in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The differences between that implementation and mine are that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires whitespace or a non-alphanumeric character between directives, while my implementation does not require that. Personally, I don't like either difference. If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since they add checks of the input. That, and I just like regexes, period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 14:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros.
You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 14:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one 'time_diff' but in my head I meant to overwrite 'diff_time'. I have uploaded the new one. (b) See (a) (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so I will stop uploading tests here and just start a patch with my strptime tests for the stdlib tests. (d) This test failed because your input is not compliant with the Python docs. Read what %m accepts: Month as a decimal number [01,12] Notice the leading 0 for the single-digit month. My implementation follows the docs and not what glibc suggests. If you want, I can add \d as an option to all the regexes and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant for no leading 0, I think they should be rewritten since that is misleading. In other words, either strptime stays as it is and follows the docs or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 09:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet.
I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime: >>> time.strptime("7/12/02", "%m/%d/%y") (2002, 7, 12, 0, 0, 0, 4, 193, 0) >>> With yours: >>> time.strptime("7/12/02", "%m/%d/%y") Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime raise ValueError("time data did not match format") ValueError: time data did not match format >>> Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 13:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-26 21:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing. That was mainly done for a possible performance improvement.
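Two regex questions run through these comments: whether %m should accept "7" as well as "07", and turning whitespace in the format into \s*. The small format-to-regex sketch below covers only %d, %m, %y and %Y. DIRECTIVE_PATTERNS, format_to_regex and parse_fields are hypothetical names, and the integer conversion is an illustration rather than _strptime's behaviour.

```python
import re

# Hypothetical directive table covering only four directives. Each maps to
# a named group; a bare single digit is accepted, so "7" matches %m as
# well as "07". Alternatives sit directly inside the named group, so no
# extra capturing groups are created.
DIRECTIVE_PATTERNS = {
    'd': r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])',
    'm': r'(?P<m>1[0-2]|0[1-9]|[1-9])',
    'y': r'(?P<y>\d\d)',
    'Y': r'(?P<Y>\d\d\d\d)',
}


def format_to_regex(fmt):
    """Translate a strptime-style format string into a compiled regex."""
    pieces = []
    i = 0
    while i < len(fmt):
        char = fmt[i]
        if char == '%' and i + 1 < len(fmt):
            directive = fmt[i + 1]
            if directive not in DIRECTIVE_PATTERNS:
                raise ValueError("unsupported directive %%%s" % directive)
            pieces.append(DIRECTIVE_PATTERNS[directive])
            i += 2
        elif char.isspace():
            # Whitespace in the format matches any run of whitespace,
            # including none.
            pieces.append(r'\s*')
            i += 1
        else:
            pieces.append(re.escape(char))
            i += 1
    return re.compile(''.join(pieces) + '$', re.IGNORECASE)


def parse_fields(data, fmt):
    """Return the matched fields as integers, or raise ValueError."""
    found = format_to_regex(fmt).match(data)
    if found is None:
        raise ValueError("time data did not match format")
    return {name: int(value) for name, value in found.groupdict().items()}


fields = parse_fields("7/12/02", "%m/%d/%y")  # {'m': 7, 'd': 12, 'y': 2}
```

With the single-digit alternatives in place, Guido's "7/12/02" example parses, while an out-of-range month such as "13" still fails to match.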
----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES
user_id=357491

2.1.1 is now uploaded. It is almost purely a syntactic change. From discussions on python-dev, I renamed the helper functions so they are all lowercase-style, and so that each name states what the function returns. I also put each import on its own line, as per PEP 8. The only semantic change was to import re.compile directly, since it is the only thing I use from the re module. These changes required tweaking my exhaustive testing suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES
user_id=357491

I have uploaded a contextual diff of timemodule.c that calls out to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it, but since Alex Martelli told me what I needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES
user_id=357491

2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of using strptime(), I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. The first is that I forgot that default arguments are created once, when the function is defined, and not at each call. This meant that if someone ran strptime() under one locale and then switched to another locale without explicitly passing a new LocaleTime object on every call, they would get bad matches. That is not good.
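The first reason above is the classic default-argument pitfall. A small sketch, assuming a hypothetical LocaleTime stand-in that merely records the LC_TIME setting current when it was built (this is not the patch's class), shows why a default instance goes stale and how the usual None-sentinel idiom avoids it; the patch itself took the other route and dropped the parameter:

```python
import locale


class LocaleTime:
    """Hypothetical stand-in: snapshots the LC_TIME setting at creation time."""

    def __init__(self):
        # Calling setlocale() with no locale argument only queries the setting.
        self.lc_time = locale.setlocale(locale.LC_TIME)


def parse_stale(data_string, fmt, locale_time=LocaleTime()):
    # The default LocaleTime() was built once, when this 'def' ran, so every
    # call that omits locale_time shares that same (possibly stale) snapshot.
    return locale_time


def parse_fresh(data_string, fmt, locale_time=None):
    # None sentinel: build a new snapshot on each call that omits the argument,
    # so a locale switch made after import is picked up.
    if locale_time is None:
        locale_time = LocaleTime()
    return locale_time
```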
The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't give it a default value. This is meant to act as a drop-in replacement for time.strptime(), and since the parameter can't have a default value, it had to go. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they do, they should change the locale for the interpreter and not just for strptime().

The second change concerns what triggers strptime() to return an re object that it can use. Initially, any value that evaluates to false triggered it, but I realized that an empty string would trigger it too, and it would be better to raise a TypeError than to let some error come up from trying to use the re object incorrectly. Now, to have an re object returned, you pass in False. I figured there is a very small chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned.

I debated removing this feature from strptime(), but I profiled it and most of the time is spent in TimeRE's __getitem__, so building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number of calls in a row, IMO; I checked the Apache hit logs for a single day on the Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES
user_id=357491

I have uploaded v. 2.0.4.
It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES
user_id=357491

I uploaded v. 2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes), I did a few more things.

For one, I added a special " \d" alternative to the numeric day-of-the-month regex. I discovered that ANSI C's ctime displays the day of the month with a leading space if it is a single digit. Since at least Skip's C library uses that format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything grouped together in LocaleTime is. It also simplified the code slightly and led to fewer data type conversions.

I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set function for property(), assignment wouldn't work; I didn't realize it just falls back to ordinary attribute assignment. So I added that method as the set function for all of the property()s.

It does require 2.2.1 now, since I used True and False without defining them. Just set those names to 1 and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I use to test my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think).
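A rough sketch of the 2.0.3/2.0.4 ideas above, under stated assumptions: names come from the calendar module's locale-aware day_name, day_abbr, month_name and month_abbr sequences, and each attribute is a property() whose setter raises TypeError. The class layout and most attribute names are illustrative (only f_month appears elsewhere in this thread). In modern Python, a property without a setter already rejects assignment with AttributeError; the explicit setter just turns that into the TypeError described above.

```python
import calendar


class LocaleTime:
    """Illustrative sketch, not the patch's class: locale names via calendar."""

    def __init__(self):
        # Lists, since each group is homogeneous data.  month_name[0] and
        # month_abbr[0] are '', so list index == month number (1-12).
        self._f_weekday = list(calendar.day_name)
        self._a_weekday = list(calendar.day_abbr)
        self._f_month = list(calendar.month_name)
        self._a_month = list(calendar.month_abbr)

    def _read_only(self, value):
        # Used as the property() setter: reject any assignment.
        raise TypeError("LocaleTime attributes are read-only")

    f_weekday = property(lambda self: self._f_weekday, _read_only)
    a_weekday = property(lambda self: self._a_weekday, _read_only)
    f_month = property(lambda self: self._f_month, _read_only)
    a_month = property(lambda self: self._a_month, _read_only)
```

Under the default C locale, LocaleTime().f_month[1] is 'January'; under another LC_TIME setting the names follow that locale.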
----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES
user_id=357491

I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will send the email on June 17.

Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to match the exception from time.strptime() seen in Skip's run of the unit test). The other is to tack a \d onto all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be v2.0.3, which I will post before June 17.

If anyone thinks I should hold back on this, please let me know! I would like to have this code as finished as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES
user_id=357491

I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route.

For 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually made me catch some redundant code for dealing with a literal %.

The tests in LocaleTime's __init__ have been reworked to check that each argument is either None or has the proper length; otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that can do either front or back concatenation. It is not much different from before, but it lets me choose front or back concatenation easily.

I explained why the various magic dates were used.
I don't have to worry about leap years at all. The function does not validate the data string; it just takes the data and uses it, so there is no reason to calculate leap years.

date_time[offset] has been replaced with current_format, and I added the requisite two lines to assign between it and the list.

You only need __new__ when subclassing an immutable type. Since dict is mutable, I don't need to worry about it.

I used Neal's suggested shortening of the sorter helper function. I also used the suggestion of doing x = y = z = -1; it now just barely fits on a single line instead of two.

All numerical comparisons use == and != instead of is and is not. I didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know.

The docstring was backwards. Thanks for catching that, Neal.

I also went through and added True and False where appropriate. There is a line right at the top of the code that sets True = 1; False = 0. That can obviously be removed when running under Python 2.3.

And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind.

I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (I am going to be out of town until June 16). I will still be checking email periodically, though, so I will continue to implement any suggestions or bug fixes that anyone comes up with.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me, although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error (in __init__), you could change 'sequence of ...
must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8).

Instead of doing:

    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:

    x = tuple(orig) + ('',)

or something like that. Perhaps a helper function?

In __init__, do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, __init__ won't raise a TypeError, which it probably should.

What is the magic date used in __calc_weekday() (1999/3/15+ 22:44:55)? Is it significant; should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, and many more.)

__calc_month() doesn't seem to take leap years into account? (Not sure if this is a problem or not.)

In __calc_date_time(), you use date_time[offset] repeatedly. Couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name; I'm just making an example.)

Are you supposed to use __init__ or __new__ when deriving from built-ins (TimeRE(dict))? (Sorry, I don't remember the answer.)

In __tupleToRE.sorter(), instead of the last 3 lines, you can do:

    return cmp(b_length, a_length)

Note: you can do x = y = z = -1 instead of x = -1 ; y = -1 ; z = -1

It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c.

This docstring seems backwards:

    def gregToJulian(year, month, day):
        """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly.
:-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c. If you look at your output, you will notice that the day of the month is '4', but the docs for time.strftime() specify the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)', so a day not represented by 2 digits caused the regex to fail. Now the question is: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possibly improper output from strftime()? Changing the regexes should not be a big issue, since I could just tack on '\d' as the last option for all numerical regexes.

As for the test error from time.strptime(), I don't know what is causing it. If you look at the test, you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). How that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is an example of that.

One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Or should I ask now to get the ball rolling? Since I joined python-dev literally this morning, I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun 1 2002, 23:19:40)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES
user_id=357491

I have uploaded version 2.0.1, which fixes the %b format bug (a stupid typo in a variable name).

As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it?

As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me your output for strftime('%Z') and time.tzname? I don't know how much %Z should be worried about, since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out:

    parsetime = time.strptime
    #parsetime = strptime.strptime

Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

======================================================================
ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems?

Thx,

Skip

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES
user_id=357491

I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. The file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better.

I realized the other day that the time module is a C extension module. Would getting this accepted require BDFL approval to add it as a separate module to the standard library? Or would the time module need a Python interface module where this is put, with all other functions in the module just passing directly through to the extension module?
As for the suggestions, here are my replies to the ones that still apply to the new file:

* strings are sequences, so instead of
    if found in ('y', 'Y')
  you can do
    if found in 'yY'

-> True, but I personally find the tuple easier to read. If it is standard practice in the standard library to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also applies to any other flags)

-> Oops. Since I wrote this under Python 2.2.1, I didn't think about it. I will go through the code and look for places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES
user_id=33168

Overall, the patch looks pretty good. I didn't check for completeness or consistency, though.

* You don't need: from exceptions import Exception
* The comment "from strptime import * will only export strptime()" is not correct.
* I'm not sure what should be included for the license.
* Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything: raise an exception, else it's ok.)
* In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency.
* strings are sequences, so instead of
    if found in ('y', 'Y')
  you can do
    if found in 'yY'
* daylight should use the new bools True, False (this also applies to any other flags)
* The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc.)
* Long lines should be broken up

The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here:

http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py

I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a Python strptime.
So if this gets implemented, we need to update PEP 42.

Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES
user_id=357491

Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(), which lets it behave exactly like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES
user_id=35752

Go ahead and reuse this item. I'll wait for the updated version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES
user_id=357491

Oops. I thought I had removed the clause. Feel free to remove it.

I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait for the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES
user_id=35752

I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html.
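The first traceback in Skip's report (AttributeError: NoneType object has no attribute 'groupdict') is what happens when re.match() returns None and the result is used anyway; Brett's plan above was to raise an exception instead. A minimal sketch of that guard, assuming a hypothetical parse_fields() helper and a hypothetical '%m/%d/%y' pattern, raising the same message seen in Guido's traceback:

```python
import re

# Hypothetical pattern for '%m/%d/%y' that tolerates a missing leading zero.
DATE_PATTERN = re.compile(
    r'(?P<m>1[0-2]|0?[1-9])/(?P<d>3[01]|[12]\d|0?[1-9])/(?P<y>\d\d)\Z')


def parse_fields(pattern, data_string):
    """Return the named groups, or raise ValueError if the data doesn't match."""
    found = pattern.match(data_string)
    if found is None:
        # Without this check, found.groupdict() would fail with
        # AttributeError: 'NoneType' object has no attribute 'groupdict'.
        raise ValueError("time data did not match format")
    return found.groupdict()
```

Callers then see a single, documented exception type for bad input instead of an error leaking out of the implementation.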
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

From noreply@sourceforge.net Thu Jul 18 22:47:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 14:47:18 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID:

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable.

PEP 42 makes a request for a portable, consistent version of time.strptime():

- Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well as by some other people who happened to catch the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission).

A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself.

Localization has been handled in a modular way using regexes.
All of it is self-explanatory in the docstrings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish).

If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 17:47

Message:
Logged In: YES
user_id=6380

OK, deleting all old files as promised. All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 17:39

Message:
Logged In: YES
user_id=357491

God, I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help make sure it was ready for inclusion in the stdlib, but he initially used an old version. Thankfully, he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I have re-uploaded everything. All files marked as uploaded on 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up.

I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime tests go through time and thus test whichever version time exports.

I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =)

As for your error, that is because the test_strptime.py you are using is old.
I originally had a test in there that checked that the regex returned was the same as the one being tested for; that was a bad decision, so I went through and removed all hard-coded tests like that. Unfortunately, the version you ran still had that test in it. SF should really let patch creators delete old files.

That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python. =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:29

Message:
Logged In: YES
user_id=6380

- Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.)

- There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'
----------------------------------------------------------------------

I haven't looked into this deeper. Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 17:34

Message:
Logged In: YES
user_id=357491

Two things have been uploaded.

First, test_time.py with a strptime test. It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail too. I feel this is fine, since strptime depends on strftime so much that if strftime were to fail, strptime would definitely fail.

The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p; that came from me being very picky about only accepting good data strings. The second was to replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format directives. That seems like a ridiculous requirement, since allowing whitespace to compress down to no whitespace negates it anyway.
Oh well, we are more than compliant now.

I decided not to write a patch for the docs to make them read more leniently about what the format directives accept. I figured I would just let people who think like me do it the more "proper" way with leading zeros, while those who don't read it that way are still okay.

I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free, it only needs to be checked to make sure nothing breaks it. And if you do want more in-depth tests, do you want me to add mirror tests for strftime, or not worry about that since it is the ANSI C library's problem? Other than that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 18:27

Message:
Logged In: YES
user_id=357491

Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them except %y and %Y) to deal with numbers lacking a leading zero. I also made the regex case-insensitive.

As for the diff failing, I am wondering if I am doing something wrong. I am just running

    diff -c CVS_file modified_file > diff_file

Isn't that right?

I will work on merging my strptime tests into the time regression tests and upload a patch here.

I will do a patch for the docs, since they are not consistent with the explanation of struct_time (at least in my opinion).

I tried finding the XPG docs, but the best Google came up with was the NetBSD man page for strptime (which claims to be XPG compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (as defined by isspace()) in the format string to match \s* in the data string. It also requires whitespace or a non-alphanumeric character between directives, while my implementation does not. Personally, I don't like either difference.
If those differences were adopted, though, it might be possible to rewrite strptime to use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes, since they add checks of the input. That, and I just like regexes, period. =)

Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1, since 0 can be a legitimate value for some fields and I figured -1 would eliminate ambiguity. I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't worry about that.

I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch.

I think your interpretation of the docs is overly restrictive; the table shows what strftime does, but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one "time_diff" when I meant to overwrite "diff_time". I have uploaded the new one.

(b) See (a).

(c) Oops. That is a complete oversight on my part.

Now, in (d) you mention writing regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so, I will stop uploading tests here and start a separate patch with my strptime tests for the stdlib test suite.

(d) This test failed because your input does not comply with the Python docs.
Read what %m accepts:

    Month as a decimal number [01,12]

Notice the leading 0 for the single-digit month. My implementation follows the docs and not what glibc suggests. If you want, I can add \d as an option to all the relevant regexes and eliminate this issue, but then it will no longer follow the docs. This tripped Skip up too, since no one writes numbers that way; strftime does, though. If the docs did not mean to require a leading 0, I think they should be rewritten, since they are misleading.

In other words, either strptime stays as it is and follows the docs, or I change the regexes and the docs have to change too. I can go either way, but I would personally prefer to follow the docs as-is, since strptime is meant to parse strftime output, not human output. =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES
user_id=6380

Hm. This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>>

Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it?
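Guido's closing suggestion, together with Brett's later test_time.py approach of using strftime to test strptime, amounts to a round-trip check. A minimal unittest sketch against the time module as it exists in current Python; the class name and check() helper are hypothetical, loosely modeled on the helper() seen in Skip's tracebacks, and the month-name cases assume the default C locale for LC_TIME:

```python
import time
import unittest


class StrptimeRoundTripTest(unittest.TestCase):
    """Format a fixed time with strftime(), then check strptime() parses it back."""

    def setUp(self):
        self.now = time.localtime()

    def check(self, fmt, *positions):
        text = time.strftime(fmt, self.now)
        parsed = time.strptime(text, fmt)
        for pos in positions:
            self.assertEqual(parsed[pos], self.now[pos],
                             "%s: %r parsed to %r" % (fmt, text, parsed))

    def test_year(self):
        self.check('%Y', 0)                   # tm_year

    def test_month(self):
        for fmt in ('%m', '%b', '%B'):
            self.check(fmt, 1)                # tm_mon

    def test_day(self):
        self.check('%d', 2)                   # tm_mday

    def test_time_of_day(self):
        self.check('%H', 3)                   # tm_hour
        self.check('%M', 4)                   # tm_min
        self.check('%S', 5)                   # tm_sec

    def test_combined(self):
        self.check('%m/%d/%Y', 0, 1, 2)
```

Saved as a module, it can be run with python -m unittest followed by the module name. As Brett notes, a round-trip test assumes strftime itself is correct.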
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 16:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-27 00:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups are non-capturing, mainly for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 21:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. It is almost purely a syntactic change. From discussions on python-dev, I renamed the helper fxns so they are all lowercase-style, and changed their names so that they state what each fxn returns. I also put each import on its own line, as per PEP 8. The only semantic change I made was to import re.compile directly, since it is the only thing I am using from the re module. These changes required tweaking my exhaustive testing suite, so that got uploaded, too.
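A small illustration of the capturing vs. non-capturing difference using Python's re module. The weekday alternation and the group name "a" are only examples chosen here, and any speedup from (?:...) is a possibility that is not measured in this sketch:

```python
import re

# Each (...) creates a numbered group that the engine records on a match.
capturing = re.compile(r"(Mon)|(Tue)|(Wed)|(Thu)|(Fri)|(Sat)|(Sun)")
# (?:...) groups the alternatives without capturing anything.
non_capturing = re.compile(r"(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun)")
# One named group around non-capturing alternatives keeps only the value we need.
named = re.compile(r"(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))")

print(capturing.groups, non_capturing.groups, named.groups)   # 7 0 1
print(named.match("Tue").group("a"))                          # Tue
```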
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-21 00:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 17:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of how strptime() is used, I made this a new minor release. First, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is that I forgot that default arguments are created once, at the time of function creation, and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object on every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it; this is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter, since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they do, they should change the locale for the interpreter and not just for strptime(). The second change concerns what triggers strptime() to return an re object that the caller can reuse.
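The default-argument problem described in the first change can be reproduced in a few lines. This is a sketch with hypothetical names (LocaleSnapshot, set_language, parse_stale, parse_fresh) standing in for LocaleTime and the process locale; it is not code from the patch. The patch dealt with the problem by dropping the parameter, and the None-sentinel version is shown only as the common idiom for getting a fresh default on each call:

```python
# Hypothetical stand-ins: a module-level "current language" plays the role of the
# process locale, and LocaleSnapshot plays the role of LocaleTime.
current_language = "English"


def set_language(name):
    """Simulate switching the process locale."""
    global current_language
    current_language = name


class LocaleSnapshot:
    """Captures whatever language is current when the object is created."""

    def __init__(self):
        self.language = current_language


def parse_stale(data_string, locale_time=LocaleSnapshot()):
    # The default object was built once, when this 'def' statement ran,
    # so later locale switches are silently ignored.
    return locale_time.language


def parse_fresh(data_string, locale_time=None):
    # A None sentinel defers construction to each call.
    if locale_time is None:
        locale_time = LocaleSnapshot()
    return locale_time.language


set_language("Swedish")
print(parse_stale("x"))   # English (stale default)
print(parse_fresh("x"))   # Swedish
```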
Initially it was any false value (i.e., anything that would be considered false), but I realized that an empty string could trigger that, and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is only a minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__, so building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. IMO, this is a conservative number of calls in a row; I checked the Apache hit logs for a single day on the Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 15:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 16:11 Message: Logged In: YES user_id=357491 I uploaded v. 2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes), I did a few more things. For one, I added a special " \d" to the numeric day-of-the-month regex.
I discovered that ANSI C's ctime displays the day of the month with a leading space if it is a single digit. Since at least Skip's C library uses that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which describes everything that is grouped together in LocaleTime. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set function for property(), assignment wouldn't work; I didn't realize it just falls back to normal attribute assignment. So I added that method as the set function for all of the property()s. It now requires 2.2.1, since I used True and False without defining them. Obviously, just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 20:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will send the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack \d onto all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3, which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know!
I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-05 02:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't make the change or took a slightly different route. For 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The checks in LocaleTime's __init__ have been reworked to verify that each argument is either None or has the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that can specify front or back concatenation. It is not much different from before, but it lets me choose front or back concatenation easily. I explained why the various magic dates were used. I don't have to worry about leap years at all: since the fxn is not validating the data string, it just takes the data and uses it, so I have no reason to calculate for leap years. date_time[offset] has been replaced with current_format, and I added the requisite two lines to assign between it and the list. You only need to use __new__ when subclassing an immutable type. Since dict is obviously mutable, I don't need to worry about it. I used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1; now it just barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. I didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate.
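The sorter helper Neal suggested shortening compares name lengths so that longer names come first, which matters because regex alternation takes the first branch that matches, not the longest one. A minimal sketch, assuming a hypothetical helper names_to_regex (not the patch's __tupleToRE); in modern Python, key=len with reverse=True plays the role of return cmp(b_length, a_length):

```python
import re


def names_to_regex(names):
    """Hypothetical helper: build an alternation that prefers the longest name."""
    # Longest first, e.g. "Wednesday" before "Wed"; ties keep their input order.
    ordered = sorted(names, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(name) for name in ordered) + ")"


naive = "(?:Wed|Wednesday)"                     # unsorted: the short name wins
sorted_pattern = names_to_regex(["Wed", "Wednesday"])

print(sorted_pattern)                                  # (?:Wednesday|Wed)
print(re.match(naive, "Wednesday").group())            # Wed
print(re.match(sorted_pattern, "Wednesday").group())   # Wednesday
```

The same longest-first ordering is visible in the full-weekday part of the pattern string in Guido's 2002-07-18 failure report (Wednesday, Thursday, Saturday, ... Sunday).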
There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed when running under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (I am going to be out of town until June 16). I will still be checking email periodically, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 19:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me, although 'yY' is faster than ('y', 'Y'). To shorten the lines where you raise an error (in __init__), you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing:

    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:

    x = tuple(orig[:])

or something like that. Perhaps a helper function? In __init__, do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more.) __calc_month() doesn't seem to take leap year into account?
(not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repetitively; couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name; I'm just making an example.) Are you supposed to use __init__ or __new__ when deriving from built-ins (TimeRE(dict))? (Sorry, I don't remember the answer.) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1 instead of x = -1 ; y = -1 ; z = -1. It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards:

    def gregToJulian(year, month, day):
        """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 18:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output, you will notice that the day of the month is '4', but if you look at the docs for time.strftime(), you will notice that they specify the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; because the day was not represented by 2 digits, the regex failed. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possibly improper output from strftime()? Changing the regexes should not be a big issue, since I could just tack on '\d' as the last option for all numerical regexes.
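Brett's %d diagnosis can be checked directly. A minimal sketch, assuming simplified day-of-month patterns modeled on the ones quoted in this thread (DAY_STRICT and DAY_LENIENT are hypothetical names): the strict form rejects Skip's single-digit '4', while tacking on '\d' (plus the ctime-style ' \d' branch from the 2002-06-17 comment) accepts it. Neither pattern validates the value, so '00' and '0' also slip through:

```python
import re

# Simplified %d patterns modeled on those quoted in this thread (hypothetical names).
DAY_STRICT = r"(?:3[0-1]|[0-2]\d)"            # two digits only, per the [01,31] table
DAY_LENIENT = r"(?:3[0-1]|[0-2]\d|\d| \d)"    # adds bare and space-padded single digits


def match_day(pattern, text):
    """Return the day as an int if pattern matches all of text, else None."""
    found = re.fullmatch(pattern, text)
    # int() ignores the leading space in a ctime-style " 4".
    return None if found is None else int(found.group())


for text in ("04", "4", " 4", "31"):
    print(repr(text), match_day(DAY_STRICT, text), match_day(DAY_LENIENT, text))
```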
As for the test error from time.strptime(), I don't know what is causing it. If you look at the test, you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). How that can fail, I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is an instance of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Should I ask now to get the ball rolling? Since I just joined python-dev literally this morning, I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 01:55 Message: Logged In: YES user_id=44345 Here ya go...

% ./python
Python 2.3a0 (#185, Jun 1 2002, 23:19:40)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun 4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'

---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 01:35 Message: Logged In: YES user_id=357491 I have uploaded version 2.0.1, which fixes the %b format bug (a stupid typo in a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z') and time.tzname? I don't know how much we should worry about %Z, since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?
-Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 00:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out:

    parsetime = time.strptime
    #parsetime = strptime.strptime

Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'
======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 03:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. The file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension. Would getting this accepted require BDFL approval to add this as a separate module to the standard library? Would the time module have to have a Python interface module where this is put, with all other functions in the module just passing directly to the extension? As for the suggestions, here are my replies to the ones that still apply to the new file:

* strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also applies to any other flags)
-> Oops. Since I wrote this under Python 2.2.1, I didn't think about it. I will go through the code and look for places where True and False should be used.
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 09:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though.

* You don't need: from exceptions import Exception
* The comment "from strptime import * will only export strptime()" is not correct.
* I'm not sure what should be included for the license.
* Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything; raise an exception, else it's OK.)
* In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency.
* strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY'
* daylight should use the new bools True, False (this also applies to any other flags)
* The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc.)
* Long lines should be broken up.

The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a Python strptime, so if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 17:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This lets it behave exactly like time.strptime().
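Gleaning locale names from time.strftime() can be sketched by formatting dates whose weekday is already known. This sketch assumes 2001-01-01 (a Monday) as the reference date and uses a hypothetical helper weekday_names; it illustrates the idea only and is not the patch's code (the 2002-06-18 comment notes that v2.0.4 later switched to the calendar module for these names):

```python
import time


def weekday_names(directive="%A"):
    """Collect weekday names, Monday first, by formatting seven known dates."""
    names = []
    for offset in range(7):
        # 2001-01-01 was a Monday; Python time tuples count tm_wday from Monday = 0.
        # Fields: year, month, day, hour, minute, second, weekday, yearday, isdst.
        stamp = (2001, 1, 1 + offset, 0, 0, 0, offset, 1 + offset, 0)
        names.append(time.strftime(directive, stamp))
    return names


print(weekday_names("%A"))
print(weekday_names("%a"))
```

Because time.strftime() follows the current LC_TIME setting, the names come out in whatever language that locale category selects; under the default "C" locale they are English.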
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 18:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 17:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Thu Jul 18 23:34:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 15:34:16 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). 
It attempts to operate as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by me as well as by some other people who happened to catch the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is explained in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-07-18 15:34 Message: Logged In: YES user_id=357491 Wonderful! About the docs: do you want me to email Fred or upload a patched version of the time docs? And for removing the request from PEP 42, should I email Jeremy or Barry about it? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 14:47 Message: Logged In: YES user_id=6380 OK, deleting all old files as promised.
All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 14:39 Message: Logged In: YES user_id=357491 God, I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make sure it was fit for inclusion in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I have now re-uploaded everything. All files that specify they were uploaded 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test whichever version time exports. I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =) As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately, the version you ran still had that test in there. SF should really let patch creators delete old files. That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python.
=) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 08:29 Message: Logged In: YES user_id=6380

- Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.)
- There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find.
- The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-).
- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P3[0-1]|[0-2]\d|\d| \d)'
----------------------------------------------------------------------

I haven't looked into this deeper. Back to you... ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-16 14:34 Message: Logged In: YES user_id=357491 Two things have been uploaded. First, test_time.py w/ a strptime test.
It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine, since strptime depends on strftime so much that if strftime were to fail, strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p; that came from my being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG-compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format directives. That seems like a ridiculous requirement anyway, since allowing whitespace to compress down to no whitespace negates it. Oh well, we are more than compliant now. I decided not to write a patch to make the docs read more leniently about what the format directives accept. I figured people who think like me can keep doing it the more "proper" way with leading zeros, and those who don't read the docs like that will still be okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free, it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime, or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 15:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4.
I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with numbers lacking a leading zero. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file. Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here. I will do a patch for the docs, since they are not consistent with the explanation of struct_time (at least in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG-compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires whitespace or a non-alphanumeric character between directives, while my implementation does not. Personally, I don't like either difference. If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since they add checks on the input. That, and I just like regexes, period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some fields, and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 14:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does, but I think it's reasonable for strptime to accept missing leading zeros.
You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 14:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one "time_diff", but in my head I meant to overwrite "diff_time". I have uploaded the new one. (b) See (a). (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so, I will stop uploading tests here and just start a patch with my strptime tests for the stdlib tests. (d) The reason this test failed is that your input is not compliant with the Python docs. Read what %m accepts: "Month as a decimal number [01,12]". Notice the leading 0 for single-digit months. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add \d as an option to all the regexes and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant to allow no leading 0, I think they should be rewritten since that is misleading. In other words, either strptime stays as it is and follows the docs, or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 09:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet.
I get these problems:
(a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial)
(b) it still tries to import strptime (no leading '_') (also trivial)
(c) so does test_strptime.py (also trivial)
(d) the simplest of simple examples fails. With Linux's strptime:
>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>
With yours:
>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>>
Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 13:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-26 21:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing. This was mainly done for a possible performance improvement.
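Guido's "7/12/02" example fails because a two-digit-only %m pattern rejects a month written without a leading zero. The Python sketch below shows one way to accept both forms: the two-digit alternatives come first, then a bare digit, then a space-padded digit. The day pattern mirrors the one that appears in the TimeRE.pattern failure quoted later in this thread; the month and year patterns and the helper name parse_mdy are assumptions for illustration, not code from the patch.

```python
import re

# Day of month: two-digit forms first, then a bare digit, then a
# space-padded digit (the ordering seen in the TimeRE.pattern output).
DAY = r"(?P<d>3[0-1]|[0-2]\d|\d| \d)"
# Assumed month pattern built the same way; not taken from the patch.
MONTH = r"(?P<m>1[0-2]|0[1-9]|[1-9]| [1-9])"
# %y is always two digits.
YEAR2 = r"(?P<y>\d\d)"

_MDY = re.compile(MONTH + "/" + DAY + "/" + YEAR2)


def parse_mdy(text):
    """Parse '%m/%d/%y' text, accepting missing leading zeros (hypothetical helper)."""
    found = _MDY.fullmatch(text)
    if found is None:
        raise ValueError("time data did not match format")
    # int() ignores the leading space of the ' 7' form.
    return int(found.group("m")), int(found.group("d")), int(found.group("y"))
```

With this, parse_mdy("7/12/02") and parse_mdy("07/12/02") both give (7, 12, 2), while "13/12/02" still raises ValueError. Using fullmatch also rejects trailing characters, which time.strptime reports as unconverted data.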
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 18:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactical change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. I also changed their names so that they state what the fxn returns. I also put each import on its own line as per PEP 8. The only semantic change I made was to directly import re.compile since it is the only thing I am using from the re module. These changes required tweaking of my exhaustive testing suite, so that got uploaded, too. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-20 21:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 14:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of strptime()'s use, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is that I forgot that default arguments are created at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object on every call for the new locale, they would get bad matches. That is not good.
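The pitfall described here is that Python evaluates a default argument once, when the def statement runs, not on every call. The sketch below uses a hypothetical LocaleInfo class (a stand-in for LocaleTime that only records a serial number) to show that every call relying on the default shares one object, and contrasts it with the usual None-sentinel idiom that builds a fresh object per call.

```python
import itertools

_serials = itertools.count()


class LocaleInfo:
    """Hypothetical stand-in for LocaleTime: each instance gets a new serial."""

    def __init__(self):
        self.serial = next(_serials)


def parse_shared(data, locale_info=LocaleInfo()):
    # LocaleInfo() above ran once, when 'def' executed; every call that
    # omits locale_info reuses that single object, even after a locale switch.
    return locale_info.serial


def parse_fresh(data, locale_info=None):
    # None sentinel: build the locale data at call time instead.
    if locale_info is None:
        locale_info = LocaleInfo()
    return locale_info.serial
```

Two calls to parse_shared without an explicit argument return the same serial, while parse_fresh returns a new one each time. Dropping the parameter altogether, as 2.1.0 does, sidesteps the question.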
The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale's. And if they did, they should change the locale for the interpreter and not just for strptime(). The second change was to what triggers strptime() to return an re object that the caller can reuse. Initially it was any value that evaluates as false, but I realized that an empty string could trigger that, and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 12:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4.
It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 13:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes), I did a few more things. For one, I added a special " \d" to the numeric day-of-month regex. I discovered that ANSI C's ctime displays the day of the month with a leading space if it is a single digit. Since at least Skip's C library likes to use that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property(), assignment wouldn't work; I didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. It does require 2.2.1 now since I used True and False without defining them. Obviously, just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think).
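The two LocaleTime changes described here, names taken from the calendar module and a setter that raises TypeError, can be sketched together. LocaleNames below is a hypothetical, heavily simplified stand-in, not the patch's class. In current Python a property with no setter already rejects assignment, but with AttributeError; supplying a setter is what turns that into a TypeError.

```python
import calendar


class LocaleNames:
    """Hypothetical, simplified LocaleTime built from the calendar module."""

    def __init__(self):
        # calendar's name sequences follow the current LC_TIME locale
        # (in the C locale: 'Monday' ... 'Sunday').
        self._f_weekday = list(calendar.day_name)
        self._a_weekday = list(calendar.day_abbr)
        # month_name[0] is '', so list index == month number.
        self._f_month = list(calendar.month_name)
        self._a_month = list(calendar.month_abbr)

    def _read_only(self, value):
        # Used as the setter of every property below.
        raise TypeError("LocaleNames attributes cannot be set")

    f_weekday = property(lambda self: self._f_weekday, _read_only)
    a_weekday = property(lambda self: self._a_weekday, _read_only)
    f_month = property(lambda self: self._f_month, _read_only)
    a_month = property(lambda self: self._a_month, _read_only)
```

Reading names.f_month returns the 13-entry list; assigning names.f_month = [] raises TypeError instead of silently replacing the data.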
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 17:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack on a \d on all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3, which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know! I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 23:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The checks in the __init__ for LocaleTime have been reworked to verify that the arguments are either None or have the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For adding '' to tuples, I created a method that can do front or back concatenation. Not much different from before, but it allows me to specify front or back concatenation easily. I explained why the various magic dates were used.
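The reworked __init__ checks (each argument is either None or a sequence of the right length, otherwise TypeError) and the front/back '' helper are small enough to sketch. check_names and pad_blank are hypothetical names chosen for this example, not the patch's methods; the error text follows Neal's suggested "must contain X items" wording.

```python
def check_names(value, expected_length, name):
    """Return None or a list of exactly expected_length items; else TypeError."""
    if value is None:
        return None
    try:
        length = len(value)
    except TypeError:
        # Objects without len(), such as integers, are rejected here.
        raise TypeError("%s must be None or a sequence" % name) from None
    if length != expected_length:
        raise TypeError("%s must contain %d items" % (name, expected_length))
    return list(value)


def pad_blank(seq, front=True):
    """Return seq as a tuple with '' added at the front (default) or back."""
    if front:
        return ("",) + tuple(seq)
    return tuple(seq) + ("",)
```

For example, pad_blank(["January", "February"]) gives ("", "January", "February"), the layout that lets a month's index equal its number.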
I don't have to worry about leap years at all. Since the fxn does not check the data string for validity, it just takes the data and uses it, so I have no reason to calculate leap years. date_time[offset] has been replaced with current_format, and I added the requisite two lines to assign between it and the list. You are only supposed to use __new__ when subclassing an immutable type. Since dict is obviously mutable, I don't need to worry about it. I used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate. There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if being run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 16:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me. Although 'yY' is faster than ('y', 'Y'). In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ...
must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)
you could do:
    x = tuple(orig[:])
or something like that. Perhaps a helper function? In __init__, do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more.) __calc_month() doesn't seem to take leap year into account? (Not sure if this is a problem or not.) In __calc_date_time(), you use date_time[offset] repetitively; couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name, I'm just making an example.) Are you supposed to use __init__ when deriving from built-ins (TimeRE(dict)) or __new__? (Sorry, I don't remember the answer.) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length). Note: you can do x = y = z = -1 instead of x = -1; y = -1; z = -1. It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards:
    def gregToJulian(year, month, day):
        """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly.
:-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 15:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that they specify the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; the day not being represented by 2 digits caused the regex to fail. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes. As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). Now how that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is a case of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Or should I ask now to get the ball rolling? Since I just joined python-dev literally this morning, I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 22:55 Message: Logged In: YES user_id=44345 Here ya go...
% ./python
Python 2.3a0 (#185, Jun 1 2002, 23:19:40)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun 4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-03 22:35 Message: Logged In: YES user_id=357491 I have uploaded version 2.0.1, which fixes the %b format bug (stupid typo on a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z') and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it? -Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 21:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out:
parsetime = time.strptime
#parsetime = strptime.strptime
Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ====================================================================== ERROR: *** Test %c directive.
*** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 00:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. So the file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension. Would getting this accepted require BDFL approval to add this as a separate module to the standard library? Would the time module have to have a Python interface module where this is put and all other functions in the module just pass directly to the extension file?
As for the suggestions, here are my replies to the ones that still apply to the new file:
* strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it.
* daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1, I didn't think about it. I will go through the code and look for places where True and False should be used.
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 06:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though.
* You don't need: from exceptions import Exception
* The comment "from strptime import * will only export strptime()" is not correct.
* I'm not sure what should be included for the license.
* Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything: raise an exception, else it's ok.)
* In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency.
* strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY'
* daylight should use the new bools True, False (this also applies to any other flags)
* The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc.)
* Long lines should be broken up
The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 14:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This lets it behave exactly like time.strptime(). ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 15:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this item when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 14:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html.
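Version 2 is described as gleaning all locale information from time.strftime(). One way to do that, sketched below under the assumption that only %B (full month name) is needed, is to format one fixed date per month. The specific date and the function name are arbitrary choices for illustration, not the patch's magic dates.

```python
import time


def month_names_from_strftime():
    """Collect full month names by formatting one fixed date per month."""
    names = [""]  # index 0 left blank so index == month number
    for month in range(1, 13):
        # 9-tuple: year, month, day, hour, minute, second, weekday,
        # day of year, DST flag. Only the month matters for %B, so the
        # other fields just need to be in range.
        stamp = (1999, month, 1, 0, 0, 0, 0, 1, 0)
        names.append(time.strftime("%B", stamp))
    return names
```

The result follows whatever LC_TIME locale is in effect; in the default C locale, names[1] is 'January'. The same trick applies to other name directives such as %A, %a and %b.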
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Fri Jul 19 00:09:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 16:09:12 -0700 Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust Message-ID: Patches item #583190, was opened at 2002-07-17 23:36 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Barry A. Warsaw (bwarsaw) Summary: Patch to make email parser more robust Initial Comment: the following patch against current CVS of the email package, as of 2002/07/18 fixes the following problems: in non-strict mode, messages don't require a blank line at the end with a missing end-terminator. A single newline is sufficient now. The remaining fixes apply in strict or non-strict mode: Handle trailing whitespace at the end of a boundary. Had to switch from using string.split() to re.split(). Handle whitespace on the end of a parameter list for Content-type. Handle whitespace on the end of a plain content-type header. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-07-18 19:09 Message: Logged In: YES user_id=12800 I made a few stylistic mods to the Parser.py patch, but otherwise the patch looks fine. Please double check that I didn't mess anything up! 
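The patch summary mentions switching from string.split() to re.split() so that a boundary line with trailing whitespace is still recognized. The sketch below illustrates that idea on a plain string. split_multipart is a hypothetical helper, not the email package's parser, and it ignores headers, nested parts and non-strict recovery.

```python
import re


def split_multipart(body, boundary):
    """Return the parts between boundary lines (hypothetical helper).

    Delimiter lines may carry trailing spaces or tabs; a plain
    str.split('--' + boundary) would leave that whitespace, plus the
    line break, glued to the start of the next part.
    """
    b = re.escape(boundary)
    # The closing delimiter '--boundary--' ends the multipart body.
    closing = re.compile(r"^--" + b + r"--[ \t]*\r?$", re.MULTILINE)
    end = closing.search(body)
    if end is not None:
        body = body[:end.start()]
    # An opening delimiter is a whole line: '--boundary', optional
    # trailing blanks, then the line break.
    delimiter = re.compile(r"^--" + b + r"[ \t]*\r?\n", re.MULTILINE)
    pieces = delimiter.split(body)
    parts = []
    for piece in pieces[1:]:  # pieces[0] is the preamble
        # The line break before the next delimiter belongs to the delimiter.
        if piece.endswith("\r\n"):
            piece = piece[:-2]
        elif piece.endswith("\n"):
            piece = piece[:-1]
        parts.append(piece)
    return parts
```

A body whose delimiter lines read "--XYZ  " and "--XYZ\t" still splits into two clean parts, and the epilogue after "--XYZ--" is dropped.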
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470 From noreply@sourceforge.net Fri Jul 19 00:35:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 16:35:46 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to catch the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes.
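Handling localization with regexes comes down to turning a list of locale-specific names into an alternation. The TimeRE.pattern output quoted later in this thread shows the shape: each name in a non-capturing group, longest names first, inside a (?P<...>) group. seq_to_re below is a hypothetical reconstruction of that idea, not the module's code. Sorting longest-first, which is the job of the sorter helper discussed in Neal's review, keeps a short name from matching the front of a longer one.

```python
import re


def seq_to_re(names, directive):
    """Build '(?P<directive>(?:longest)|...|(?:shortest))' (hypothetical helper)."""
    # Skip '' placeholders (e.g. calendar.month_name[0]). Longest first:
    # with 'Mon' tried before 'Monday', a match on 'Monday' would stop
    # after 'Mon' and leave 'day' unparsed.
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    alternatives = "|".join("(?:%s)" % re.escape(n) for n in ordered)
    return "(?P<%s>%s)" % (directive, alternatives)
```

Compiling the result with re.IGNORECASE, as 2.1.4 does for its regex, lets re.compile(seq_to_re(["Mon", "Monday"], "a"), re.IGNORECASE).match("monday") capture the whole word "monday".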
All of it is self-explanatory in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-07-18 16:35 Message: Logged In: YES user_id=357491 Since I had the time, I went ahead and did a patch for libtime.tex that removes the comment saying that strptime fully relies on the C library, and uploaded it. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 15:34 Message: Logged In: YES user_id=357491 Wonderful! About the docs: do you want me to email Fred or upload a patched version of the time docs? And for removing the request in PEP 42, should I email Jeremy or Barry about it? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 14:47 Message: Logged In: YES user_id=6380 OK, deleting all old files as promised. All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 14:39 Message: Logged In: YES user_id=357491 God, I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make sure it was ready for inclusion in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files).
I have re-uploaded everything now. All files that specify they were uploaded 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test whichever version time exports. I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =) As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately, the version you ran still had that test in there. SF should really let patch creators delete old files. That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 08:29 Message: Logged In: YES user_id=6380
- Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.)
- There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find.
- The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-).
- When I run test_strptime.py, I get one failure: ====================================================================== FAIL: Test TimeRE.pattern.
---------------------------------------------------------------------- Traceback (most recent call last): File "../Lib/test/test_strptime.py", line 124, in test_pattern self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string) File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless if not expr: raise self.failureException, msg AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)' ---------------------------------------------------------------------- I haven't looked into this deeper. Back to you... ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-16 14:34 Message: Logged In: YES user_id=357491 Two things have been uploaded. First, test_time.py w/ a strptime test. It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p. This was from me being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format strings. Seems like a ridiculous requirement since the requirement that whitespace be able to compress down to no whitespace negates this requirement.
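The translation Brett describes in this comment (each run of whitespace in the format string becomes \s*, each directive becomes a regex fragment) can be sketched as below. This is a minimal illustration under stated assumptions, not the _strptime code: DIRECTIVES and format_to_regex are hypothetical names, the table covers only a handful of numeric directives, and naming each group after its directive letter is inferred from the found_dict['b'] lookup quoted in this thread. The %d fragment is the one shown in the AssertionError above.

```python
import re

# Hypothetical directive table: directive letter -> regex alternatives.
# Numeric entries accept an unpadded digit after the zero-padded forms.
DIRECTIVES = {
    "d": r"3[0-1]|[0-2]\d|\d| \d",
    "m": r"1[0-2]|0[1-9]|[1-9]",
    "y": r"\d\d",
    "H": r"2[0-3]|[0-1]\d|\d",
    "M": r"[0-5]\d|\d",
}


def format_to_regex(fmt):
    r"""Translate a strptime-style format string into a compiled regex.

    Each run of whitespace becomes \s*, each %X directive becomes a named
    group (?P<X>...), "%%" becomes a literal "%", and every other character
    is matched literally.
    """
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            directive = fmt[i + 1]
            if directive == "%":
                parts.append("%")
            else:
                # A KeyError here means this sketch's table lacks the directive.
                parts.append("(?P<%s>%s)" % (directive, DIRECTIVES[directive]))
            i += 2
        elif ch.isspace():
            # Collapse the whole whitespace run into a single \s*.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            parts.append(r"\s*")
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


match = format_to_regex("%m/%d/%y").fullmatch("7/12/02")
print(match.groupdict())  # {'m': '7', 'd': '12', 'y': '02'}
```

Because whitespace turns into \s*, format_to_regex("%H %M") also accepts "1305"; that is the "compress down to no whitespace" behavior Brett mentions.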
Oh well, we are more than compliant now. I decided not to write a patch for the docs to make them read more leniently about what the format directives accept. Figured I would just let people who think like me do it in a more "proper" way with leading zeros and let those who don't read it like that still be okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 15:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with non-zero-leading numbers. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file . Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here. I will do a patch for the docs since they are not consistent with the explanation of struct_time (or at least in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires a whitespace or a non-alphanumeric character while my implementation does not require that. Personally, I don't like either difference.
If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since they add checks of the input. That and I just like regexes period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 14:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 14:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one "time_diff" but in my head I meant to overwrite "diff_time". I have uploaded the new one. (b) See (a) (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so I will just stop with uploading tests here and just start a patch with my strptime tests for the stdlib tests. (d) The reason this test failed is because your input is not compliant with the Python docs.
Read what %m accepts: Month as a decimal number [01,12] Notice the leading 0 for the single digit month. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add on to all the regexes \d as an option and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant to allow no leading 0, I think they should be rewritten since that is misleading. In other words, either strptime stays as it is and follows the docs or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 09:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet. I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime: >>> time.strptime("7/12/02", "%m/%d/%y") (2002, 7, 12, 0, 0, 0, 4, 193, 0) >>> With yours: >>> time.strptime("7/12/02", "%m/%d/%y") Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime raise ValueError("time data did not match format") ValueError: time data did not match format >>> Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it?
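Guido's suggestion, combined with the approach Brett later describes for test_time.py (using strftime output as strptime input), might look roughly like the unittest case below. It is a sketch, not the test that was checked in: the fixed timestamp, the NUMERIC_FIELDS table and the class name are illustrative choices. It also exercises whatever time.strptime the running interpreter provides. Only the first eight fields are compared because a modern time.strptime reports tm_isdst as -1, where the libc result above shows 0.

```python
import time
import unittest

# A fixed UTC moment: 2002-07-12 13:05:09 (the date from Guido's example).
FIXED = time.gmtime(1026432000 + 13 * 3600 + 5 * 60 + 9)

# Directive -> struct_time attribute it should round-trip into.
NUMERIC_FIELDS = {
    "d": "tm_mday",
    "m": "tm_mon",
    "y": "tm_year",
    "Y": "tm_year",
    "H": "tm_hour",
    "M": "tm_min",
    "S": "tm_sec",
}


class StrptimeRoundTrip(unittest.TestCase):
    """Feed strftime() output back into strptime() and compare fields."""

    def test_numeric_directives(self):
        for directive, field in NUMERIC_FIELDS.items():
            with self.subTest(directive=directive):
                text = time.strftime("%" + directive, FIXED)
                parsed = time.strptime(text, "%" + directive)
                self.assertEqual(getattr(parsed, field), getattr(FIXED, field))

    def test_full_date_round_trip(self):
        fmt = "%Y-%m-%d %H:%M:%S"
        parsed = time.strptime(time.strftime(fmt, FIXED), fmt)
        # Skip tm_isdst: strptime leaves it at -1.
        self.assertEqual(tuple(parsed)[:8], tuple(FIXED)[:8])

    def test_month_without_leading_zero(self):
        # The case from this thread: "7" rather than "07" for %m.
        parsed = time.strptime("7/12/02", "%m/%d/%y")
        self.assertEqual(tuple(parsed)[:8], (2002, 7, 12, 0, 0, 0, 4, 193))
```

Saved in a module, it runs with `python -m unittest`; the last test is the input that the 2.1.3 version rejected with "time data did not match format".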
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 13:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-26 21:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. Biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing. Mainly done for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 18:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactical change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. Also changed them so that they state what the fxn returns. I also put each import on its own line as per PEP 8. The only semantic change I made was to directly import re.compile since it is the only thing I am using from the re module. These changes required tweaking of my exhaustive testing suite, so that got uploaded, too.
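The difference between capturing and non-capturing groups, which Brett switched to in 2.1.2, can be seen in a small sketch. It reuses the (?:Mon)|(?:Tue)|... shape visible in the AssertionError quoted in this thread, with illustrative variable names. It only shows the structural effect (how many groups the compiled pattern carries); it makes no claim about how much faster matching becomes.

```python
import re

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Each alternative wrapped in a capturing group: one extra group per name.
capturing = re.compile("(?P<a>" + "|".join("(%s)" % d for d in DAYS) + ")")

# Same alternatives wrapped in (?:...): grouped for the regex engine, but
# no match groups are created, so only the named group <a> remains.
non_capturing = re.compile("(?P<a>" + "|".join("(?:%s)" % d for d in DAYS) + ")")

print(capturing.groups, non_capturing.groups)  # 8 1
print(non_capturing.match("Wed").groupdict())  # {'a': 'Wed'}
```

groupdict() only reports named groups, so code that reads results by directive name behaves the same either way; the non-capturing form just stops the engine from recording the inner matches.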
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-20 21:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 14:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of strptime()'s use, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is because I forgot about how default arguments are created at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object for every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they did, they should change the locale for the interpreter and not just for strptime(). The second change was what triggers strptime() to return an re object that it can use.
Initially it was any false value (i.e., anything that would be considered false), but I realized that an empty string could trigger that and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 12:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 13:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes) I did a few more things. For one, I added a special " \d" to the numeric month regex.
I discovered that ANSI C for ctime displays the month with a leading space if it is a single digit. So to deal with that since at least Skip's C library likes to use that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the set value for property() it wouldn't work; didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. It does require 2.2.1 now since I used True and False without defining them. Obviously just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 17:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack on a \d on all numeric formats where it might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3 which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know!
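The padding Brett refers to can be checked from Python: asctime()/ctime() output pads a one-digit day of the month with a space ("Jun  4"), so a strictly two-digit day pattern cannot match it. The sketch below uses the time tuple from Skip's session. ctime_regex is a hypothetical helper that covers only this one layout, and the STRICT_DAY/LENIENT_DAY alternatives follow the %d regexes discussed in this thread.

```python
import re
import time

# asctime() pads a one-digit day of the month with a space: "Jun  4".
stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 0))

STRICT_DAY = r"3[0-1]|[0-2]\d"          # two-digit days only
LENIENT_DAY = r"3[0-1]|[0-2]\d|\d| \d"  # also "4" and " 4"


def ctime_regex(day_pattern):
    """Hypothetical regex for the asctime()/ctime() layout 'Tue Jun  4 ...'."""
    return re.compile(
        r"(?P<a>\w{3}) (?P<b>\w{3}) (?P<d>" + day_pattern + r") "
        r"(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d) (?P<Y>\d{4})"
    )


print(repr(stamp))                                           # 'Tue Jun  4 00:53:39 2002'
print(ctime_regex(STRICT_DAY).fullmatch(stamp))              # None
print(repr(ctime_regex(LENIENT_DAY).fullmatch(stamp).group("d")))  # ' 4'
```

int(" 4") is 4, so the captured text with its leading space converts cleanly once the " \d" alternative lets it match.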
I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 23:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that they are either None or have the proper length, otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that could specify front or back concatenation. Not much different from before, but it allows me to specify front or back concatenation easily. I explained why the various magic dates were used. I in no way have to worry about leap year. Since it is not validating the data string for validity the fxn just takes the data and uses it. I have no reason to calc for leap year. date_time[offset] has been replaced with current_format and added the requisite two lines to assign between it and the list. You are only supposed to use __new__ when it is immutable. Since dict is obviously mutable, I don't need to worry about it. Used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate.
There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if being run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal and so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 16:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not big to me. Although 'yY' is faster than ('y', 'Y'). In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing: newlist = list(orig) newlist.append('') x = tuple(newlist) you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function? In __init__ do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) is this significant, should there be a comment? (magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more) __calc_month() doesn't seem to take leap year into account?
(not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repetitively, couldn't you start the loop with something like dto = date_time[offset] and then use dto (dto is not a good name, I'm just making an example) Are you supposed to use __init__ when deriving from built-ins (TimeRE(dict)) or __new__? (sorry, I don't remember the answer) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1 It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 15:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that it specifies the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; not being represented by 2 digits caused the regex to fail. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes.
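Two points in this exchange are about the order of regex alternatives: Neal's shortening of the sorter helper (sort by length, longest first) and Brett's plan to add '\d' as the *last* option. Python's re tries alternatives left to right and keeps the first one that lets the overall match succeed, so ordering matters whenever one alternative is a prefix of another. The %A part of the pattern in Guido's AssertionError lists Wednesday, Thursday and Saturday before the shorter names, which is consistent with longest-first sorting. Below, alternation is a hypothetical helper. Python 3 has no cmp() builtin, so key=len with reverse=True plays the role of `return cmp(b_length, a_length)`.

```python
import re


def alternation(words):
    """Join words into a regex alternation, longest words first.

    re tries alternatives left to right and keeps the first one that lets
    the rest of the pattern succeed, so "Mon" listed before "Monday" wins
    whenever nothing after the group forces a longer match.
    """
    # Descending length replaces a comparison function such as
    # `return cmp(b_length, a_length)`.
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(word) for word in ordered)


print(re.match("(?:Mon|Monday)", "Monday").group())                            # Mon
print(re.match("(?:%s)" % alternation(["Mon", "Monday"]), "Monday").group())  # Monday

# The same reasoning puts a bare \d *after* the two-digit forms.
print(re.match(r"\d|[0-2]\d", "12").group())  # 1
print(re.match(r"[0-2]\d|\d", "12").group())  # 12
```

When the whole format is matched end to end, backtracking often rescues a bad order, but longest-first keeps the group contents right even when the group is followed by something permissive like \s*.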
As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). Now how that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is an instance of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Should I ask now to get the ball rolling? Since I just joined python-dev literally this morning I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 22:55 Message: Logged In: YES user_id=44345 Here ya go... % ./python Python 2.3a0 (#185, Jun 1 2002, 23:19:40) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> now = time.localtime(time.time()) >>> now (2002, 6, 4, 0, 53, 39, 1, 155, 1) >>> time.strftime("%c", now) 'Tue Jun 4 00:53:39 2002' >>> time.tzname ('CST', 'CDT') >>> time.strftime("%Z", now) 'CDT' ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-03 22:35 Message: Logged In: YES user_id=357491 I have uploaded a version 2.0.1 which fixes the %b format bug (stupid typo on a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z'), and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?
-Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-03 21:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out: parsetime = time.strptime #parsetime = strptime.strptime Regardless which version of parsetime I get, I get some errors. If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ERROR: *** Test %c directive. *** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 00:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault since I accidentally forgot to add a description for my patch. So the file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension file; would getting this accepted require getting BDFL approval to add this as a separate module into the standard library? Would the time module have to have a Python interface module where this is put and all other methods in the module just pass directly to the extension file? As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1 I didn't think about it. I will go through the code and look for places where True and False should be used.
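The two spellings Neal and Brett discuss are not quite the same test: `x in 'yY'` is a substring check, while `x in ('y', 'Y')` compares x against each element. For the single-character directive letters strptime looks at they agree, but they diverge for the empty string or multi-character input. The function names below are illustrative. Neal's remark that 'yY' is faster is plausible, but this sketch does not measure it.

```python
def in_string(found):
    # Substring test: True for "y" and "Y", but also for "" and "yY".
    return found in "yY"


def in_tuple(found):
    # Element-equality test: True only for exactly "y" or "Y".
    return found in ("y", "Y")


for candidate in ["y", "Y", "m", "", "yY"]:
    print(repr(candidate), in_string(candidate), in_tuple(candidate))
```

If the value being tested is always exactly one character (as a directive letter pulled from a format string is), either form is safe and the choice comes down to readability, which is where Brett and Neal left it.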
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 06:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything, raise an exception, else it's ok) * In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency. * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up), the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a python strptime. So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 14:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite includes the removal of the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This makes it able to behave exactly like time.strptime().
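Gleaning locale data from time.strftime(), as version 2 does, amounts to formatting dates whose weekday or month is known and collecting the names. The sketch below is an assumption-labelled illustration, not the LocaleTime code. It reuses 1999-03-15, the date Neal asks about, only because that day is a Monday (why the original picked it is not stated here). It checks the result against the calendar module, which version 2.0.4 switched to. weekday_names and month_names are hypothetical helpers.

```python
import calendar
import time


def weekday_names(fmt="%A"):
    """Weekday names, Monday first, for the current LC_TIME locale.

    Formats seven consecutive days starting at 1999-03-15 (a Monday);
    %A and %a are driven by the weekday field of the time tuple.
    """
    return [
        time.strftime(fmt, (1999, 3, 15 + i, 22, 44, 55, i, 74 + i, 0))
        for i in range(7)
    ]


def month_names(fmt="%B"):
    """Month names, January first, for the current LC_TIME locale."""
    return [time.strftime(fmt, (1999, m, 1, 0, 0, 0, 0, 1, 0)) for m in range(1, 13)]


print(weekday_names() == list(calendar.day_name))       # True
print(month_names() == list(calendar.month_name)[1:])  # True
```

Both approaches end up asking the platform's strftime for the names, so they follow the interpreter's current LC_TIME setting, which is the behavior Brett wanted after dropping the LocaleTime argument.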
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 15:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 14:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Fri Jul 19 01:43:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 17:43:49 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 19:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). 
It attempts to operate as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-18 20:43 Message: Logged In: YES user_id=33168 Brett, I'm still following. It wasn't that bad. :-) Guido, let me know if you want me to do anything/check stuff in. Docs are fine to upload here. I can change PEP 42 also.
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 19:35 Message: Logged In: YES user_id=357491 Since I had the time, I went ahead and did a patch for libtime.tex that removes the comment saying that strptime fully relies on the C library and uploaded it. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 18:34 Message: Logged In: YES user_id=357491 Wonderful! About the docs: do you want me to email Fred or upload a patched version of the time docs? And for removing the request in PEP 42, should I email Jeremy about it or Barry? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 17:47 Message: Logged In: YES user_id=6380 OK, deleting all old files as promised. All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 17:39 Message: Logged In: YES user_id=357491 God, I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make sure it was up to being included in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I re-uploaded everything now. All files that specify they were uploaded 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test whichever version time will export.
I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =) As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately the version you ran still had that test in there. SF should really let patch creators delete old files. That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 11:29 Message: Logged In: YES user_id=6380 - Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.) - There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find. - The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-). - When I run test_strptime.py, I get one failure: ====================================================================== FAIL: Test TimeRE.pattern.
---------------------------------------------------------------------- Traceback (most recent call last): File "../Lib/test/test_strptime.py", line 124, in test_pattern self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string) File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless if not expr: raise self.failureException, msg AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)' ---------------------------------------------------------------------- I haven't looked into this deeper. Back to you... ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-16 17:34 Message: Logged In: YES user_id=357491 Two things have been uploaded. First, test_time.py w/ a strptime test. It is almost an exact mirror of the strftime test; the only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes. One was to remove the TypeError raised when %I was used without %p. This was from me being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between format directives. It seems like a ridiculous requirement since the requirement that whitespace be able to compress down to no whitespace negates this requirement.
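The 2.1.5 change replaces each run of whitespace in the format string with \s*, so a single space in the format can match any amount of whitespace in the data, including none. A minimal sketch of that translation step, assuming a heavily reduced, hypothetical directive table (DIRECTIVES) and a hypothetical helper name (format_to_regex); it is not the _strptime code, which derives its patterns from locale data:

```python
import re

# Hypothetical, heavily reduced directive table. Only zero-padded numbers
# are accepted here so that the example stays focused on whitespace.
DIRECTIVES = {
    'd': r'(?P<d>3[01]|[12]\d|0[1-9])',
    'm': r'(?P<m>1[0-2]|0[1-9])',
    'y': r'(?P<y>\d\d)',
}


def format_to_regex(fmt):
    """Translate a strptime-style format string into a regex string.

    Each run of whitespace in the format becomes \\s*, so it matches any
    amount of whitespace in the data string, including none at all.
    Other literal characters are escaped, and '%%' stands for '%'.
    """
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == '%' and i + 1 < len(fmt):
            directive = fmt[i + 1]
            # An unknown directive raises KeyError in this sketch.
            parts.append('%' if directive == '%' else DIRECTIVES[directive])
            i += 2
        elif ch.isspace():
            # Consume the whole whitespace run and emit a single \s*.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            parts.append(r'\s*')
        else:
            parts.append(re.escape(ch))
            i += 1
    return ''.join(parts)


pattern = format_to_regex('%m %d')
print(re.fullmatch(pattern, '07   04').groupdict())  # {'m': '07', 'd': '04'}
print(re.fullmatch(pattern, '0704') is not None)     # True: whitespace may shrink to nothing
```

Doing the whitespace substitution on the format string, before any directive is expanded, keeps spaces that belong inside a directive's own pattern untouched.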
Oh well, we are more than compliant now. I decided not to write a patch for the docs to make them read more leniently about what the format directives accept. I figured I would just let people who think like me do it the more "proper" way, with leading zeros, while those who don't read it that way are still okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 18:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with numbers that lack a leading zero. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file . Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here. I will do a patch for the docs since they are not consistent with the explanation of struct_time (or at least in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires whitespace or a non-alphanumeric character between directives, while my implementation does not require that. Personally, I don't like either difference.
If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since they add checks of the input. That and I just like regexes, period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 17:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 17:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. I misnamed the new one 'time_diff' but in my head I meant to overwrite 'diff_time'. I have uploaded the new one. (b) See (a) (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so I will just stop uploading tests here and start a patch with my strptime tests for the stdlib tests. (d) The reason this test failed is that your input is not compliant with the Python docs.
Read what %m accepts: Month as a decimal number [01,12] Notice the leading 0 for the single-digit month. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add \d as an option to all the regexes and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant for no leading 0, I think they should be rewritten since that is misleading. In other words, either strptime stays as it is and follows the docs or I change the regexes, but then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 12:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet. I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime: >>> time.strptime("7/12/02", "%m/%d/%y") (2002, 7, 12, 0, 0, 0, 4, 193, 0) >>> With yours: >>> time.strptime("7/12/02", "%m/%d/%y") Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime raise ValueError("time data did not match format") ValueError: time data did not match format >>> Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it?
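Guido's "7/12/02" example fails because the %m pattern only accepts the zero-padded 01..12 that strftime() emits, while glibc's strptime() also takes a bare "7". The whole disagreement comes down to one extra alternative in the regex. A small sketch, with hypothetical names STRICT_MONTH, LENIENT_MONTH and parse_month:

```python
import re

# Strict: exactly what strftime('%m') produces, 01..12 with a leading zero.
STRICT_MONTH = re.compile(r'(?P<m>1[0-2]|0[1-9])')
# Lenient: additionally accept a bare digit 1..9, as glibc's strptime()
# did for "7/12/02". The two-digit alternatives come first so that, in a
# longer pattern, "10".."12" are preferred over a lone "1".
LENIENT_MONTH = re.compile(r'(?P<m>1[0-2]|0[1-9]|[1-9])')


def parse_month(month_re, text):
    """Return the month as an int, or None if text is not a valid value."""
    found = month_re.fullmatch(text)
    return int(found.group('m')) if found else None


for text in ('07', '7', '12', '13'):
    print(text, parse_month(STRICT_MONTH, text), parse_month(LENIENT_MONTH, text))
# 07 7 7
# 7 None 7
# 12 12 12
# 13 None None
```

Either choice still rejects out-of-range months such as "13"; the lenient form only widens what counts as a valid spelling of 1..9.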
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 16:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new contextual diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-27 00:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups were non-capturing. This was mainly done for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 21:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactic change. From discussions on python-dev I renamed the helper fxns so they are all lowercase-style. Also changed them so that they state what the fxn returns. I also put all of the imports on their own lines as per PEP 8. The only semantic change I made was to import re.compile directly since it is the only thing I am using from the re module. These changes required tweaking of my exhaustive testing suite, so that got uploaded, too.
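The 2.1.2 notes mention two regex-level changes: the alternatives inside each directive's group became non-capturing, and a failed match raises ValueError, as time.strptime() does, instead of TypeError. A small sketch of both, using hypothetical names (DAYS, capturing, non_capturing, parse_weekday); the error message mirrors the one in Guido's traceback, and whether the non-capturing form is measurably faster here is not something the thread establishes:

```python
import re

DAYS = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')

# Capturing alternatives: every parenthesised day becomes its own group.
capturing = re.compile(
    '(?P<a>%s)' % '|'.join('(%s)' % day for day in DAYS), re.IGNORECASE)
# Non-capturing alternatives: (?:...) only groups, so the named group is
# the single group the match has to record.
non_capturing = re.compile(
    '(?P<a>%s)' % '|'.join('(?:%s)' % day for day in DAYS), re.IGNORECASE)


def parse_weekday(data, regex=non_capturing):
    """Return the match's named groups or raise ValueError."""
    found = regex.fullmatch(data)
    if found is None:
        # Mirror time.strptime(): report a mismatch as ValueError instead
        # of failing later with an AttributeError on None.groupdict().
        raise ValueError('time data did not match format')
    return found.groupdict()


print(capturing.groups, non_capturing.groups)  # 8 1
print(parse_weekday('wed'))                     # {'a': 'wed'}
```

groupdict() only ever reports named groups, so the caller sees the same result either way; the difference is how many groups the engine tracks per match.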
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-21 00:35 Message: Logged In: YES user_id=357491 I have uploaded a contextual diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 17:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of strptime()'s use, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. One is that I forgot that default arguments are evaluated at the time of function creation and not at each fxn call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object for every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in languages other than their default locale. And if they do, they should change the locale for the interpreter and not just for strptime(). The second change was what triggers strptime() to return an re object that it can use.
Initially it was any value that would be considered false, but I realized that an empty string could trigger that and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 15:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 16:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes) I did a few more things. For one, I added a special " \d" to the numeric day-of-month regex.
I discovered that ANSI C's ctime() displays the day of the month with a leading space if it is a single digit. Since at least Skip's C library uses that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that leaving out the set value for property() would make assignment fail; I didn't realize it just defaults over to __setitem__. So I added that method as the set value for all of the property()s. It does require 2.2.1 now since I used True and False without defining them. Obviously just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 20:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack a \d onto all numeric formats where it might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3, which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know!
I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-05 02:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that they are either None or have the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For the adding of '' to tuples, I created a method that could specify front or back concatenation. Not much different from before, but it allows me to specify front or back concatenation easily. I explained why the various magic dates were used. I have no need to worry about leap years. Since it is not checking the data string for validity, the fxn just takes the data and uses it. I have no reason to calc for leap year. date_time[offset] has been replaced with current_format and I added the requisite two lines to assign between it and the list. You are only supposed to use __new__ when the base type is immutable. Since dict is obviously mutable, I don't need to worry about it. Used Neal's suggested shortening of the sorter helper fxn. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical compares use == and != instead of is and is not. Didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The doc string was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate.
There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed if it is run under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal and so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 19:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me. Although 'yY' is faster than ('y', 'Y'). In order to try to reduce the lines where you raise an error (in __init__) you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing: newlist = list(orig) newlist.append('') x = tuple(newlist) you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function? In __init__ do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more) __calc_month() doesn't seem to take leap year into account?
(not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repeatedly; couldn't you start the loop with something like dto = date_time[offset] and then use dto (dto is not a good name, I'm just making an example)? Are you supposed to use __init__ when deriving from built-ins (TimeRE(dict)) or __new__? (sorry, I don't remember the answer) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1 It could be problematic to compare x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 18:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that it specifies the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; not being represented by 2 digits caused the regex to fail. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes.
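Tacking a single-digit option onto the %d regex, together with the " \d" alternative that 2.0.3 later added because ctime()/asctime() pad a one-digit day with a space, might look like the sketch below. DAY_OF_MONTH and parse_day are hypothetical names, and time.asctime() is used only to produce a space-padded sample:

```python
import re
import time

# Day of the month in three spellings: the zero-padded form strftime('%d')
# emits, the space-padded form ctime()/asctime() produce, and a bare digit.
DAY_OF_MONTH = re.compile(r'(?P<d>3[01]|[12]\d|0[1-9]| [1-9]|[1-9])')


def parse_day(text):
    """Return the day of the month as an int, or None if text doesn't match."""
    found = DAY_OF_MONTH.fullmatch(text)
    # int() ignores the leading space of the " 4" form.
    return int(found.group('d')) if found else None


# asctime() right-aligns the day in a three-character field after the
# month name, so June 4 comes out as "Jun  4" with two spaces.
stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))
day_field = stamp[8:10]

print(repr(stamp))  # 'Tue Jun  4 00:53:39 2002'
print([parse_day(s) for s in (day_field, '04', '4', '31', '32')])  # [4, 4, 4, 31, None]
```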
As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). Now how that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is a case of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Should I ask now to get the ball rolling? Since I just joined python-dev literally this morning I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 01:55 Message: Logged In: YES user_id=44345 Here ya go... % ./python Python 2.3a0 (#185, Jun 1 2002, 23:19:40) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> now = time.localtime(time.time()) >>> now (2002, 6, 4, 0, 53, 39, 1, 155, 1) >>> time.strftime("%c", now) 'Tue Jun 4 00:53:39 2002' >>> time.tzname ('CST', 'CDT') >>> time.strftime("%Z", now) 'CDT' ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 01:35 Message: Logged In: YES user_id=357491 I have uploaded a version 2.0.1 which fixes the %b format bug (stupid typo on a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z'), and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?
-Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 00:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out: parsetime = time.strptime #parsetime = strptime.strptime Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ERROR: *** Test %c directive. *** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 03:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I accidentally forgot to add a description for my patch. So the file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension file. Would getting this accepted require getting BDFL approval to add this as a separate module into the standard library? Would the time module have to have a Python interface module where this is put and all other methods in the module just pass directly to the extension file? As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1 I didn't think about it. I will go through the code and look for places where True and False should be used.
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 09:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything; raise an exception, else it's ok) * In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency. * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a Python strptime. So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 17:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite includes the removal of the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This makes it able to behave exactly like time.strptime().
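Version 2 gleans locale data from time.strftime(), and 2.0.4 switches to the calendar module for the weekday and month names. Those names still have to become regex alternations, and both Neal's sorter suggestion and the pattern in Guido's later failure output list longer names first. A sketch under those assumptions, with a hypothetical helper names_to_regex; Python 3 has no cmp(), so the ordering uses key=len with reverse=True:

```python
import calendar
import re


def names_to_regex(names, group):
    """Build a named group that alternates over names, longest first."""
    # Longest first: with re.match(), listing "Wed" before "Wednesday"
    # would stop after the first three letters of "Wednesday".
    ordered = sorted({name for name in names if name}, key=len, reverse=True)
    alternation = '|'.join('(?:%s)' % re.escape(name) for name in ordered)
    return '(?P<%s>%s)' % (group, alternation)


# calendar builds these names with strftime(), so they follow whatever
# LC_TIME locale is current when they are read. month_name[0] and
# month_abbr[0] are empty strings, which the helper filters out.
weekday_re = re.compile(
    names_to_regex(list(calendar.day_name) + list(calendar.day_abbr), 'a'),
    re.IGNORECASE)
month_re = re.compile(
    names_to_regex(list(calendar.month_name) + list(calendar.month_abbr), 'b'),
    re.IGNORECASE)

print(names_to_regex(['Wed', 'Wednesday'], 'a'))  # (?P<a>(?:Wednesday)|(?:Wed))
print(weekday_re.match('Wednesday, 4 June').group('a'))  # Wednesday (default "C" locale)
```

Because these patterns are compiled once, they would go stale after a later locale switch, which is the same hazard the 2.1.0 comment describes for a default LocaleTime argument.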
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 18:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 17:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Fri Jul 19 02:03:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 18:03:56 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 01:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. 
We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-18 21:03 Message: Logged In: YES user_id=33168 Add patch for configure. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-15 18:33 Message: Logged In: YES user_id=33168 Sorry, I forgot about this patch. I just tested on Linux (RedHat 7.2). No problems, all expected tests successful. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 20:41 Message: Logged In: YES user_id=14198 My patch is after Martin's so hopefully I have the macros correct (or at least haven't regressed anything of his!) DL_*PORT still exists, but is deprecated. Eventually every header will change, but for now DL_*PORT still works as before. 
And yes, finding autoconf-2.5.3 for my cygwin and linux platforms is what took 1/2 the time of getting this patch together :) Another report of success on Linux would be great! To date, I have not heard of a single person trying this patch on any platform. Thanks. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-05 14:45 Message: Logged In: YES user_id=33168 I think Martin checked in the change to drop support for win16, so some of the macros may have changed (MS_WINDOWS, MS_WIN32). Won't all the files which use DL_*PORT (most headers in Include) have to change? Michael's explanation of autoconf is what I do. Make sure you have version 2.53 though. Let me know if you want me to test on linux. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 02:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader. I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 08:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. When you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. Same with pyconfig.h.in, except that it is made by autoheader. Try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-03 21:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me?
What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points to pyconfig.h.in needing modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 06:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-01 21:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (i.e., not when building extensions, etc.) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 14:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on.
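The import/export decision Mark describes for pyport.h can be modelled as a small decision table keyed on Py_BUILD_CORE and Py_ENABLE_SHARED. The sketch below is written in Python only to make that logic explicit. It is an assumption-laden model, not the header itself. In particular, `declspec_dll` is a hypothetical stand-in for whatever platform test the real header uses for Windows-style DLL builds, and the exact expansions are an assumption about how such a header would typically be arranged.

```python
def expand_api_macros(build_core, enable_shared, declspec_dll):
    """Model which expansion each linkage macro would get (illustration only).

    build_core    -- Py_BUILD_CORE is defined (compiling the interpreter itself)
    enable_shared -- Py_ENABLE_SHARED is defined (configure --enable-shared)
    declspec_dll  -- assumed flag: compiler supports __declspec(dllexport/dllimport)

    Returns format strings in which '{}' stands for the return or data type.
    """
    if enable_shared and declspec_dll:
        if build_core:
            # Building the shared core itself: export the public API.
            return {
                'PyAPI_FUNC': '__declspec(dllexport) {}',
                'PyAPI_DATA': 'extern __declspec(dllexport) {}',
                'PyMODINIT_FUNC': 'void',  # built-in modules live inside the core
            }
        # Building an extension: import the core's API, export the init function.
        return {
            'PyAPI_FUNC': '__declspec(dllimport) {}',
            'PyAPI_DATA': 'extern __declspec(dllimport) {}',
            'PyMODINIT_FUNC': '__declspec(dllexport) void',
        }
    # Static builds, or platforms without __declspec: ordinary C declarations.
    return {
        'PyAPI_FUNC': '{}',
        'PyAPI_DATA': 'extern {}',
        'PyMODINIT_FUNC': 'void',
    }
```

For example, `expand_api_macros(False, True, True)['PyAPI_FUNC'].format('PyObject *')` gives `'__declspec(dllimport) PyObject *'`. That is the case of an extension compiled against a shared core, where the public API is imported but the module's own init function still has to be exported.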
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-23 23:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 01:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work. Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Fri Jul 19 07:57:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 18 Jul 2002 23:57:40 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 15:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros.
The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro. Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2002-07-19 16:57 Message: Logged In: YES user_id=14198 Thanks all! Checking in configure; /cvsroot/python/python/dist/src/configure,v <-- configure new revision: 1.322; previous revision: 1.321 Checking in pyconfig.h.in; /cvsroot/python/python/dist/src/pyconfig.h.in,v <-- pyconfig.h.in new revision: 1.43; previous revision: 1.42 Checking in configure.in; /cvsroot/python/python/dist/src/configure.in,v <-- configure.in new revision: 1.333; previous revision: 1.332 Checking in Makefile.pre.in; /cvsroot/python/python/dist/src/Makefile.pre.in,v <-- Makefile.pre.in new revision: 1.88; previous revision: 1.87 Checking in Include/pyport.h; /cvsroot/python/python/dist/src/Include/pyport.h,v <-- pyport.h new revision: 2.52; previous revision: 2.51 Checking in Include/import.h; /cvsroot/python/python/dist/src/Include/import.h,v <-- import.h new revision: 2.28; previous revision: 2.27 Checking in PC/pyconfig.h; /cvsroot/python/python/dist/src/PC/pyconfig.h,v <-- pyconfig.h new revision: 1.14; previous revision: 1.13 Checking in PC/_winreg.c; /cvsroot/python/python/dist/src/PC/_winreg.c,v <-- _winreg.c new revision: 1.11; previous revision: 1.10 Checking in 
Modules/_sre.c; /cvsroot/python/python/dist/src/Modules/_sre.c,v <-- _sre.c new revision: 2.82; previous revision: 2.81 Checking in Modules/pyexpat.c; /cvsroot/python/python/dist/src/Modules/pyexpat.c,v <-- pyexpat.c new revision: 2.70; previous revision: 2.69 Checking in Python/thread.c; /cvsroot/python/python/dist/src/Python/thread.c,v <-- thread.c new revision: 2.45; previous revision: 2.44 Checking in Doc/ext/extending.tex; /cvsroot/python/python/dist/src/Doc/ext/extending.tex,v <-- extending.tex new revision: 1.22; previous revision: 1.21 ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-19 11:03 Message: Logged In: YES user_id=33168 Add patch for configure. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-16 08:33 Message: Logged In: YES user_id=33168 Sorry, I forgot about this patch. I just tested on Linux (RedHat 7.2). No problems, all expected tests successful. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-06 10:41 Message: Logged In: YES user_id=14198 My patch is after Martin's so hopefully I have the macros correct (or at least haven't regressed anything of his!) DL_*PORT still exists, but is deprecated. Eventually every header will change, but for now DL_*PORT still works as before. And yes, finding autoconf-2.5.3 for my cygwin and linux platforms is what took 1/2 the time of getting this patch together :) Another report of success on Linux would be great! To date, I have not heard of a single person trying this patch on any platform. Thanks. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-06 04:45 Message: Logged In: YES user_id=33168 I think Martin checked in the change to drop support for win16, so some of the macros may have changed (MS_WINDOWS, MS_WIN32). 
Won't all the files which use DL_*PORT (most headers in Include) have to change? Michael's explanation of autoconf is what I do. Make sure you have version 2.53 though. Let me know if you want me to test on linux. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 16:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader. I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 22:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. When you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. Same with pyconfig.h.in, except that it is made by autoheader. Try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-04 11:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points to pyconfig.h.in needing modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 20:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok.
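Michael's explanation can be made concrete with a simplified model of the last step, in which configure's generated config.status reads the pyconfig.h.in template and rewrites its `#undef` lines. The Python function below is a hypothetical sketch of that substitution, not autoconf's implementation. It does show why Mark saw the new symbol disappear from pyconfig.h when he reverted his pyconfig.h.in change: a name with no `#undef` line in the template has nowhere to appear in the generated header.

```python
import re

# A template line of the form "#undef NAME" (hypothetical simplification).
UNDEF_RE = re.compile(r'^#undef\s+(\w+)\s*$')


def instantiate_header(template_text, defines):
    """Simplified model of turning pyconfig.h.in into pyconfig.h.

    Each '#undef NAME' template line becomes '#define NAME VALUE' if configure
    decided to define NAME, or is commented out otherwise. All other lines are
    copied through unchanged.
    """
    out = []
    for line in template_text.splitlines():
        m = UNDEF_RE.match(line)
        if m:
            name = m.group(1)
            if name in defines:
                out.append('#define %s %s' % (name, defines[name]))
            else:
                out.append('/* #undef %s */' % name)
        else:
            out.append(line)
    return '\n'.join(out) + '\n'
```

If the template contains `#undef Py_ENABLE_SHARED` and configure was run with '--enable-shared', the output contains `#define Py_ENABLE_SHARED 1`. If that template line is missing, no setting of `defines` can make the symbol appear. That is why the template has to be regenerated with autoheader and checked in along with configure.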
---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-02 11:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour. * Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (i.e., not when building extensions, etc.) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-02 04:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-24 13:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 15:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work.
Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Fri Jul 19 08:23:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 19 Jul 2002 00:23:24 -0700 Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT Message-ID: Patches item #566100, was opened at 2002-06-08 01:14 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 Category: Core (C code) Group: None Status: Closed Resolution: Fixed Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Rationalize DL_IMPORT and DL_EXPORT Initial Comment: Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky and broken. We have come up with purpose oriented macros to replace them. PyAPI_FUNC: public Python functions PyAPI_DATA: public Python data PyMODINIT_FUNC: extension module init functions. These cover all existing cases of DL_IMPORT and DL_EXPORT in the core. This patch simply introduces the new macros (keeping the old ones), and changes a small amount of code to actually use these macros. The vast majority of the existing Python code using DL_IMPORT/DL_EXPORT has not been touched. I have a patch that changes the following: * PC/pyconfig.h - creates the new PyAPI/MODINIT macros, but also rationalizes this header file considerably. All common macros between the various compilers have been moved to a common section. This simplifies the header significantly. * Include/pyport.h - creates the new PyAPI/MODINIT macros for non windows platforms. * Include/import.h - move to the new macros. I picked this header file at random, mainly to prove that the new macros do indeed work. * PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c - move to the PyMODINIT_FUNC macro.
Patch tested on Windows and Linux. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-19 03:23 Message: Logged In: YES user_id=31435 Au contraire, thank you! ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-19 02:57 Message: Logged In: YES user_id=14198 Thanks all! Checking in configure; /cvsroot/python/python/dist/src/configure,v <-- configure new revision: 1.322; previous revision: 1.321 Checking in pyconfig.h.in; /cvsroot/python/python/dist/src/pyconfig.h.in,v <-- pyconfig.h.in new revision: 1.43; previous revision: 1.42 Checking in configure.in; /cvsroot/python/python/dist/src/configure.in,v <-- configure.in new revision: 1.333; previous revision: 1.332 Checking in Makefile.pre.in; /cvsroot/python/python/dist/src/Makefile.pre.in,v <-- Makefile.pre.in new revision: 1.88; previous revision: 1.87 Checking in Include/pyport.h; /cvsroot/python/python/dist/src/Include/pyport.h,v <-- pyport.h new revision: 2.52; previous revision: 2.51 Checking in Include/import.h; /cvsroot/python/python/dist/src/Include/import.h,v <-- import.h new revision: 2.28; previous revision: 2.27 Checking in PC/pyconfig.h; /cvsroot/python/python/dist/src/PC/pyconfig.h,v <-- pyconfig.h new revision: 1.14; previous revision: 1.13 Checking in PC/_winreg.c; /cvsroot/python/python/dist/src/PC/_winreg.c,v <-- _winreg.c new revision: 1.11; previous revision: 1.10 Checking in Modules/_sre.c; /cvsroot/python/python/dist/src/Modules/_sre.c,v <-- _sre.c new revision: 2.82; previous revision: 2.81 Checking in Modules/pyexpat.c; /cvsroot/python/python/dist/src/Modules/pyexpat.c,v <-- pyexpat.c new revision: 2.70; previous revision: 2.69 Checking in Python/thread.c; /cvsroot/python/python/dist/src/Python/thread.c,v <-- thread.c new revision: 2.45; previous revision: 2.44 Checking in Doc/ext/extending.tex; /cvsroot/python/python/dist/src/Doc/ext/extending.tex,v <-- 
extending.tex new revision: 1.22; previous revision: 1.21 ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-18 21:03 Message: Logged In: YES user_id=33168 Add patch for configure. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-15 18:33 Message: Logged In: YES user_id=33168 Sorry, I forgot about this patch. I just tested on Linux (RedHat 7.2). No problems, all expected tests successful. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 20:41 Message: Logged In: YES user_id=14198 My patch is after Martin's so hopefully I have the macros correct (or at least haven't regressed anything of his!) DL_*PORT still exists, but is deprecated. Eventually every header will change, but for now DL_*PORT still works as before. And yes, finding autoconf-2.5.3 for my cygwin and linux platforms is what took 1/2 the time of getting this patch together :) Another report of success on Linux would be great! To date, I have not heard of a single person trying this patch on any platform. Thanks. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-05 14:45 Message: Logged In: YES user_id=33168 I think Martin checked in the change to drop support for win16, so some of the macros may have changed (MS_WINDOWS, MS_WIN32). Won't all the files which use DL_*PORT (most headers in Include) have to change? Michael's explanation of autoconf is what I do. Make sure you have version 2.53 though. Let me know if you want me to test on linux. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-05 02:45 Message: Logged In: YES user_id=14198 ok - thanks! Attaching a new patch that works correctly with autoheader.
I'm gunna need help checking this in tho, but one step at a time <0.1 wink> ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-04 08:28 Message: Logged In: YES user_id=6656 pyconfig.h.in is a bit like configure. When you edit configure.in, you're expected to run autoconf to make the configure script and check that in too. Same with pyconfig.h.in, except that it is made by autoheader. Try running autoheader and see what happens. (I hope someone -- Martin? -- will correct me if I have this wrong). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-03 21:35 Message: Logged In: YES user_id=14198 I'm a little confused by pyconfig.h.in. Can someone please explain the process to me? What I see is: * reverting my pyconfig.h.in change prevents the new symbol from appearing in pyconfig.h * A CVS log of pyconfig.h.in shows heavy editing, with at least 6 well-commented checkins in June alone. So, all the evidence points to pyconfig.h.in needing modification. Can someone please clarify? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-07-02 06:16 Message: Logged In: YES user_id=6656 Um, you are aware that pyconfig.h.in is auto-generated (by autoheader)? But if you've made edits to configure.in, you're probably ok. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-07-01 21:47 Message: Logged In: YES user_id=14198 OK - here is a new ambitious patch ;) It attempts to rationalize all platforms, not just the PC. * pyport.h now sets up most of the import/export magic. It looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new macros) that control the behaviour.
* Py_ENABLE_SHARED has been added to pyconfig.h.in and configure.in, so that this macro is created in pyconfig.h whenever '--enable-shared' is passed to configure. Py_BUILD_CORE is passed via a "/D" option only when the core itself is built (i.e., not when building extensions, etc.) * PC/pyconfig.h has been rationalized heavily. * A couple of places in the core have been changed to use the new macros - more to test that it actually works. This has been tested on Windows using MSVC, Windows using cygwin/gcc, and RH7 linux. I consider it basically "done" so please comment away. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-07-01 14:03 Message: Logged In: YES user_id=38376 +1 (possibly except for the MODINIT_FUNC name...) and yes, _sre.c is supposed to compile under earlier versions as well, but I can fix that later on. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-23 23:18 Message: Logged In: YES user_id=33168 I like the idea, but haven't looked at the patch. I hope to look soon and give better feedback. But I'll wait until after you upload the new version. :-) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2002-06-21 01:20 Message: Logged In: YES user_id=14198 Just in case anyone was going to have a look at this, I am working on a better version by integrating some of the cygwin autoconf work.
Just want to avoid wasting others' time. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470 From noreply@sourceforge.net Fri Jul 19 18:09:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 19 Jul 2002 10:09:54 -0700 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 19:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Guido van Rossum (gvanrossum) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes.
All of it is self-explanatory in the doc strings. It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-19 13:09 Message: Logged In: YES user_id=6380 Thanks! All checked in. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-18 20:43 Message: Logged In: YES user_id=33168 Brett, I'm still following. It wasn't that bad. :-) Guido, let me know if you want me to do anything/check stuff in. Docs are fine to upload here. I can change PEP 42 also. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 19:35 Message: Logged In: YES user_id=357491 Since I had the time, I went ahead and did a patch for libtime.tex that removes the comment saying that strptime fully relies on the C library and uploaded it. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 18:34 Message: Logged In: YES user_id=357491 Wonderful! About the docs: do you want me to email Fred, or upload a patched version of the time docs? And for removing the request in PEP 42, should I email Jeremy about it or Barry? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 17:47 Message: Logged In: YES user_id=6380 OK, deleting all old files as promised. All tests succeed. I think I'll check this version in (but it may be tomorrow, since I've got a few other things to take care of).
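Brett's initial comment says localization is handled with regexes. One part of that technique is turning a list of locale names into a named-group alternation. The helper below is a hypothetical sketch and is not code from the module. Its one design choice is to try longer names first, so that a name which is a prefix of another (for example "mon" and "monday") cannot win early and leave the rest of the word unmatched.

```python
import re


def seq_to_re(names, directive):
    """Build a named-group alternation such as (?P<A>(?:monday)|(?:mon)).

    Hypothetical helper: names are sorted longest first, so that when one name
    is a prefix of another, the longer alternative is tried first.
    """
    ordered = sorted(names, key=len, reverse=True)
    body = '|'.join('(?:%s)' % re.escape(name) for name in ordered)
    return '(?P<%s>%s)' % (directive, body)
```

With `pat = seq_to_re(['mon', 'monday'], 'A')`, `re.match(pat, 'Monday 15', re.IGNORECASE).group('A')` returns `'Monday'`. The unsorted alternation `(?P<A>(?:mon)|(?:monday))` would stop after `'Mon'`. The names themselves could come from whatever source the localization layer uses.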
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-18 17:39 Message: Logged In: YES user_id=357491 God I wish I could delete those old files! Poor Neal Norwitz was nice enough to go over my code once to help me make sure it was up for being included in the stdlib, but he initially used an old version. Thankfully he was nice enough to look over the newer version at the time. But no, SF does not give me the privileges to delete old files (and why is that? I am the creator of the patch; you would think I could manage my own files). I re-uploaded everything now. All files that specify they were uploaded 2002-07-17 are the newest files. I am terribly sorry about this whole name mix-up. I have now fixed test_strptime.py to use _strptime. I completely removed the strptime import so that the strptime testing will go through time and thus test whichever version time will export. I removed the __future__ import. And thanks for the piece of advice; I was taking the advice that __future__ statements should come before code a little too far. =) As for your error, that is because the test_strptime.py you are using is old. I originally had a test in there that checked to make sure the regex returned was the same as the one being tested for; that was a bad decision. So I went through and removed all hard-coded tests like that. Unfortunately the version you ran still had that test in there. SF should really let patch creators delete old files. That's it this time. Now I await the next drama in this never-ending saga of trying to make a non-trivial contribution to Python. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 11:29 Message: Logged In: YES user_id=6380 - Can you please delete all the obsolete uploads? (If SF won't let you, let me know and I'll do it for you, leaving only the most recent version of each.)
- There's still a confusion between strptime.py and _strptime.py; your test_time.py imports strptime, and so does the latest version of test_strptime.py I can find. - The "from __future__ import division" is unnecessary, since you're never using the single / operator (// doesn't need the future statement). Also note that future statements should come *after* a module's docstring (for future reference :-). - When I run test_strptime.py, I get one failure: ====================================================================== FAIL: Test TimeRE.pattern. ---------------------------------------------------------------------- Traceback (most recent call last): File "../Lib/test/test_strptime.py", line 124, in test_pattern self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string) File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless if not expr: raise self.failureException, msg AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)' ---------------------------------------------------------------------- I haven't looked into this deeper. Back to you... ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-16 17:34 Message: Logged In: YES user_id=357491 Two things have been uploaded. First, test_time.py w/ a strptime test. It is almost an exact mirror of the strftime test; only difference is that I used strftime to test strptime. So if strftime ever fails, strptime will fail also. I feel this is fine since strptime depends on strftime so much that if strftime were to fail strptime would definitely fail. The other file is version 2.1.5 of strptime. I made two changes.
One was to remove the TypeError raised when %I was used without %p. That check came from my being very picky about only accepting good data strings. The second was to go through and replace all whitespace in the format string with \s*. That basically makes this version of strptime XPG compatible as far as I (and the NetBSD man page) can tell. The only difference now is that I do not require whitespace or a non-alphanumeric character between directives. That seems like a ridiculous requirement anyway, since allowing whitespace to compress down to no whitespace negates it. Oh well, we are more than compliant now. I decided not to write a patch for the docs to describe more leniently what the format directives accept. I figured people who think like me can still write dates the more "proper" way with leading zeros, and those who don't read the docs that way will still be okay. I think that is everything. If you want more in-depth tests, Guido, I can add them to the testing suite, but I figured that since this is (hopefully) going in bug-free, it only needs to be checked to make sure it isn't broken by anything. And if you do want more in-depth tests, do you want me to add mirror tests for strftime, or not worry about that since that is the ANSI C library's problem? Other than that, I think strptime is pretty much done. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 18:27 Message: Logged In: YES user_id=357491 Uploaded 2.1.4. I added \d to the end of all relevant regexes (basically all of them but %y and %Y) to deal with numbers that lack a leading zero. I also made the regex case-insensitive. As for the diff failing, I am wondering if I am doing something wrong. I am just running diff -c CVS_file modified_file > diff_file. Isn't that right? I will work on merging my strptime tests into the time regression tests and upload a patch here.
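The regex changes described in the 2.1.4 and 2.1.5 comments (numeric fields that accept a missing leading zero, whitespace in the format turned into \s*, case-insensitive matching) can be sketched as a small format-to-regex translator. This is a hypothetical illustration, not the code from the patch: the DIRECTIVES table, its hard-coded English month abbreviations, and the format_to_regex name are assumptions, and only a handful of directives are covered. The compiled pattern is cached with functools.lru_cache, in the spirit of the profiling note that building the regex string is the main cost.

```python
import re
from functools import lru_cache

# Hypothetical directive table (not the patch's TimeRE). Numeric fields accept a
# missing leading zero, %d also accepts the space-padded form ctime() produces,
# and %y/%Y stay fixed-width. The English month names are an assumption; the
# real module derives names from the current locale.
DIRECTIVES = {
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
    "Y": r"(?P<Y>\d\d\d\d)",
    "H": r"(?P<H>2[0-3]|[01]\d|\d)",
    "M": r"(?P<M>[0-5]\d|\d)",
    "b": r"(?P<b>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)",
}


@lru_cache(maxsize=64)
def format_to_regex(fmt):
    """Translate a strptime-style format into a compiled, case-insensitive regex."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            directive = fmt[i + 1]
            # "%%" is a literal percent sign; anything else must be in the table.
            parts.append("%" if directive == "%" else DIRECTIVES[directive])
            i += 2
        elif ch.isspace():
            # Whitespace in the format matches zero or more whitespace characters.
            parts.append(r"\s*")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)
```

For example, format_to_regex("%m/%d/%y").fullmatch("7/12/02") succeeds with the m group equal to "7", which is the kind of input from Guido's report that the stricter regexes rejected.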
I will do a patch for the docs since it is not consistent with the explanation of struct_time (or at least in my opinion). I tried finding XPG docs, but the best Google came up with was the NetBSD man pages for strptime (which they claim is XPG compliant). The difference between that implementation and mine is that NetBSD's allows whitespace (defined as isspace()) in the format string to match \s* in the data string. It also requires a whitespace or a non-alphanumeric character while my implementation does not require that. Personally, I don't like either difference. If they were used, though, there might be a possibility of rewriting strptime to just use a bunch of string methods instead of regexes for a possible performance benefit. But I prefer regexes since it adds checks of the input. That and I just like regexes period. =) Also, I noticed that your little test returned 0 for all unknown values. Mine returns -1 since 0 can be a legitimate value for some and I figured that would eliminate ambiguity. I can change it to 0, though. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 17:13 Message: Logged In: YES user_id=6380 Hm, the new diff_time *still* fails to apply. But don't worry about that. I'd love to see regression tests for time.strptime. Please upload them here -- don't start a new patch. I think your interpretation of the docs is overly restrictive; the table shows what strftime does but I think it's reasonable for strptime to accept missing leading zeros. You can upload a patch for the docs too if you feel that's necessary. You may also try to read up on what the XPG standard says about strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-12 17:02 Message: Logged In: YES user_id=357491 To respond to your points, Guido: (a) I accidentally uploaded the old file. Sorry about that. 
I misnamed the new one 'time_diff' but in my head I meant to overwrite 'diff_time'. I have uploaded the new one. (b) See (a) (c) Oops. That is a complete oversight on my part. Now in (d) you mention writing up regression tests for the standard time.strptime. I am quite happy to do this. Do you want that as a separate patch? If so, I will stop uploading tests here and start a new patch with my strptime tests for the stdlib tests. (d) This test failed because your input is not compliant with the Python docs. Read what %m accepts: Month as a decimal number [01,12] Notice the leading 0 for the single-digit month. My implementation follows the docs and not what glibc suggests. If you want, I can obviously add \d as an option to all the regexes and eliminate this issue. But that means it will no longer be following the docs. This tripped Skip up too since no one writes numbers that way; strftime does, though. Now if the docs meant to allow no leading 0, I think they should be rewritten since that is misleading. In other words, either strptime stays as it is and follows the docs, or I change the regexes and then the docs will have to be changed. I can go either way, but I personally would want to follow the docs as-is since strptime is meant to parse strftime output and not human output. =) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-12 12:58 Message: Logged In: YES user_id=6380 Hm. This isn't done yet. I get these problems: (a) the patch for timemodule.c doesn't apply cleanly in current CVS (trivial) (b) it still tries to import strptime (no leading '_') (also trivial) (c) so does test_strptime.py (also trivial) (d) the simplest of simple examples fails: With Linux's strptime: >>> time.strptime("7/12/02", "%m/%d/%y") (2002, 7, 12, 0, 0, 0, 4, 193, 0) >>> With yours: >>> time.strptime("7/12/02", "%m/%d/%y") Traceback (most recent call last): File "<stdin>", line 1, in ?
File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime raise ValueError("time data did not match format") ValueError: time data did not match format >>> Perhaps you should write a regression test suite for the strptime function as found in the time module courtesy of libc, and then make sure that your code satisfies it? ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-07-10 16:51 Message: Logged In: YES user_id=357491 The actual 2.1.3 edition of strptime is now up. I don't think there are any changes, but since I renamed the file _strptime.py, I figured uploading it again wouldn't hurt. I also uploaded a new context diff of the time module taken from CVS on 2002-07-10. The only difference between this and the previous diff (which was against 2.2.1's time module) is the change of the imported module to _strptime. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-27 00:54 Message: Logged In: YES user_id=357491 Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down below!). Just a little bit more cleanup. The biggest change is that I changed the default format string and made strptime() raise ValueError instead of TypeError. This was all done to match the time module docs. I also fiddled with the regexes so that the groups are non-capturing, mainly for a possible performance improvement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-23 21:06 Message: Logged In: YES user_id=357491 2.1.1 is now uploaded. Almost a purely syntactic change. From discussions on python-dev I renamed the helper functions so they are all lowercase-style. I also renamed them so that the names state what each function returns. I also put each import on its own line as per PEP 8.
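Guido's suggestion, and the approach Brett takes in the 07-16 test_time.py upload (using strftime to test strptime), amount to a round-trip test. A minimal sketch with the standard unittest module follows; the class name and the sample date are illustrative choices. It targets a current Python, whose time.strptime delegates to the pure-Python _strptime module, and it also checks Guido's "7/12/02" example.

```python
import time
import unittest

# A fixed, known date: Friday 12 July 2002 (day 193 of the year), 09:05:03.
SAMPLE = (2002, 7, 12, 9, 5, 3, 4, 193, 0)


class StrptimeRoundTrip(unittest.TestCase):
    """Feed strftime() output back into strptime() and compare the fields."""

    def test_round_trip(self):
        fmt = "%Y-%m-%d %H:%M:%S"
        text = time.strftime(fmt, SAMPLE)  # '2002-07-12 09:05:03'
        parsed = time.strptime(text, fmt)
        # Compare everything except tm_isdst, which strptime() reports as -1.
        self.assertEqual(tuple(parsed)[:8], SAMPLE[:8])

    def test_missing_leading_zero(self):
        # Guido's example: a month written without a leading zero.
        parsed = time.strptime("7/12/02", "%m/%d/%y")
        self.assertEqual(tuple(parsed)[:8], (2002, 7, 12, 0, 0, 0, 4, 193))
```

Run it with python -m unittest on the file that contains it. Using a fixed date rather than time.localtime() keeps the expected values deterministic.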
The only semantic change was to import re.compile directly, since it is the only thing I am using from the re module. These changes required tweaking my exhaustive testing suite, so that got uploaded, too. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-21 00:35 Message: Logged In: YES user_id=357491 I have uploaded a context diff of timemodule.c with a callout to strptime.strptime when HAVE_STRPTIME is not defined, just as Guido requested. It's my first extension module, so I am not totally sure of myself with it. But since Alex Martelli told me what I needed to do, I am fairly certain it is correct. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-19 17:49 Message: Logged In: YES user_id=357491 2.1.0 is now up and ready for use. I only changed two things in the code, but since they change the semantics of how strptime() is used, I made this a new minor release. One, I removed the ability to pass in your own LocaleTime object. I did this for two reasons. The first is that I forgot that default arguments are evaluated once, when the function is defined, and not at each call. This meant that if someone was not thinking and ran strptime() under one locale and then switched to another locale without explicitly passing in a new LocaleTime object on every call for the new locale, they would get bad matches. That is not good. The other reason was that I don't want to force users to pass in a LocaleTime object on every call if I can't have a default value for it. This is meant to act as a drop-in replacement for time.strptime(). That forced the removal of the parameter since it can't have a default value. In retrospect, though, people will probably never parse log files in a language other than their default locale's. And if they do, they should change the locale for the interpreter and not just for strptime().
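The pitfall behind the first reason, that a default argument is evaluated once when the def statement runs rather than on each call, is easy to demonstrate. In this sketch LocaleSnapshot, parse_buggy and parse_fixed are hypothetical names (the data argument is unused here), and the None sentinel in parse_fixed is the usual idiom for deferring construction to call time; as the comment explains, the patch chose to drop the parameter instead.

```python
import locale


class LocaleSnapshot:
    """Hypothetical stand-in for LocaleTime: records the LC_TIME setting when created."""

    def __init__(self):
        # Calling setlocale() with no locale argument only queries the setting.
        self.lc_time = locale.setlocale(locale.LC_TIME)


def parse_buggy(data, locale_info=LocaleSnapshot()):
    # The default object was built once, when this def statement ran, so every
    # call without an explicit argument shares that same snapshot.
    return locale_info


def parse_fixed(data, locale_info=None):
    # None as a sentinel: fresh locale data is gathered at call time instead.
    if locale_info is None:
        locale_info = LocaleSnapshot()
    return locale_info
```

Two calls to parse_buggy() return the very same LocaleSnapshot object, so a locale change made after the module is imported would never be noticed; parse_fixed() builds a new snapshot on each call, at the cost of redoing that work every time.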
The second change was what triggers strptime() to return an re object that it can use. Initially it was any false value, but I realized that an empty string could trigger that, and it would be better to raise a TypeError than let some error come up from trying to use the re object in an incorrect way. Now, to have an re object returned, you pass in False. I figured that there is a very minimal chance of passing in False when you meant to pass in a string. Also, False as the data_string, to me, means that I don't want what would normally be returned. I debated about removing this feature from strptime(), but I profiled it and most of the time comes from TimeRE's __getitem__. So building the string to be compiled into a regex is the big bottleneck. Using a precompiled regex instead of constructing a new one every time took 25% of the time overall for strptime() when calling strptime() 10,000 times in a row. This is a conservative number, IMO, for calls in a row; I checked the Apache hit logs for a single day on the Open Computing Facility's web server (http://www.ocf.berkeley.edu/) and there were 188,562 hits on June 16 alone. So I am going to keep the feature until someone tells me otherwise. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-18 15:05 Message: Logged In: YES user_id=357491 I have uploaded v. 2.0.4. It now uses the calendar module to figure out the names of weekdays and months. Thanks go out to Guido for pointing out this undocumented feature of calendar. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-17 16:11 Message: Logged In: YES user_id=357491 I uploaded v.2.0.3. Beyond implementing what I mentioned previously (raising TypeError when a match fails, adding \d to all applicable regexes), I did a few more things. For one, I added a special " \d" to the numeric month regex.
I discovered that ANSI C for ctime displays the month with a leading space if it is a single digit. Since at least Skip's C library uses that format for %c, I went ahead and added it. I changed all attributes in LocaleTime to lists. A recent mail on python-dev from GvR said that lists are for homogeneous data, which everything that is grouped together in LocaleTime is. It also simplified the code slightly and led to fewer conversions of data types. I also added a method that raises a TypeError if you try to assign to any of LocaleTime's attributes. I thought that if you left out the setter for property(), assignment wouldn't work; I didn't realize it just defaults over to __setitem__. So I added that method as the setter for all of the property()s. It does require 2.2.1 now since I used True and False without defining them. Obviously, just set those values to 1 and 0 respectively if you are running under 2.2. I also updated the overly exhaustive PyUnit suite that I have for testing my code. It is not black-box testing, though; Skip's pruned version of my testing suite fits that bill (I think). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-12 20:46 Message: Logged In: YES user_id=357491 I am back from my vacation and ready to email python-dev about getting this patch accepted (whether to modify time or make this a separate module, etc.). I think I will do the email on June 17. Before then, though, I am going to make two changes. One is to raise a ValueError exception if the regex doesn't match (to try to match time.strptime()'s exception as seen in Skip's run of the unit test). The other change is to tack \d onto all numeric formats where the value might come out as a single digit (i.e., lacking a leading zero). This will be for v2.0.3, which I will post before June 17. If there is any reason anyone thinks I should hold back on this, please let me know!
I would like to have this code as done as possible before I make any announcement. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-05 02:32 Message: Logged In: YES user_id=357491 I went ahead and implemented most of Neal's suggestions. On a few of them, though, I either didn't do it or took a slightly different route. For the 'yY' vs. ('y', 'Y'), I went with 'yY'. If it gives a performance boost, why not, since it doesn't make the code harder to read. Implementing it actually had me catch some redundant code for dealing with a literal %. The tests in the __init__ for LocaleTime have been reworked to check that the arguments are either None or have the proper length; otherwise they raise a TypeError. I have gone through and tried to catch all the lines that were over 80 characters and cut them up to fit. For adding '' to tuples, I created a method that can concatenate at either the front or the back. It is not much different from before, but it makes choosing front or back concatenation easy. I explained why the various magic dates were used. I don't have to worry about leap years at all: the function does not validate the data string, it just takes the data and uses it, so there is no reason to calculate leap years. date_time[offset] has been replaced with current_format, with the requisite two lines added to assign between it and the list. You only need __new__ when subclassing an immutable type. Since dict is obviously mutable, I don't need to worry about it. I used Neal's suggested shortening of the sorter helper function. I also used the suggestion of doing x = y = z = -1. Now it barely fits on a single line instead of two. All numerical comparisons use == and != instead of is and is not. I didn't know about that dependency on NSMALL((POS)|(NEG))INTS; good thing to know. The docstring was backwards. Thanks for catching that, Neal. I also went through and added True and False where appropriate.
There is a line in the code where True = 1; False = 0 right at the top. That can obviously be removed when running under Python 2.3. And I completely understand being picky about minute details where maintainability is a concern. I just graduated from Cal, so the memory of seeing beginning programmers' code is still fresh in my mind. And I will query python-dev about how to go about getting this added after the bugs are fixed and I am back home (going to be out of town until June 16). I will still be periodically checking email, though, so I will continue to implement any suggestions/bugfixes that anyone suggests/finds. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-04 19:33 Message: Logged In: YES user_id=33168 Hopefully, I'm looking at the correct patch this time. :-) To answer one question you had (re: 'yY' vs. ('y', 'Y')), I'm not sure people really care. It's not a big deal to me, although 'yY' is faster than ('y', 'Y'). To shorten the lines where you raise an error (in __init__), you could change 'sequence of ... must be X items long' to '... must have/contain X items'. Generally, it would be nice to make sure none of the lines are over 72-79 chars (see PEP 8). Instead of doing: newlist = list(orig) newlist.append('') x = tuple(newlist) you could do: x = tuple(orig[:]) or something like that. Perhaps a helper function? In __init__, do you want to check the params against 'is None'? If someone passes a non-sequence that doesn't evaluate to False, the __init__ won't raise a TypeError, which it probably should. What is the magic date used in __calc_weekday()? (1999/3/15+ 22:44:55) Is this significant? Should there be a comment? (Magic dates are used elsewhere too, e.g., __calc_month, __calc_am_pm, many more.) __calc_month() doesn't seem to take leap year into account?
(not sure if this is a problem or not) In __calc_date_time(), you use date_time[offset] repetitively; couldn't you start the loop with something like dto = date_time[offset] and then use dto? (dto is not a good name, I'm just making an example.) Are you supposed to use __init__ or __new__ when deriving from built-ins (TimeRE(dict))? (Sorry, I don't remember the answer.) In __tupleToRE.sorter(), instead of the last 3 lines, you can do: return cmp(b_length, a_length) Note: you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1 It could be problematic to compare with x is -1. You should probably just use ==. It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS were not defined in Objects/intobject.c. This docstring seems backwards: def gregToJulian(year, month, day): """Calculate the Gregorian date from the Julian date.""" I know a lot of these things seem like a pain. And it's not that bad now, but the problem is maintaining the code. It will be easier for everyone else if the code is similar to the rest. BTW, protocol on python-dev is pretty loose and friendly. :-) ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 18:33 Message: Logged In: YES user_id=357491 Thanks for being so prompt with your response, Skip. I found the problem with your %c. If you look at your output you will notice that the day of the month is '4', but if you look at the docs for time.strftime() you will notice that it specifies the day of the month (%d) as being in the range [01,31]. The regex for %d (simplified) is '(3[0-1])|([0-2]\d)'; because the day was not represented by 2 digits, the regex failed. Now the question becomes: do we follow the spec and chalk this up to a non-standard strftime() implementation, or do we adapt strptime to deal with possible improper output from strftime()? Changing the regexes should not be a big issue since I could just tack on '\d' as the last option for all numerical regexes.
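Neal's suggested body for __tupleToRE.sorter(), cmp(b_length, a_length), orders names from longest to shortest, and the weekday pattern in the failure output Guido posted (Wednesday, Thursday, Saturday, ..., Sunday) shows that ordering. A plausible reason for it, sketched here on the assumption that the sorter exists to stop a short name from shadowing a longer name it is a prefix of: regex alternation is tried left to right, so when nothing after the group forces backtracking, "Jun|June" stops at "Jun". The names_to_regex helper and the tiny month list are hypothetical; in Python 3, where cmp() no longer exists, key=len with reverse=True gives the same order.

```python
import re


def names_to_regex(names, group):
    """Build a named alternation group that tries longer names first."""
    # key=len with reverse=True replaces a cmp()-based sorter (cmp() is gone in
    # Python 3); sorted() is stable, so equal-length names keep their input order.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return "(?P<%s>%s)" % (group, alternation)


MONTHS = ["Jun", "June", "Jul", "July"]  # tiny illustrative sample, not locale data

# Alternatives are tried left to right, so "Jun" wins over "June" here.
naive = re.compile("(?P<B>%s)" % "|".join(MONTHS))
longest_first = re.compile(names_to_regex(MONTHS, "B"))
```

With these patterns, naive.match("June 2002") captures only "Jun", while longest_first.match("June 2002") captures "June".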
As for the test error from time.strptime(), I don't know what is causing it. If you look at the test you will notice that all it basically does is parsetime(time.strftime("%Z"), "%Z"). How that can fail I don't know. The docs do say that strptime() tends to be buggy, so perhaps this is an instance of that. One last thing. Should I wait until the bugs are worked out before I post to python-dev asking to either add this as a module to the standard library or change time to a Python stub and rename timemodule.c? Or should I ask now to get the ball rolling? Since I just joined python-dev literally this morning, I don't know what the protocol is. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 01:55 Message: Logged In: YES user_id=44345 Here ya go... % ./python Python 2.3a0 (#185, Jun 1 2002, 23:19:40) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> now = time.localtime(time.time()) >>> now (2002, 6, 4, 0, 53, 39, 1, 155, 1) >>> time.strftime("%c", now) 'Tue Jun 4 00:53:39 2002' >>> time.tzname ('CST', 'CDT') >>> time.strftime("%Z", now) 'CDT' ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-04 01:35 Message: Logged In: YES user_id=357491 I have uploaded version 2.0.1, which fixes the %b format bug (stupid typo in a variable name). As for the %c directive, I pass that test. Can you please send the output of strftime and the time tuple used to generate it? As for the time.strptime() failure, I don't have time.strptime() on any system available to me, so could you please send me the output you have for strftime('%Z') and time.tzname? I don't know how much %Z should be worried about since its use is deprecated (according to the time module's documentation). Perhaps strptime() should take the initiative and not support it?
-Brett ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-06-04 00:52 Message: Logged In: YES user_id=44345 Brett, Please see the drastically shortened test_strptime.py. (Basically all I'm interested in here is whether or not strptime.strptime and time.strptime will pass the tests.) Near the top are two lines, one commented out: parsetime = time.strptime #parsetime = strptime.strptime Regardless of which version of parsetime I use, I get some errors. If parsetime == time.strptime I get ====================================================================== ERROR: Test timezone directives. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 69, in test_timezone strp_output = parsetime(strf_output, "%Z") ValueError: unconverted data remains: 'CDT' If parsetime == strptime.strptime I get ERROR: *** Test %c directive. *** ---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 75, in test_date_time self.helper('c', position) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 380, in strptime found_dict = found.groupdict() AttributeError: NoneType object has no attribute 'groupdict' ====================================================================== ERROR: Test for month directives.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_strptime.py", line 31, in test_month self.helper(directive, 1) File "test_strptime.py", line 17, in helper strp_output = parsetime(strf_output, '%'+directive) File "strptime.py", line 393, in strptime month = list(locale_time.f_month).index(found_dict['b']) ValueError: list.index(x): x not in list This is with a very recent interpreter (updated from CVS in the past day) running on Mandrake Linux 8.1. Can you reproduce either or both problems? Got fixes for the strptime.strptime problems? Thx, Skip ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-06-02 03:44 Message: Logged In: YES user_id=357491 I'm afraid you looked at the wrong patch! My fault, since I forgot to add a description for my patch. So the file with no description is the newest one and completely supersedes the older file. I am very sorry about that. Trust me, the new version is much better. I realized the other day that the time module is a C extension. Would getting this accepted require BDFL approval to add this as a separate module to the standard library? Would the time module need a Python interface module where this is put, with all the other functions in the module passing directly through to the extension? As for the suggestions, here are my replies to the ones that still apply to the new file: * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' -> True, but I personally find it easier to read using the tuple. If it is standard practice in the standard library to do it the suggested way, I will change it. * daylight should use the new bools True, False (this also applies to any other flags) -> Oops. Since I wrote this under Python 2.2.1, I didn't think about it. I will go through the code and look for places where True and False should be used.
-Brett C. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-06-01 09:46 Message: Logged In: YES user_id=33168 Overall, the patch looks pretty good. I didn't check for completeness or consistency, though. * You don't need: from exceptions import Exception * The comment "from strptime import * will only export strptime()" is not correct. * I'm not sure what should be included for the license. * Why do you need a success flag in CheckIntegrity if you raise an exception? (You don't need to return anything: raise an exception, else it's ok.) * In return_time(), could you change xrange(9) to range(len(temp_time))? This removes a dependency. * strings are sequences, so instead of if found in ('y', 'Y') you can do if found in 'yY' * daylight should use the new bools True, False (this also applies to any other flags) * The formatting doesn't follow the standard (see PEP 8) (specifically, spaces after commas, =, binary ops, comparisons, etc.) * Long lines should be broken up The test looks pretty good too. I didn't check it for completeness. The URL is wrong (too high up); the test can be found here: http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py I noticed a spelling mistake in the test: anme -> name. Also, note that PEP 42 has a comment about a Python strptime. So if this gets implemented, we need to update PEP 42. Thanks. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-05-27 17:38 Message: Logged In: YES user_id=357491 Version 2 of strptime() has now been uploaded. This nearly complete rewrite removes the need to input locale-specific time info. All needed locale info is gleaned from time.strftime(). This lets it behave exactly like time.strptime().
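Gleaning names from strftime() rather than asking the user for them can be done by formatting one fixed date per weekday and per month. The sketch below borrows the 1999/3/15 22:44:55 date that Neal asks about for __calc_weekday() (15 March 1999 was a Monday); weekday_names and month_names are hypothetical helpers, not the patch's LocaleTime methods, and in the month tuples the weekday and day-of-year fields are placeholders, on the assumption that strftime() only range-checks them and reads tm_mon for %B.

```python
import time


def weekday_names(fmt="%A"):
    """Return weekday names, Monday first, as formatted under the current LC_TIME locale."""
    # 15 March 1999 was a Monday. tm_wday counts Monday as 0, and tm_yday for
    # 15 March 1999 is 31 + 28 + 15 = 74.
    return [time.strftime(fmt, (1999, 3, 15 + i, 22, 44, 55, i, 74 + i, 0))
            for i in range(7)]


def month_names(fmt="%B"):
    """Return the twelve month names under the current LC_TIME locale."""
    # Only tm_mon matters for %B and %b; the other fields just need to be in range.
    return [time.strftime(fmt, (1999, m, 1, 0, 0, 0, 0, 1, 0))
            for m in range(1, 13)]
```

Passing "%a" or "%b" gives the abbreviated forms. The later 2.0.4 comment switches to the calendar module, whose day_name and month_name sequences expose the same locale-dependent names.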
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 18:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 17:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Sat Jul 20 09:14:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 20 Jul 2002 01:14:37 -0700 Subject: [Patches] [ python-Patches-581396 ] Canvas "select_item" always returns None Message-ID: Patches item #581396, was opened at 2002-07-14 19:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470 Category: Tkinter >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) >Summary: Canvas "select_item" always returns None Initial Comment: bug in 2.1.3, 2.2.1 and CVS HEAD. 
One-liner patch:
*** /usr/lib/python2.1/lib-tk/Tkinter.py.orig Wed Jul 3 17:04:28 2002
--- /usr/lib/python2.1/lib-tk/Tkinter.py Wed Jul 3 17:04:31 2002
***************
*** 2096,2100 ****
      def select_item(self):
          """Return the item which has the selection."""
!         self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
--- 2096,2100 ----
      def select_item(self):
          """Return the item which has the selection."""
!         return self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470 From noreply@sourceforge.net Sat Jul 20 17:49:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 20 Jul 2002 09:49:31 -0700 Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix) Message-ID: Patches item #584245, was opened at 2002-07-20 12:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: get python to link on OSF1 (Dec Unix) Initial Comment: Attached is a patch to fix the linking of python (makedev not found) on Dec OSF/1 Unix 5.1. This patch has also been tested on Linux (RedHat 7.2).
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 From noreply@sourceforge.net Sat Jul 20 19:35:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 20 Jul 2002 11:35:38 -0700 Subject: [Patches] [ python-Patches-568348 ] Add param to email.Utils.decode() Message-ID: Patches item #568348, was opened at 2002-06-12 23:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: atsuo ishimoto (ishimoto) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: Add param to email.Utils.decode() Initial Comment: While email.Utils.decode() is quite a useful function, I have run into a real-world problem. Here in Japan, I receive a lot of RFC-hostile messages every day. Since they contain illegal characters that cannot be converted to Unicode by JapaneseCodecs, email.Utils.decode() chokes with UnicodeError. My solution is to add an optional 'errors' parameter, which is passed to the unicode() function. This allows me to replace illegal characters instead of abandoning the entire text. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-06-21 06:45 Message: Logged In: YES user_id=163326 I'd recommend assigning this patch to Barry Warsaw (bwarsaw), who is the maintainer of the email module.
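The proposed change amounts to forwarding an errors argument to the decoder. In Python 2 that was unicode(s, charset, errors); the sketch below uses Python 3's bytes.decode(), which takes the same two arguments. decode_header_text is a hypothetical helper, not the email package's API, and the UTF-8 sample bytes stand in for the Japanese encodings the comment describes.

```python
def decode_header_text(raw, charset, errors="strict"):
    """Hypothetical helper: decode raw header bytes, passing 'errors' to the codec.

    errors="strict" raises UnicodeDecodeError on bad bytes (the behaviour the
    comment complains about); "replace" substitutes U+FFFD and keeps the rest,
    and "ignore" silently drops the bad bytes.
    """
    return raw.decode(charset, errors)


raw = b"caf\xc3\xa9 \xff ok"  # valid UTF-8 apart from the stray 0xFF byte
print(decode_header_text(raw, "utf-8", "replace"))
```

Keeping "strict" as the default preserves the existing behaviour for callers that want decoding failures reported, while a caller processing messy mail can opt into "replace".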
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470 From noreply@sourceforge.net Sun Jul 21 14:16:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 21 Jul 2002 06:16:17 -0700 Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp Message-ID: Patches item #578297, was opened at 2002-07-07 16:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) Assigned to: Andrew I MacIntyre (aimacintyre) Summary: fix for problems with test_longexp Initial Comment: The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour. More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue. The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens. The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc. The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest resources to complete test_longexp successfully and without swapping to death; on better-resourced machines, the whole regression test takes negligibly longer (0-1%) to complete. [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp runs nearly 10% faster, and the whole regression test completes about 2% faster (test_longexp accounts for about 25% of the improvement). [gcc-2.95.3 -O3]
Both platforms are neutral, performance-wise, running MAL's PyBench 1.0.
The patch in its current form is for experimental evaluation and is not intended for integration into the core. If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change. I've assigned this to Jack for review in the context of its performance on the Mac. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-21 23:16 Message: Logged In: YES user_id=250749 Ok, I've prepared patches to convert the following files to use PyMalloc for memory allocation:
  Parser/[acceler.c|node.c|parsetok.c] (pymalloc-parser.diff)
  Python/compile.c (pymalloc-compile.diff)
I didn't bother with the other files in Parser/, as my malloc logging shows that they only ever appear to make requests > 256 bytes. I have attached/will attach a summary from my malloc logging experiments for information. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-16 04:14 Message: Logged In: YES user_id=31435 Thanks for the detailed followup, Andrew! I incorporated some of this info into XXXROUNDUP's comments. Without either patch, the system malloc has to do two miserable things: (1) find bigger and bigger memory areas very frequently; and (2) interleaved with that, allocate gazillions of tiny blocks too. #2 makes it difficult for the platform malloc to find free space contiguous to the blocks allocated for #1, unless it arranges to move them to "the end" of memory, or into their own memory segments. As a result it's likely to do a copy on nearly every large-block realloc, and the code used to do a realloc on every 3rd new child. The XXXROUNDUP patch addressed #1 by asking to grow blocks much less frequently; PyMalloc addresses #2 by getting the tiny blocks out of the platform malloc's hair. If the platform malloc is saved from either one, its job becomes much easier.
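To make point #1 concrete, here is a small Python model that counts how often a child array has to be reallocated while children are appended one at a time. It is an illustration only, not the node.c code. It compares two growth policies: rounding the capacity up to a multiple of 3, in the spirit of the old "realloc on every 3rd new child" behaviour, and a policy that is assumed here to resemble XXXROUNDUP (exact size for one child, multiples of 4 up to 128, then powers of two). The thresholds are assumptions chosen for illustration.

```python
def round_to_three(n):
    """Old-style policy: capacity is n rounded up to a multiple of 3."""
    return (n + 2) // 3 * 3


def round_geometric(n):
    """Assumed XXXROUNDUP-like policy (illustrative thresholds)."""
    if n <= 1:
        return n
    if n <= 128:
        return (n + 3) & ~3          # next multiple of 4
    capacity = 256
    while capacity < n:              # next power of two above 128
        capacity <<= 1
    return capacity


def count_reallocs(n_children, roundup):
    """Count capacity changes while appending n_children one at a time."""
    capacity = 0
    reallocs = 0
    for n in range(1, n_children + 1):
        if n > capacity:             # no room for child number n
            capacity = roundup(n)
            reallocs += 1
    return reallocs
```

For 300 children the multiple-of-3 policy reallocates 100 times against 35 for the geometric one; for 100,000 children the counts are 33,334 against 43. Fewer, larger growth steps are exactly what relieves the platform realloc.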
It would still be nice to switch the parser to using pymalloc. There are still disasters lurking, because some platform malloc packages appear to take quadratic time when *free*ing gazillions of tiny blocks (they thrash trying to coalesce them into larger contiguous free blocks). pymalloc doesn't try to coalesce free blocks, so it is reliably immune to this disease. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-15 21:47 Message: Logged In: YES user_id=250749 To my surprise, Tim's checkin also works for the EMX port. I can only conclude that EMX's realloc() has a corner case tickled by test_longexp that isn't hit with either the aggressive overallocation change or the PyMalloc change applied. It is also interesting to note the performance impact of Tim's checkin, particularly on FreeBSD. Typical runtimes for "python -E -tt Lib/test/regrtest.py -l test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc 2.95.3 -O3):

                           total    user    sys
  baseline:                39.1s   32.7s   6.3s
  my patch:                37.1s   30.3s   6.7s
  Tim's checkin:            8.4s    7.8s   0.6s
  my patch+Tim's checkin:   5.5s    4.9s   0.5s

These runs were made with the Library modules already compiled. While Tim's comments about timing the regression test are noted, there are nonetheless consistent reductions in execution time for the whole regression test as well. Typical results on the same test box:

                           total    user    sys
  baseline:                1386s   1097s    89s
  my patch:                1350s   1065s    93s
  Tim's checkin:           1265s   1003s    67s
  my patch+Tim's checkin:  1230s    971s    65s

With the EMX port, the difference in timing between Tim's checkin and my patch is small, both for test_longexp and the regression test. There are noticeable gains for both test_longexp and the whole regression test with both changes in place, although not as significant as the FreeBSD results.
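The relative reductions implied by the FreeBSD tables can be checked with a few lines of Python. The figures are the "total" column copied from the two tables; the helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(before, after):
    """Percentage by which the runtime `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# "total" column for test_longexp alone, in seconds
longexp = {"baseline": 39.1, "my patch": 37.1,
           "Tim's checkin": 8.4, "my patch+Tim's checkin": 5.5}
# "total" column for the whole regression test, in seconds
regrtest = {"baseline": 1386, "my patch": 1350,
            "Tim's checkin": 1265, "my patch+Tim's checkin": 1230}

for label, table in (("test_longexp", longexp), ("regrtest", regrtest)):
    base = table["baseline"]
    for name, total in table.items():
        if name != "baseline":
            pct = percent_reduction(base, total)
            print(f"{label:12} {name:24} {pct:5.1f}% less time")
```

By this arithmetic, Tim's checkin alone cuts test_longexp by about 78.5% and the full regression run by about 8.7%, while Andrew's patch alone gives about 5.1% and 2.6%. With both changes the figures are about 85.9% and 11.3%, which supports the "most bang for the buck" conclusion.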
MAL's PyBench 1.0 exhibits negligible performance differences between the code states on both platforms, which is as I'd expect, as it doesn't appear to test compile() or eval(). From the above, I conclude that Tim's patch gets the most bang for the buck, and that my patch (or its intent) should be rejected unless someone thinks pursuing the PyMalloc changes to the parser is worthwhile. As an aside, I did a little research on the "XXX are those actually common?" question Tim posed in the comment associated with his change: when running Lib/compileall.py against the Lib directory, 89% of PyMem_RESIZE() calls in AddChild() are the n=1 case, and 9% are rounded up to n=4. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-08 20:09 Message: Logged In: YES user_id=45365 With Tim's mods, test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch. Andrew, I'm assigning this back to you; there's little more I can do with this patch. You'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-08 16:38 Message: Logged In: YES user_id=31435 Jack, please do a cvs update and try this again. I checked in changes to PyNode_AddChild() that I expect will cure your particular woes here. Andrew, PyMalloc was designed for oodles of small allocations. Feel encouraged to write a patch to change the compiler to use PyObject_{Malloc, Realloc, Free} instead. Then it will automatically exploit PyMalloc when the latter is enabled. Note that the regression test suite incorporates random numbers in several tests, and in ways that can affect runtime. Small differences in aggregate test suite runtime are meaningless because of this.
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-08 07:24 Message: Logged In: YES user_id=45365 Unfortunately, on the Mac it doesn't help at all with the test_longexp problem, nor with the similar test_import problem. The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-07-07 16:41 Message: Logged In: YES user_id=250749 Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied. Also, I would expect to make this a platform-specific change if it's integrated, rather than a general change (unless that is seen as more appropriate).
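The division of labour discussed in this thread can be modelled in a few lines: small requests are served from pymalloc's own per-size-class free lists, and anything above the threshold is forwarded to the platform allocator. This is a toy sketch under stated assumptions, not obmalloc.c. The 256-byte limit is an assumption consistent with Andrew's remark about requests larger than 256 bytes, the 8-byte size-class granularity is also assumed, and "blocks" are just integer ids.

```python
SMALL_LIMIT = 256   # assumed small-request threshold, in bytes
ALIGNMENT = 8       # assumed size-class granularity, in bytes


class ToySmallObjectAllocator:
    """Toy model: per-size-class free lists for small requests, no coalescing."""

    def __init__(self):
        self.free_lists = {}      # size class -> ids of freed blocks
        self.next_id = 0
        self.system_requests = 0  # requests forwarded to the platform malloc

    @staticmethod
    def size_class(nbytes):
        return (nbytes - 1) // ALIGNMENT

    def malloc(self, nbytes):
        if nbytes > SMALL_LIMIT:
            # Large request: hand it to the platform allocator.
            self.system_requests += 1
            return ("system", nbytes)
        cls = self.size_class(nbytes)
        free = self.free_lists.setdefault(cls, [])
        if free:
            # Reuse a freed block of the same class without any searching.
            return ("pool", cls, free.pop())
        self.next_id += 1
        return ("pool", cls, self.next_id)

    def free(self, block):
        if block[0] == "pool":
            _, cls, block_id = block
            # Push it back; neighbouring free blocks are never merged.
            self.free_lists[cls].append(block_id)
        # "system" blocks would be returned to the platform allocator here.
```

The model shows why pymalloc helps with gazillions of tiny blocks but not with Jack's case: once a single growing block passes the threshold, every further realloc lands on the platform allocator again.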
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470 From noreply@sourceforge.net Sun Jul 21 21:29:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 21 Jul 2002 13:29:43 -0700 Subject: [Patches] [ python-Patches-584626 ] yield allowed in try/finally Message-ID: Patches item #584626, was opened at 2002-07-21 20:29 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Oren Tirosh (orenti) Assigned to: Nobody/Anonymous (nobody) Summary: yield allowed in try/finally Initial Comment: A generator's dealloc function now resumes a generator one last time by jumping directly to the return statement at the end of the code. As a result, the finally section of any try/finally blocks is executed. Any exceptions raised are treated just like exceptions in a __del__ finalizer. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470 From noreply@sourceforge.net Mon Jul 22 20:53:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 22 Jul 2002 12:53:16 -0700 Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests Message-ID: Patches item #585101, was opened at 2002-07-22 15:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Barry A. Warsaw (bwarsaw) Assigned to: Jack Jansen (jackjansen) Summary: Fix relative imports in regression tests Initial Comment: The regression test suite uses intrapackage relative imports to import stuff like test_support, etc. 
There's no deep reason for this to be so, since "test" is a standard package. As long as all tests do something like "from test import test_support" or "import test.test_support", everything works fine. Keeping the relative imports makes life more difficult for tests that don't live in the expected location of Lib/test. This patch fixes this by making sure all test imports are absolute. This works fine on *nix, but rumor has it that the Mac tests are run differently, so I'd like Jack to comment on whether this patch breaks his test suite or not. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 From noreply@sourceforge.net Mon Jul 22 20:55:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 22 Jul 2002 12:55:16 -0700 Subject: [Patches] [ python-Patches-568348 ] Add param to email.Utils.decode() Message-ID: Patches item #568348, was opened at 2002-06-12 23:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: atsuo ishimoto (ishimoto) Assigned to: Barry A. Warsaw (bwarsaw) Summary: Add param to email.Utils.decode() Initial Comment: While email.Utils.decode() is quite a useful function, I ran into a real-world problem. Here in Japan, I receive a lot of RFC-hostile messages every day. Since they contain illegal characters that cannot be converted to Unicode by JapaneseCodecs, email.Utils.decode() chokes with a UnicodeError. My solution is to add an optional 'errors' parameter, which is passed to the unicode() function. This allows me to replace illegal characters instead of abandoning the entire text. ---------------------------------------------------------------------- >Comment By: Barry A.
Warsaw (bwarsaw) Date: 2002-07-22 15:55 Message: Logged In: YES user_id=12800 email.Utils.decode() is deprecated in favor of email.Header.decode_header(). Is this patch still worth it? I think email.Utils.decode() ought to go away. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-06-21 06:45 Message: Logged In: YES user_id=163326 I'd recommend assigning this patch to Barry Warsaw (bwarsaw), who is the maintainer of the email module. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470 From noreply@sourceforge.net Tue Jul 23 03:56:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 22 Jul 2002 19:56:14 -0700 Subject: [Patches] [ python-Patches-581396 ] Canvas "select_item" always returns None Message-ID: Patches item #581396, was opened at 2002-07-14 15:23 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470 Category: Tkinter Group: Python 2.3 >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Matthias Klose (doko) >Assigned to: Neal Norwitz (nnorwitz) >Summary: Canvas "select_item" always returns None Initial Comment: Bug in 2.1.3, 2.2.1 and CVS HEAD. One liner patch:
*** /usr/lib/python2.1/lib-tk/Tkinter.py.orig Wed Jul 3 17:04:28 2002
--- /usr/lib/python2.1/lib-tk/Tkinter.py Wed Jul 3 17:04:31 2002
***************
*** 2096,2100 ****
      def select_item(self):
          """Return the item which has the selection."""
!         self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
--- 2096,2100 ----
      def select_item(self):
          """Return the item which has the selection."""
!         return self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-22 22:56 Message: Logged In: YES user_id=33168 Made sure to return None if no item was selected. Checked in as Tkinter.py 1.160.10.1 & 1.163 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470 From noreply@sourceforge.net Tue Jul 23 04:22:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 22 Jul 2002 20:22:34 -0700 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow, as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. Jeff ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-22 23:22 Message: Logged In: YES user_id=33168 Jeff, any chance of getting an update for this patch? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-05-06 04:49 Message: Logged In: YES user_id=21627 Is an update of this patch forthcoming?
---------------------------------------------------------------------- Comment By: Jeffrey Honig (jchonig) Date: 2002-03-26 14:08 Message: Logged In: YES user_id=96862 Re: configure.in vs configure: we don't use autoconf here, so modifying configure.in doesn't help us. I should have copied the changes and submitted them, but then they aren't too hard to figure out... Re: contrib{lib/include}: We install many of the packages we get from the net (which we call contrib packages) into the /usr/contrib hierarchy. They won't be found by setup.py unless those paths are present. Re: regrtest.py: Apologies for the regrtest.py content; there are some tests in there that shouldn't be. Ignore it for now; I'll submit an update later. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-26 13:53 Message: Logged In: YES user_id=33168 Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless. configure is generated, so configure.in will need the changes made to it. There seem to be many tests which fail, but perhaps shouldn't: fork1, locale, minidom, poll, pyexpat, sax, unicode_file? I'm also unsure of the benefit of adding contrib/{lib/include} to setup.py. This could be fine, but I don't know anything about distutils.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Jul 23 04:34:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 22 Jul 2002 20:34:06 -0700 Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup Message-ID: Patches item #506436, was opened at 2002-01-21 08:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Tim Peters (tim_one) Summary: GETCONST/GETNAME/GETNAMEV speedup Initial Comment: The attached patch redefines the GETCONST, GETNAME & GETNAMEV macros to do the following:
* access the code object's consts and names through local variables instead of the long chain from f
* use access macros to index the tuples and get the C string names
The code appears correct, and I've had no trouble with it. It provides only the most trivial of improvements on pystone (around 1% when I see anything), but it's all those little things that add up, right? Skip ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-22 23:34 Message: Logged In: YES user_id=33168 Skip, I modified this code some, but your technique is still valid. I got rid of one of the indirections already. The patch can easily be updated. Seems like the patch shouldn't hurt. Tim? ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 19:45 Message: Logged In: YES user_id=44345 Looking for a vote up or down on this one...
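At the Python level, the technique in Skip's patch (walk the chain from the frame to the code object's consts once, keep it in a local, and index the local) looks like the sketch below. This is only an analogy: the `Frame` and `Code` classes are hypothetical stand-ins for the C structures, and the patch itself changes C macros in the interpreter loop.

```python
class Code:
    """Hypothetical stand-in for a code object."""

    def __init__(self, consts, names):
        self.co_consts = consts
        self.co_names = names


class Frame:
    """Hypothetical stand-in for a frame object."""

    def __init__(self, code):
        self.f_code = code


def getconst_chained(frame, indices):
    # Every access walks frame -> f_code -> co_consts again.
    return [frame.f_code.co_consts[i] for i in indices]


def getconst_hoisted(frame, indices):
    # Walk the chain once and index a local, as the patched macros do.
    consts = frame.f_code.co_consts
    return [consts[i] for i in indices]
```

Both functions return the same values; only the number of lookups per access differs. In C the saving is a couple of pointer dereferences per access, so a small measured gain is unsurprising.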
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-01-21 08:47 Message: Logged In: YES user_id=44345 Whoops... Make the "observed" speedup 0.1%... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 From noreply@sourceforge.net Tue Jul 23 09:03:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 01:03:57 -0700 Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes Message-ID: Patches item #552438, was opened at 2002-05-05 04:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Out of Date Priority: 5 Submitted By: Scott Gilbert (xscott) Assigned to: Tim Peters (tim_one) Summary: PyBufferObject fixes Initial Comment: This patch fixes these problems:
1) Dangling pointer problem
2) Buffer allocated by PyBuffer_New not aligned
The PyBufferObject acts differently depending on whether it allocated the memory itself or is borrowing it from an object that supports PyBufferProcs. In the case of allocating its own memory, I made a slight addition that adds some padding so that the ptr is on a sizeof(double) boundary. In the case of borrowing another object's PyBufferProcs memory, PyBufferObject no longer caches the pointer. This might slow things down (probably not by much), but it keeps PyBufferObject from working with a stale pointer. Normally I wouldn't do this, but since this patch touches pretty much every function anyway, I fixed many deviations from the Python coding style.
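The alignment fix amounts to rounding the size of whatever precedes the data up to the next multiple of sizeof(double), so that the data pointer that follows is suitably aligned. A minimal sketch of that arithmetic, assuming the alignment is a power of two; the helper name `padded_offset` is hypothetical, and `struct.calcsize("d")` reports the platform's native size of a C double:

```python
import struct


def padded_offset(header_size, alignment=struct.calcsize("d")):
    """Round header_size up to a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment-1 and masking off the low bits rounds up.
    return (header_size + alignment - 1) & ~(alignment - 1)
```

With an 8-byte alignment, a 13-byte header is padded to 16 bytes, while a 16-byte header needs no padding. The mask trick only works for powers of two, hence the check.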
---------------------------------------------------------------------- >Comment By: Scott Gilbert (xscott) Date: 2002-07-23 08:03 Message: Logged In: YES user_id=38318 On top of the current patch being out of date, in private email Guido indicated that Tim thinks the code needs more refactoring to simplify it. I'd like to hold off on resubmitting a current patch until I see how the bytes object fares (PEP 296). If the bytes object makes it into the Python core, then probably the best way to simplify and fix the implementation of the buffer object is to reduce it to nothing but a "Buffer Inspector" for other objects. (Tearing out the b_ptr field and a lot of if statements, at least.) The bytes object could be used to implement the following calls: PyBuffer_FromMemory(...) PyBuffer_FromReadWriteMemory(...) PyBuffer_New(...) In these cases, the bytes object would hold the actual memory, and the buffer object would just be inspecting the bytes object. I'd still stick to the strategy of having the buffer object re-request the pointer before every use (since typically the pointer is only valid while the GIL is held). I haven't figured out how to handle the case when the size specified for the buffer object gets out of whack when the inspected object resizes. Raise an exception? Even with these changes, there would still be some problems in here. For instance, the hash value is easy to invalidate. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 19:41 Message: Logged In: YES user_id=6380 Note, the patch is out of date since somebody fixed some nits with slicing, so I'm marking this as Out Of Date. You might as well upload the new version of the file. :-) Why do you think you need to fix the allocation? Since allocation is done via malloc(), and malloc() guarantees alignment suitable for a double ("for all types"), shouldn't that be enough???
(If it's obmalloc that you're worried about, it's easy to force this to use the real malloc() and free().) I hope Tim will make some time to review this (the "not this week" comment is several months old now). Superficially it looks like a big improvement. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-05-07 18:51 Message: Logged In: YES user_id=31435 Na, assigning a bug is fine by me -- it helps to have *someone* feel guilty. Assigning it doesn't mean it goes to the top of the assignee's heap, though. I can't make time to look at it this week, so it's just as well that it got unassigned. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-07 12:55 Message: Logged In: YES user_id=38318 Apparently assigning a patch is poor form. My bad. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-05 04:27 Message: Logged In: YES user_id=38318 Can I assign this to you or does it take admin privs? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 From noreply@sourceforge.net Tue Jul 23 09:08:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 01:08:54 -0700 Subject: [Patches] [ python-Patches-550551 ] Read/Write buffers from buffer() Message-ID: Patches item #550551, was opened at 2002-04-30 09:18 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550551&group_id=5470 Category: Core (C code) Group: None >Status: Deleted Resolution: Postponed Priority: 5 Submitted By: Scott Gilbert (xscott) Assigned to: Nobody/Anonymous (nobody) Summary: Read/Write buffers from buffer() Initial Comment: The buffer() builtin does not currently allow the creation of read-write buffers.
So there is no way from pure Python code to manipulate objects which support getting a writable pointer via their PyBufferProcs. This patch tries to create a read-write buffer first, and if that fails it will return a read-only buffer object as before. It's tempting to check if the PyBufferProcs has the bf_getwritebuffer pointer and simply return PyBuffer_FromReadWriteObject(...) in this case. This ends up being incorrect for PyStrings, since they do have the bf_getwritebuffer pointer, but it always sets an exception. ---------------------------------------------------------------------- >Comment By: Scott Gilbert (xscott) Date: 2002-07-23 08:08 Message: Logged In: YES user_id=38318 The buffer builtin appears to be scheduled for deprecation, so this small patch is not worthwhile. This is independent of creating buffer objects from the C API (as that does not appear to be deprecated). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-05-07 12:35 Message: Logged In: YES user_id=6380 Please don't assign patches to random developers. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-05 04:29 Message: Logged In: YES user_id=38318 If you take patch 552438, then there shouldn't be anything wrong with this small feature patch... ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-03 08:28 Message: Logged In: YES user_id=38318 This patch should not be accepted until another one fixing a bug in PyBufferObjects is accepted. So please put this one on the back burner until further notice.
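In Python 3 the buffer() builtin is gone and `memoryview` plays this role. It already does what this patch attempts: it grants write access when the exporting object allows it and read-only access otherwise. The sketch below also shows that a `bytearray` refuses to resize while a view of it is exported, which is one modern answer to the "inspected object resizes" question raised for the buffer object. The helper names are hypothetical; `memoryview`, `.readonly`, and `BufferError` are standard Python 3.

```python
def view_of(obj):
    """Return a memoryview of obj and whether writes through it are allowed."""
    view = memoryview(obj)      # read-write if obj exports writable memory
    return view, not view.readonly


def try_write(obj, index, value):
    """Write one byte through a view if the exporter permits it."""
    view, writable = view_of(obj)
    with view:                  # release the export when done
        if not writable:        # e.g. bytes: read-only memory
            return False
        view[index] = value
    return True


def try_resize_while_viewed(buf):
    """Return True if buf could grow while a view of it is still exported."""
    with memoryview(buf):
        try:
            buf.extend(b"!")
        except BufferError:     # resizing would leave the view dangling
            return False
    return True
```

Writing through a view of a `bytearray` changes the underlying object, whereas a view of `bytes` stays read-only. Once the view is released, the `bytearray` can be resized normally.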
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550551&group_id=5470 From noreply@sourceforge.net Tue Jul 23 10:59:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 02:59:00 -0700 Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests Message-ID: Patches item #585101, was opened at 2002-07-22 21:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Barry A. Warsaw (bwarsaw) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: Fix relative imports in regression tests Initial Comment: The regression test suite uses intrapackage relative imports to import stuff like test_support, etc. There's no deep reason for this to be so, since "test" is a standard package. As long as all tests do something like "from test import test_support" or "import test.test_support", everything works fine. Keeping the relative imports makes life more difficult for tests that don't live in the expected location of Lib/test. This patch fixes this by making sure all test imports are absolute. This works fine on *nix, but rumor has it that the Mac tests are run differently, so I'd like Jack to comment on whether this patch breaks his test suite or not. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-07-23 11:59 Message: Logged In: YES user_id=45365 I can't test the patch right now, but after a visual inspection I can't imagine that it would cause any problems on the Mac.
Go ahead and check it in, I would say, and I'll complain when it breaks things:-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 From noreply@sourceforge.net Tue Jul 23 16:50:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 08:50:42 -0700 Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests Message-ID: Patches item #585101, was opened at 2002-07-22 15:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 Category: Tests Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Barry A. Warsaw (bwarsaw) Assigned to: Barry A. Warsaw (bwarsaw) Summary: Fix relative imports in regression tests Initial Comment: The regression test suite uses intrapackage relative imports to import stuff like test_support, etc. There's no deep reason for this to be so, since "test" is a standard package. As long as all tests do something like "from test import test_support" or "import test.test_support" everything works fine. Keeping the relative imports makes life more difficult for tests that don't live in the expected location of Lib/test. This patch fixes this by making sure all test imports are absolute. This works fine on *nix, but rumor has it that the Mac tests are run differently so I'd like Jack to comment on whether this patch breaks his test suite or not. ---------------------------------------------------------------------- >Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-07-23 11:50 Message: Logged In: YES user_id=12800 Cool. I'll go ahead and commit these changes and then you and Tim can both beat me up. Guido's at OSCON so he'll have to wait a week to beat me up. 
:) ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-07-23 05:59 Message: Logged In: YES user_id=45365 I can't test the patch right now, but after a visual inspection I can't imagine that it would cause any problems on the Mac. Go ahead and check it in, I would say, and I'll complain when it breaks things:-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470 From noreply@sourceforge.net Tue Jul 23 21:43:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 13:43:31 -0700 Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation Message-ID: Patches item #555085, was opened at 2002-05-12 08:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: Accepted Priority: 4 Submitted By: Michael Gilfix (mgilfix) Assigned to: Guido van Rossum (gvanrossum) Summary: timeout socket implementation Initial Comment: This implements bug #457114, adding timed socket operations. If a timeout is set and the timeout period elapses before the socket operation has finished, a socket.error exception is thrown. This patch integrates the functionality at two levels: the timeout capability itself is implemented at the C level in socketmodule.c, and socket.py was also modified to update fileobject creation on Windows to handle the case of the underlying socket throwing an exception. The LaTeX documentation was also updated, and a new regression test was provided as test_timeout.py. ---------------------------------------------------------------------- >Comment By: Michael Gilfix (mgilfix) Date: 2002-07-23 16:43 Message: Logged In: YES user_id=116038 Now that I'm back :) I checked the archive and this seems to have been handled by you.
Please let me know if it isn't resolved and I can give it a closer look. Also, perhaps I should contact Bernie and ask him if there's anything he hasn't gotten around to in the test_timeout that I can off-load from him. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 13:11 Message: Logged In: YES user_id=6380 The default timeout is now implemented in CVS. There's a bug report from Andrew Macintyre (unfortunately on python-dev) about test_socket.py failures on FreeBSD. I'll try to keep an eye on that, so this patch *still* stays open. Also, Bernie has promised some changes that I haven't received yet and the details of which I don't recall (sorry :-( ). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-07 21:47 Message: Logged In: YES user_id=6380 Keeping this open as a reminder of things still to finish. Most is in the python-dev discussion; Michael Gilfix and Bernard Yue have offered to produce more patches. One feature we definitely want is a way to specify a timeout to be applied to all new sockets. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-06-06 17:11 Message: Logged In: YES user_id=6380 Thanks for the new version! I've checked this in. I made considerable changes; the following is feedback but you don't need to respond because I've addressed all these in the checked-in code! - Thanks for the cleanup of some non-standard formatting. However, it's better not to do this so the diffs don't show changes that are unrelated to the timeout patch. - You are still importing the select module instead of calling select() directly. I really think you should do the latter -- the select module has an enormous overhead (it allocates several large lists on the heap). 
- Instead of explicitly testing the argument to settimeout for being a float, int or long, you should simply call PyFloat_AsDouble and handle the error; if someone passes another object that implements __float__ that should be acceptable. - gettimeout() returns sock_timeout without checking if it is NULL. It can be NULL when a socket object is never initialized. E.g. I can do this: >>> from socket import * >>> s = socket.__new__(socket) >>> s.gettimeout() which gives me a segfault. There are probably other places where this is assumed. - I addressed the latter two issues by making sock_timeout a double, whose value is < 0.0 when no timeout is set. ---------------------------------------------------------------------- Comment By: Michael Gilfix (mgilfix) Date: 2002-06-05 18:23 Message: Logged In: YES user_id=116038 I've addressed all the issues brought up by Guido. The 2nd version of the patch is attached here. In this version, I've modified test_socket.py to include tests for the _fileobject class in socket.py that was modified by this patch. _fileobject needed to be modified so that data would not be lost when the underlying socket threw an exception (data was no longer accumulated in local variables). The tests for the _fileobject class succeed on older versions of python (tested 2.1.3) and pass on the newer version of python.
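Guido's last two points can be modelled in a few lines of Python. The class below is a hypothetical sketch of the design he describes (a double that is negative when no timeout is set, plus numeric conversion instead of explicit type checks); it is not the code in socketmodule.c, and the names TimeoutModel and NO_TIMEOUT are invented for illustration.

```python
class TimeoutModel:
    """Hypothetical model of a C-level timeout field (not socketmodule.c).

    The timeout is stored as a float that is negative when no timeout is
    set, so a freshly created object never holds an invalid value.
    """

    NO_TIMEOUT = -1.0

    def __init__(self):
        self._timeout = self.NO_TIMEOUT

    def settimeout(self, value):
        if value is None:
            self._timeout = self.NO_TIMEOUT
            return
        if isinstance(value, (str, bytes, bytearray)):
            # float("1.5") would succeed, but a C-level float conversion
            # does not accept strings, so reject them explicitly.
            raise TypeError("timeout must be a number or None")
        seconds = float(value)  # ints, floats, any object with __float__
        if seconds < 0.0:
            raise ValueError("timeout must be non-negative")
        self._timeout = seconds

    def gettimeout(self):
        # A negative stored value means "no timeout", reported as None.
        return None if self._timeout < 0.0 else self._timeout
```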
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-05-23 16:18 Message: Logged In: YES user_id=6380 For a detailed review, see http://mail.python.org/pipermail/python-dev/2002-May/024340.html ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470 From noreply@sourceforge.net Tue Jul 23 22:43:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 14:43:02 -0700 Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix) Message-ID: Patches item #584245, was opened at 2002-07-20 18:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: get python to link on OSF1 (Dec Unix) Initial Comment: Attached is a patch to fix the linking of python (makedev not found) on Dec OSF/1 Unix 5.1. This patch has also been tested on Linux (RedHat 7.2). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-23 23:43 Message: Logged In: YES user_id=21627 That patch doesn't really test whether defining OSF_SOURCE helps in getting makedev, does it? In particular, if makedev is not available at all, or requires a different define, the test will still conclude that OSF_SOURCE should be defined, right? I think the sequence should be: - is makedev already available? - if not, is it with OSF_SOURCE defined? - if not, arrange to exclude makedev from posixmodule.c Also, is it necessary to run the test program? 
autoconf is always worried that cross-compilation would fail, since you cannot run tests (although it is reasonable to link test programs in a cross-compilation environment). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 From noreply@sourceforge.net Tue Jul 23 23:04:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 15:04:29 -0700 Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup Message-ID: Patches item #506436, was opened at 2002-01-21 08:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open >Resolution: Out of Date Priority: 5 Submitted By: Skip Montanaro (montanaro) >Assigned to: Skip Montanaro (montanaro) Summary: GETCONST/GETNAME/GETNAMEV speedup Initial Comment: The attached patch redefines the GETCONST, GETNAME & GETNAMEV macros to do the following: * access the code object's consts and names through local variables instead of the long chain from f * use access macros to index the tuples and get the C string names The code appears correct, and I've had no trouble with it. It only provides the most trivial of improvements on pystone (around 1% when I see anything), but it's all those little things that add up, right? Skip ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-23 18:04 Message: Logged In: YES user_id=31435 Marked Out-of-Date and back to Skip. Sorry for the delay! The idea is fine. I'd rather you use the current GETITEM macro, which does bounds-checking in a debug build. I note too that GETCONST is only used once, and that use may as well be a direct GETITEM(consts, i) invocation, and skip the macro. Note that the GETNAME() macro no longer exists.
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-22 23:34 Message: Logged In: YES user_id=33168 Skip, I modified this code some, but your technique is still valid. I got rid of one of the indirections already. The patch can easily be updated. Seems like the patch shouldn't hurt. Tim? ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-07-09 19:45 Message: Logged In: YES user_id=44345 Looking for a vote up or down on this one... ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-01-21 08:47 Message: Logged In: YES user_id=44345 Whoops... Make the "observed" speedup 0.1%... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470 From noreply@sourceforge.net Tue Jul 23 23:03:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 23 Jul 2002 15:03:03 -0700 Subject: [Patches] [ python-Patches-583180 ] smtplib.py patch for macmail esmtp auth Message-ID: Patches item #583180, was opened at 2002-07-18 04:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: bob kuehne (mysticbob) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib.py patch for macmail esmtp auth Initial Comment: i ran into a problem that i've seen several other people describe where they can't authenticate to their particular mail server. i dug into this (my mail server is smtp.mac.com) and discovered that smtplib.py didn't support the specific type of auth that this server required. so, this patch allows authentication to these specific server types.
i also reworked one token to make it a bit more modular. the patch is attached, generated with: diff smtplib.py_orig smtplib.py_new i'm new to python, and new to the whole patch process on sourceforge, so please let me know what i can do to test, or how else i can work to get this in the next python version. thank you! bob ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-24 00:03 Message: Logged In: YES user_id=21627 On patches: Please always use context (-c) or unified (-u) diffs; those stay valid longer. On AUTH=LOGIN: Can you please try http://sourceforge.net/tracker/index.php?func=detail&aid=572031&group_id=5470&atid=305470 This pre-RFC AUTH protocol is by no means an invention of smtp.mac.com (or specific to it) - it is originally a Netscape invention, and widely implemented. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470 From noreply@sourceforge.net Wed Jul 24 14:05:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 06:05:19 -0700 Subject: [Patches] [ python-Patches-572031 ] AUTH method LOGIN for smtplib Message-ID: Patches item #572031, was opened at 2002-06-21 12:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572031&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gerhard Häring (ghaering) Assigned to: Barry A. Warsaw (bwarsaw) Summary: AUTH method LOGIN for smtplib Initial Comment: Unfortunately, my original SMTP auth patch doesn't work so well in real life. There are two methods to advertise the available auth methods for SMTP servers: old-style: AUTH=method1 method2 ... RFC style: AUTH method1 method2 Microsoft's MUAs are b0rken in that they only understand the old-style method.
That's why most SMTP servers are configured to advertise their authentication methods in old-style _and_ new style. There are also some especially broken SMTP servers like old M$ Exchange servers that only show their auth methods via the old style. Also the (sadly but true) very widely used M$ Exchange server only supports the LOGIN auth method (I have to use that thing at work, that's why I came up with this patch). Exchange also supports some other proprietary auth methods (NTLM, ...), but we needn't care about these. My argument is that the Python SMTP AUTH support will get a lot more useful to people if we also support 1) the old-style AUTH= advertisement 2) the LOGIN auth method, which, although not standardized via RFCs and originally invented by Netscape, is still in wide use, and for some servers the only method to use them, so we should support it Please note that in the current implementation, if a server uses the old-style AUTH= method, our SMTP auth support simply breaks because of the esmtp_features parsing. I'm randomly assigning this patch to Barry, because AFAIK he knows a lot about email handling. Assign around as you please :-) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-24 15:05 Message: Logged In: YES user_id=21627 In http://sourceforge.net/tracker/?func=detail&atid=105470&aid=581165&group_id=5470 pierslauder reports success with this patch; see his detailed report for remaining problems. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-17 15:39 Message: Logged In: YES user_id=21627 That existing SMTP servers announce LOGIN only in the old-style header is a good reason to support those as well; I hence recommend that this patch is applied. 
Microsoft is, strictly speaking, conforming to the RFC by *not* reporting LOGIN in the AUTH header: only registered SASL mechanisms can be announced there, and LOGIN is not registered; see http://www.iana.org/assignments/sasl-mechanisms ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-07-01 00:34 Message: Logged In: YES user_id=163326 Updated patch. Changes to the previous patch: - Use email.base64MIME.encode to get rid of the added newlines. - Merge old and RFC-style auth methods in self.smtp_features instead of parsing old-style auth lines separately. - Removed example line for changing auth method priorities (we won't list all permutations of auth methods ;-) - Removed superfluous logging call of chosen auth method. - Moved comment about SMTP features syntax into the right place again. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-06-30 23:14 Message: Logged In: YES user_id=163326 Martin, the reason why we need to take into account both old and RFC-style auth advertisement is that there are some SMTP servers that advertise different auth mechanisms in the old vs. RFC-style line. In particular, the MS Exchange server that I have to use at work does this, and I think that this is even the default configuration of Exchange 2000. In my case, it advertises its LOGIN method only in the AUTH= line. I'll shortly upload a patch that takes this into account. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-30 18:20 Message: Logged In: YES user_id=21627 I still cannot see why support for the old-style AUTH lines is necessary. If all SMTPds announce their supported mechanisms with both syntaxes, why is it then necessary to even look at the old syntax? I'm all for adding support for the LOGIN method.
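The LOGIN mechanism discussed in this thread is a short challenge/response exchange in which each value travels base64-encoded on a line of its own, which is why the newlines that the older base64 helpers append matter. The following is a minimal sketch assuming Python 3; the helper names are hypothetical, and later versions of smtplib implement LOGIN themselves inside SMTP.login(). binascii.b2a_base64() gained a newline=False switch in Python 3.6, which is essentially the extra parameter Gerhard suggests in this thread.

```python
import base64
import binascii

def b64_line(data):
    """Base64-encode data as a single line with no trailing newline."""
    # base64.b64encode() never inserts line breaks, unlike
    # base64.encodebytes(), which appends "\n" and wraps long output.
    return base64.b64encode(data).decode("ascii")

def auth_login_client_lines(user, password):
    """Client side of a LOGIN exchange (hypothetical helper).

    A server typically answers "AUTH LOGIN" with "334 VXNlcm5hbWU6"
    ("Username:") and then "334 UGFzc3dvcmQ6" ("Password:"); the client
    replies to each challenge with one base64 line.
    """
    return [
        "AUTH LOGIN",
        b64_line(user.encode("utf-8")),
        b64_line(password.encode("utf-8")),
    ]

def decode_challenge(reply):
    """Decode the base64 text of a "334 ..." continuation reply."""
    code, _, payload = reply.partition(" ")
    if code != "334":
        raise ValueError("expected a 334 reply, got %r" % reply)
    return base64.b64decode(payload).decode("ascii")

if __name__ == "__main__":
    print(decode_challenge("334 VXNlcm5hbWU6"))        # Username:
    print(auth_login_client_lines("bob", "secret"))    # ['AUTH LOGIN', 'Ym9i', 'c2VjcmV0']
    print(binascii.b2a_base64(b"bob"))                 # b'Ym9i\n'
    print(binascii.b2a_base64(b"bob", newline=False))  # b'Ym9i'
```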
---------------------------------------------------------------------- Comment By: Barry A. Warsaw (bwarsaw) Date: 2002-06-30 17:59 Message: Logged In: YES user_id=12800 Martin, (some? most?) MUAs post messages by talking directly to their outgoing SMTPd, so that's probably why Gerhard mentions it. On the base64 issue, see the comment in bug #552605, which I just took assignment of. I'll deal with both these bug reports soon. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-06-30 17:41 Message: Logged In: YES user_id=21627 I cannot understand why the behaviour of MS MUAs is relevant here at all; smtplib only talks to MTAs (or MSAs). If MTAs advertise the AUTH extension in the new syntax in addition to the old syntax, why is it not good to just ignore the old advertisement? Can you point to a specific software package (ideally even a specific host) which fails to interact with the current smtplib correctly? ---------------------------------------------------------------------- Comment By: Jason R. Mastaler (jasonrm) Date: 2002-06-22 05:53 Message: Logged In: YES user_id=85984 A comment on the old-style advertisement. You say that Microsoft's MUAs only understand the old-style method. I haven't found this to be the case. tmda-ofmipd is an outgoing SMTP proxy that supports SMTP authentication, and I only use the RFC style advertisement. This works perfectly well with MS clients like Outlook 2000, and Outlook Express 5. Below is an example of what the advertisement looks like. BTW, no disagreement about supporting the old-style advertisement in smtplib, as I think it's prudent, just making a point. # telnet aguirre 8025 Trying 172.18.3.5... Connected to aguirre.la.mastaler.com. Escape character is '^]'. 220 aguirre.la.mastaler.com ESMTP tmda-ofmipd EHLO aguirre.la.mastaler.com 250-aguirre.la.mastaler.com 250 AUTH LOGIN CRAM-MD5 PLAIN QUIT 221 Bye Connection closed by foreign host.
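Merging the two advertisement styles, as Gerhard's updated patch does, amounts to reading mechanism names from both "AUTH=" and "AUTH " lines of the EHLO reply. A sketch in Python 3 with a hypothetical function name; the first sample input is the reply from Jason's transcript, and the second is an invented server that announces LOGIN only in the old style, as Gerhard reports for Exchange.

```python
def auth_mechanisms(ehlo_reply_lines):
    """Collect AUTH mechanisms from both advertisement styles.

    Accepts raw EHLO reply lines such as "250-AUTH=LOGIN" (old style) or
    "250 AUTH LOGIN CRAM-MD5 PLAIN" (RFC style) and returns the mechanism
    names in first-seen order, upper-cased and without duplicates.
    """
    mechanisms = []
    for line in ehlo_reply_lines:
        # Strip the "250-" / "250 " reply-code prefix if present.
        if len(line) >= 4 and line[:3].isdigit() and line[3] in "- ":
            line = line[4:]
        keyword = line[:5].upper()
        if keyword not in ("AUTH ", "AUTH="):
            continue  # not an AUTH advertisement
        for name in line[5:].split():
            name = name.upper()
            if name not in mechanisms:
                mechanisms.append(name)
    return mechanisms

if __name__ == "__main__":
    print(auth_mechanisms(["250-aguirre.la.mastaler.com",
                           "250 AUTH LOGIN CRAM-MD5 PLAIN"]))
    print(auth_mechanisms(["250-mail.example.com",
                           "250-AUTH GSSAPI NTLM",
                           "250 AUTH=LOGIN"]))
```

Later versions of smtplib fold an old-style AUTH= line into the same "auth" feature entry in a similar way.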
---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-06-21 12:43 Message: Logged In: YES user_id=163326 This also includes a slightly modified version of patch #552605. Even better would IMO be to add an additional parameter to base64.encode* and the corresponding binascii functions that avoids the insertion of newline characters. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572031&group_id=5470 From noreply@sourceforge.net Wed Jul 24 14:27:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 06:27:49 -0700 Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py Message-ID: Patches item #585913, was opened at 2002-07-24 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Greg Copeland (oracle) Assigned to: Nobody/Anonymous (nobody) Summary: Adds Galeon support to webbrowser.py Initial Comment: Simple context diff against current CVS tree to add support for Galeon to webbrowser.py ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 From noreply@sourceforge.net Wed Jul 24 14:29:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 06:29:06 -0700 Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py Message-ID: Patches item #585913, was opened at 2002-07-24 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 Category: Library (Lib) >Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Copeland 
(oracle) Assigned to: Nobody/Anonymous (nobody) Summary: Adds Galeon support to webbrowser.py Initial Comment: Simple context diff against current CVS tree to add support for Galeon to webbrowser.py ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 From noreply@sourceforge.net Wed Jul 24 19:55:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 11:55:39 -0700 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-07-24 20:55 Message: Logged In: YES user_id=89016 diff12.txt finally implements the PEP293 specification (i.e. 
using exceptions for the communication between codec and handler) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-30 18:30 Message: Logged In: YES user_id=89016 diff11.txt fixes two refcounting bugs in codecs.c. speedtest.py is a little test script that checks the speed of various string/encoding/error combinations. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-29 22:50 Message: Logged In: YES user_id=89016 This new version diff10.txt fixes a memory overwrite/reallocation bug in PyUnicode_EncodeCharmap and moves the error handling out of PyUnicode_EncodeCharmap. A new version of the test script is included too. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-16 21:06 Message: Logged In: YES user_id=89016 OK, PyUnicode_TranslateCharmap is finished too. As the errors argument is again not exposed to Python it can't really be tested. Should we add errors as an optional argument to unicode.translate? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-01 19:57 Message: Logged In: YES user_id=89016 OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the errors argument can't be accessed from Python code, there's not much testing for this. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-20 17:34 Message: Logged In: YES user_id=89016 A new idea for the interface between the codec and the callback: Maybe we could have new exception classes UnicodeEncodeError, UnicodeDecodeError and UnicodeTranslateError derived from UnicodeError.
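The callback interface proposed in the initial comment survives in the final PEP 293 design as named handlers registered with codecs.register_error(). A handler receives a UnicodeEncodeError (or UnicodeDecodeError) carrying object, start and end, and returns a (replacement, resume_position) tuple. A minimal Python 3 sketch of the XML example from the initial comment, using a hypothetical handler name "xmlhexref"; because a codec may report several adjacent characters in one call, the handler converts the whole slice.

```python
import codecs

def xml_hexref_handler(exc):
    """Replace unencodable characters with hexadecimal XML references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#x%x;" % ord(ch) for ch in bad)
    # Return the replacement and the position where encoding resumes.
    return replacement, exc.end

codecs.register_error("xmlhexref", xml_hexref_handler)

if __name__ == "__main__":
    print("aäoöuüß".encode("ascii", "xmlhexref"))
    # The built-in handler does the same with decimal references:
    print("aäoöuüß".encode("ascii", "xmlcharrefreplace"))
```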
They have all the attributes that are passed as an argument tuple in the current version: string: the original string start: the start position of the unencodable characters/undecodable bytes end: the end position+1 of the unencodable characters/undecodable bytes. reason: a string that explains why the encoding/decoding doesn't work. There is no data object, because when a codec wants to pass extended information to the callback it can do this via a derived class. It might be better to move these attributes to the base class UnicodeError, but this might have backwards compatibility problems. With this method we really can have one global registry for all callbacks, because for callback names that must work with encoding *and* decoding *and* translating (i.e. "strict", "replace" and "ignore"), the callback can check which type of exception was passed, so "replace" can e.g. look like this: def replace(exc): if isinstance(exc, UnicodeDecodeError): return ("?", exc.end) else: return (u"?"*(exc.end-exc.start), exc.end) Another possibility would be to do the communication callback->codec by assigning to attributes of the exception object. The resynchronisation position could even be preassigned to end, so the callback only needs to specify the replacement in most cases: def replace(exc): if isinstance(exc, UnicodeDecodeError): exc.replacement = "?" else: exc.replacement = u"?"*(exc.end-exc.start) As many of the assignments can now be done on the C level without having to allocate Python objects (except for the replacement string and the reason), this version might even be faster, especially if we allow the codec to reuse the exception object for the next call to the callback. Does this make sense, or is this too fancy?
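Walter's first variant, a single handler that dispatches on the exception type and returns a tuple, is close to what was eventually adopted. A Python 3 version for illustration, with a hypothetical handler name "qmark"; here the replacement is a str in both directions, and for encoding the codec then encodes it with the target encoding.

```python
import codecs

def question_mark_replace(exc):
    """One handler for both directions, dispatching on the exception type."""
    if isinstance(exc, UnicodeDecodeError):
        # One replacement per undecodable byte range reported by the codec.
        return "?", exc.end
    if isinstance(exc, UnicodeEncodeError):
        # One "?" per unencodable character in the reported slice.
        return "?" * (exc.end - exc.start), exc.end
    raise TypeError("don't know how to handle %r" % (exc,))

codecs.register_error("qmark", question_mark_replace)

if __name__ == "__main__":
    print("ää".encode("ascii", "qmark"))     # b'??'
    print(b"a\xffb".decode("ascii", "qmark"))  # a?b
```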
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-18 21:24 Message: Logged In: YES user_id=89016 And here is the test script (test_codeccallbacks.py) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-18 21:22 Message: Logged In: YES user_id=89016 OK, here is the current version of the patch (diff7.txt). PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-17 22:50 Message: Logged In: YES user_id=89016 > About the difference between encoding > and decoding: you shouldn't just look > at the case where you work with Unicode > and strings, e.g. take the rot-13 codec > which works on strings only or other > codecs which translate objects into > strings and vice-versa. unicode.encode encodes to str and str.decode decodes to unicode, even for rot-13: >>> u"gürk".encode("rot13") 't\xfcex' >>> "gürk".decode("rot13") u't\xfcex' >>> u"gürk".decode("rot13") Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'unicode' object has no attribute 'decode' >>> "gürk".encode("rot13") Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeError: ASCII decoding error: ordinal not in range(128) Here the str is converted to unicode first, before encode is called, but the conversion to unicode fails. Is there an example where something else happens? > Error handling has to be flexible enough > to handle all these situations. Since > the codecs know best how to handle the > situations, I'd make this an implementation > detail of the codec and leave the > behaviour undefined in the general case.
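In Python 3 the distinction discussed here became explicit: str.encode() and bytes.decode() accept only text encodings, while transforms such as rot13 are reached through codecs.encode() and codecs.decode(). A short sketch reproducing the rot13 example under Python 3 (assuming the "rot13" alias, which current CPython provides):

```python
import codecs

# rot13 maps only the ASCII letters; other characters pass through unchanged.
print(codecs.encode("gürk", "rot13"))   # tüex
print(codecs.decode("tüex", "rot13"))   # gürk

try:
    "gürk".encode("rot13")  # str.encode() only accepts text encodings
except LookupError as exc:
    print("LookupError:", exc)
```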
OK, but we should suggest that for encoding, unencodable characters are collected, and for decoding, separate byte sequences that are considered broken by the codec are passed to the callback, i.e. for decoding the handler will never get all broken data in one call. E.g. for "\u30\Uffffffff".decode("unicode-escape") the handler will be called twice (once for "\u30" with "truncated \u escape" as the reason and once for "\Uffffffff" with "illegal character" as the reason). > For the existing codecs, backward > compatibility should be maintained, > if at all possible. If the patch gets > overly complicated because of this, > we may have to provide a downgrade solution > for this particular problem (I don't think > replace is used in any computational context, > though, since you can never be sure how > many replacement characters do get > inserted, so the case may not be > that realistic). > > Raising an exception for the charmap codec > is the right way to go, IMHO. I would > consider the current behaviour a bug. OK, this is implemented in PyUnicode_EncodeCharmap now, and collecting unencodable characters works too. I completely changed the implementation, because the stack approach would have gotten much more complicated when unencodable characters are collected. > For new codecs, I think we should > suggest that replace tries to collect > as much illegal data as possible before > invoking the error handler. The handler > should be aware of the fact that it > won't necessarily get all the broken > data in one call. OK for encoders; for decoders see above. > About the codec error handling > registry: You seem to be using a > Unicode specific approach here. > I'd rather like to see a generic > approach which uses the API > we discussed earlier. Would that be possible? The handlers in the registry are all Unicode specific, and they are different for encoding and for decoding.
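The encode/decode asymmetry described above can be observed with a handler that records each call. The sketch assumes Python 3 and current CPython behaviour: the ASCII encoder hands a whole run of unencodable characters to the handler in one call, while the UTF-8 decoder reports each invalid start byte separately. The handler name "recording" is hypothetical.

```python
import codecs

calls = []

def recording_handler(exc):
    """Record which slice the codec reported, then substitute one "?"."""
    calls.append((type(exc).__name__, exc.start, exc.end))
    return "?", exc.end

codecs.register_error("recording", recording_handler)

if __name__ == "__main__":
    print("xää".encode("ascii", "recording"), calls)
    # b'x?' [('UnicodeEncodeError', 1, 3)]: one call for both characters
    calls.clear()
    print(b"x\xff\xfe".decode("utf-8", "recording"), calls)
    # x?? and two calls, one per invalid byte
```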
I renamed the function because of your comment from 2001-06-13 10:05 (which becomes exceedingly difficult to find on this long page! ;)). > In that case, the codec API should > probably be called > codecs.register_error('myhandler', myhandler). > > Does that make sense ? We could require that unique names are used for custom handlers, but for the standard handlers we do have name collisions. To prevent them, we could either remove them from the registry and require that the codec implements the error handling for those itself, or we could do some fiddling, so that u"üöä".encode("ascii", "replace") becomes u"üöä".encode("ascii", "unicodeencodereplace") behind the scenes. But I think two unicode specific registries are much simpler to handle. > BTW, the patch which uses the callback > registry does not seem to be available > on this SF page (the last patch still > converts the errors argument to a > PyObject, which shouldn't be needed > anymore with the new approach). > Can you please upload your > latest version? OK, I'll upload a preliminary version tomorrow. PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing, but otherwise the patch seems to be finished. All decoders work and the encoders collect unencodable characters and implement the handling of known callback handler names themselves. As PyUnicode_EncodeDecimal is only used by the int, long, float, and complex constructors, I'd love to get rid of the errors argument, but for completeness' sake, I'll implement the callback functionality. > Note that the highlighting codec > would make a nice example > for the new feature. This could be part of the codec callback test script, which I've started to write. We could kill two birds with one stone here: 1. Test the implementation. 2. Document and advocate what is possible with the patch. Another idea: we could have as an example a decoding handler that relaxes the UTF-8 minimal encoding restriction, e.g.
def relaxedutf8(enc, uni, startpos, endpos, reason, data): if uni[startpos:startpos+2] == u"\xc0\x80": return (u"\x00", startpos+2) else: raise UnicodeError(...) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-04-17 21:40 Message: Logged In: YES user_id=38388 Sorry for the late response. About the difference between encoding and decoding: you shouldn't just look at the case where you work with Unicode and strings, e.g. take the rot-13 codec which works on strings only or other codecs which translate objects into strings and vice-versa. Error handling has to be flexible enough to handle all these situations. Since the codecs know best how to handle the situations, I'd make this an implementation detail of the codec and leave the behaviour undefined in the general case. For the existing codecs, backward compatibility should be maintained, if at all possible. If the patch gets overly complicated because of this, we may have to provide a downgrade solution for this particular problem (I don't think replace is used in any computational context, though, since you can never be sure how many replacement characters do get inserted, so the case may not be that realistic). Raising an exception for the charmap codec is the right way to go, IMHO. I would consider the current behaviour a bug. For new codecs, I think we should suggest that replace tries to collect as much illegal data as possible before invoking the error handler. The handler should be aware of the fact that it won't necessarily get all the broken data in one call. About the codec error handling registry: You seem to be using a Unicode specific approach here. I'd rather like to see a generic approach which uses the API we discussed earlier. Would that be possible ? In that case, the codec API should probably be called codecs.register_error('myhandler', myhandler). Does that make sense ?
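Walter's relaxedutf8 idea translates directly to the exception-based interface that was eventually adopted. A sketch assuming Python 3, with a hypothetical handler name: the strict UTF-8 decoder rejects the overlong sequence C0 80 at the C0 byte, and the handler maps it to U+0000 and resumes after both bytes, re-raising for any other error.

```python
import codecs

def relaxed_utf8(exc):
    """Accept the overlong form C0 80 as U+0000; re-raise anything else."""
    if isinstance(exc, UnicodeDecodeError):
        if exc.object[exc.start:exc.start + 2] == b"\xc0\x80":
            # Resume decoding after the two-byte sequence.
            return "\x00", exc.start + 2
    raise exc

codecs.register_error("relaxedutf8", relaxed_utf8)

if __name__ == "__main__":
    print(repr(b"a\xc0\x80b".decode("utf-8", "relaxedutf8")))  # 'a\x00b'
```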
BTW, the patch which uses the callback registry does not seem to be available on this SF page (the last patch still converts the errors argument to a PyObject, which shouldn't be needed anymore with the new approach). Can you please upload your latest version ?

Note that the highlighting codec would make a nice example for the new feature.

Thanks.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-04-17 12:21 Message: Logged In: YES user_id=89016

Another note: the patch will change the meaning of charmap encoding slightly. Currently "replace" will put a ? into the output even if ? is not in the mapping, i.e.

    codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})

will return ('?', 1). With the patch, the above example will raise an exception. Of course, with the patch many more replacement characters can appear, so it is vital that the replacement string is passed through the mapping too. Is this semantic change OK? (I guess all of the existing codecs have a mapping ord("?")->ord("?").)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:19 Message: Logged In: YES user_id=89016

So this means that the encoder can collect illegal characters and pass them to the callback, and "replace" will replace them with (end-start)*u"?".

Decoders don't collect all illegal byte sequences, but call the callback once for every byte sequence that has been found illegal, and "replace" will replace it with u"?".

Does this make sense?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:06 Message: Logged In: YES user_id=89016

For encoding it's always (end-start)*u"?":

>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither one nor the other:

>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. They might have to call the callback for every single illegal byte sequence to get the old behaviour.

(It seems that this patch would be much, much simpler if we only changed the encoders.)

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 19:36 Message: Logged In: YES user_id=38388

Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016

What should replace do: return u"?" or (end-start)*u"?"?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388

Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive.

I like the highlighting feature !

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016

I'm thinking about extending the API a little bit. Consider the following example:

>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape

The error message is a lie: the problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler.
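The start/end proposal in this comment can be illustrated with a toy pure-Python decoder; it is a sketch, not the real unicode-escape codec. It only understands \uXXXX escapes and hands the whole malformed escape to the handler as a half-open slice data[start:end] (left inclusive, right exclusive, like Python slices), and the handler returns a (replacement, resume_position) tuple. All function names and the four-argument handler signature are hypothetical.

```python
import string


def decode_u_escapes(data, handler):
    """Toy decoder: expand \\uXXXX escapes in the str `data`.

    A malformed escape is passed as a whole to
    handler(data, start, end, reason), where data[start:end] is the
    offending sequence; the handler returns (replacement, resume_position).
    """
    out = []
    pos = 0
    while pos < len(data):
        if not data.startswith("\\u", pos):
            out.append(data[pos])
            pos += 1
            continue
        digits = data[pos + 2:pos + 6]
        if len(digits) == 4 and all(c in string.hexdigits for c in digits):
            out.append(chr(int(digits, 16)))
            pos += 6
            continue
        # Malformed escape: the error slice covers the backslash, the "u"
        # and any hex digits that follow, not just a single character.
        end = pos + 2
        while end < min(len(data), pos + 6) and data[end] in string.hexdigits:
            end += 1
        replacement, pos = handler(data, pos, end, "truncated \\uXXXX escape")
        out.append(replacement)
    return "".join(out)


def replace_handler(data, start, end, reason):
    # One U+FFFD per malformed sequence, then continue after it.
    return ("\ufffd", end)


def strict_handler(data, start, end, reason):
    raise ValueError(
        f"can't decode {data[start:end]!r} in position {start}-{end - 1}: {reason}"
    )


print(decode_u_escapes("a\\u0041b", replace_handler))  # aAb
```

With strict_handler, the input "x\\u1" is reported as a failure in position 1-3, i.e. the whole truncated escape rather than only the digit.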
For encoding this would be useful too: suppose I want to have an encoder that colors the unencodable characters via ANSI escape sequences. Then I could do the following:

>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...     return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
...
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in the case of custom error handler names. (And it makes the implementation very interesting ;))

What do you think?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016

I started from scratch, and the current state is this: encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet), most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged), and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler).

For encoding, the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached, it might even be faster than the original.

The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388

Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2.

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well.

I'll look into this after I'm back from vacation on the 10.09.

Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016

Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:

>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are: encoding, unicode, position, reason, state.

The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'

I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway.

I changed all the old functions to call the new ones, so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString, because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx.

There are still a few spots that call the old API: e.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object.

Should we switch to the new API everywhere, even if strict encoding/decoding is used?

The size of this patch begins to scare me.
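The "pass a callable as the errors argument" usage shown above can be mimicked in modern Python with a small wrapper. This is a hypothetical sketch rather than the patch's API: toy_ascii_decode accepts either one of the names "strict", "ignore" and "replace" or a callable with the five-argument signature described in this message (encoding, data, position, reason, state) that returns (replacement, new_position); state is simply passed through.

```python
def toy_ascii_decode(data, errors="strict", state=None):
    """Decode bytes as ASCII, resolving errors by name or via a callable."""
    out = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if byte < 0x80:
            out.append(chr(byte))
            pos += 1
            continue
        reason = "ordinal not in range(128)"
        if callable(errors):
            # One-shot handler passed directly instead of a name.
            replacement, pos = errors("ascii", data, pos, reason, state)
        elif errors == "ignore":
            replacement, pos = "", pos + 1
        elif errors == "replace":
            replacement, pos = "\ufffd", pos + 1
        elif errors == "strict":
            raise UnicodeError(
                f"'ascii' can't decode byte 0x{byte:02x} in position {pos}: {reason}"
            )
        else:
            raise LookupError(f"unknown error handling name {errors!r}")
        out.append(replacement)
    return "".join(out)


print(toy_ascii_decode(b"a\xffb\xffc", lambda enc, data, pos, rea, sta: ("", pos + 1)))  # abc
```

Names keep the common cases cheap and readable, while the callable form covers one-off handlers without any registration step.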
I guess we need an extensive test script for all the new features, and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks.

(BTW, I haven't touched PyUnicode_TranslateCharmap yet.)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016

New version of the patch with the error handling callback registry.

> > OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).
>
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate.

> > [...]
> > But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?
>
> Good idea !
>
> One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

OK, but these functions are specific to Unicode encoding, so now the functions are called:

    codecs.register_unicodeencodeerrorhandler
    codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in codecs.c/_PyCodecRegistry_Init, so using them is really simple:

    u"gürk".encode("ascii", "xmlcharrefreplace")

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388

> > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with \uxxxx replacement callback.
> > > >
> > > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> > >
> > > It would be a slowdown. But callbacks open many possibilities.
> >
> > True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).
>
> OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).

Great !

> > [...]
> > > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.
>
> OK! I will try to find the time to implement that in the next days.

Good.

> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> >     callback(encoding, input_data, input_position, state) ->
> >         (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same interface... what do you think ?
>
> This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally.

> I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks.
>
> Do we want to enforce new_input_position>input_position, or should jumping back be allowed?

No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).
>
> We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type
>     u"...".encode("...", "strict")
> instead of
>     import codecs
>     u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)
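A minimal sketch of the name-based registry quoted here, combined with the "look up the error handler only on the first error" change described above. It assumes a plain dict as the registry and the (encoding, text, position, state) -> (replacement, new_position) handler signature from this thread; register_error_handler, lookup_error_handler and toy_ascii_encode are hypothetical names, not the API Python ended up with. Built-in names are handled inline and never touch the registry, and the replacement is assumed to be ASCII already.

```python
# Hypothetical registry: maps handler names to callables.
_error_handlers = {}


def register_error_handler(name, handler):
    _error_handlers[name] = handler


def lookup_error_handler(name):
    try:
        return _error_handlers[name]
    except KeyError:
        raise LookupError(f"unknown error handler name {name!r}") from None


def toy_ascii_encode(text, errors="strict"):
    """Encode to ASCII; custom handlers are looked up only on the first error."""
    out = bytearray()
    handler = None  # resolved lazily, at most once per call
    pos = 0
    while pos < len(text):
        cp = ord(text[pos])
        if cp < 128:
            out.append(cp)
            pos += 1
            continue
        # Built-in names are handled inline and never touch the registry.
        if errors == "strict":
            raise UnicodeError(f"can't encode {text[pos]!r} in position {pos}")
        if errors == "ignore":
            pos += 1
            continue
        if errors == "replace":
            out += b"?"
            pos += 1
            continue
        if handler is None:
            handler = lookup_error_handler(errors)
        replacement, pos = handler("ascii", text, pos, None)
        out += replacement.encode("ascii")  # replacement assumed to be ASCII
    return bytes(out)


def xmlreplace(encoding, text, pos, state):
    return ("&#%d;" % ord(text[pos]), pos + 1)


register_error_handler("xmlreplace", xmlreplace)
print(toy_ascii_encode("\xe4\xf6\xfc", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

Because the lookup is deferred, a pure-ASCII string encodes fine even with an unregistered handler name; the registry is only consulted once an unencodable character actually appears.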
> Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>     return (u"&#%d;" % ord(uni[pos]), pos+1)
>
> import codecs
>
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
>     u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before, and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.
>
> But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016

> > [...]
> > so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
> >
> > I could, but first we should work out how the decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

OK.
I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> >
> > It would be a slowdown. But callbacks open many possibilities.
>
> True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).

> > For example:
> >
> >     Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
>
> If possible, please also add the multichar replacement to the old API.
> I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>     callback(encoding, input_data, input_position, state) ->
>         (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position>input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
> > """
>
> Good point.
> I'll add that to the PEP 100.

OK.

Here is the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type
    u"...".encode("...", "strict")
instead of
    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)

Python code would look like this:
---
def xmlreplace(encoding, uni, pos, state):
    return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---
and then the following call can be made:
    u"äöü".encode("ascii", "xmlreplace")
As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before, and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?',

True.

> so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

Go for it.

> And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
>
>     def FFFDreplace(enc, uni, pos):
>         if uni[pos] == u"\ufffd":
>             return u"?"
>         else:
>             raise UnicodeError(...)
>
> > > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
>     Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.
I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character.
But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks to? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allow the callback to apply additional tricks. The object should be documented to be modifyable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. 
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == u"\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok.
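The two-entry stack loop quoted above can be mirrored in Python. This is a minimal sketch, not CPython's C implementation: it assumes a toy Latin-1 target (anything above U+00FF counts as unencodable), and `encode_latin1_with_stack` and the three-argument callback shape `(encoding, original_string, position)` are illustrative names taken from the thread's description rather than a real API.

```python
from typing import Callable, List

# Hypothetical Python mirror of the early callback signature from this thread:
# callback(encoding, original_string, error_position) -> replacement string
EncodeCallback = Callable[[str, str, int], str]


def encode_latin1_with_stack(text: str, callback: EncodeCallback) -> bytes:
    """Toy Latin-1 encoder driven by a stack of at most two strings."""
    out = bytearray()
    # Each entry is [string, index of the next character to encode].
    # stack[0] is the original string; stack[1], if present, is a replacement.
    stack: List[list] = [[text, 0]]
    while True:
        top = stack[-1]
        string, pos = top
        if pos == len(string):
            if len(stack) == 2:
                stack.pop()  # replacement finished: resume the original string
                continue
            return bytes(out)  # original string finished: encoding is done
        ch = string[pos]
        top[1] = pos + 1  # advance past this character in the current string
        if ord(ch) < 256:
            out.append(ord(ch))
        elif len(stack) == 1:
            # Error in the original string: encode the callback's replacement next.
            stack.append([callback("latin-1", text, pos), 0])
        else:
            # Error inside the replacement itself: raise a normal exception.
            raise UnicodeError("replacement %r contains unencodable %r" % (string, ch))
```

With a character-reference callback, `encode_latin1_with_stack("a\u20acb", lambda enc, uni, pos: "&#%d;" % ord(uni[pos]))` returns `b"a&#8364;b"`; a callback whose replacement is itself unencodable raises `UnicodeError`, as the description says.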
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: "Why can't I print u"gürk"?" is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler(). > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error, should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.
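The Hangul + ASCII restriction mentioned above can be illustrated with Python 3's `str.translate()`, which plays a role similar to `PyUnicode_TranslateCharmap`: the table may be any object with `__getitem__`, and an exception other than `LookupError` raised from it propagates to the caller. The class name and the chosen range (ASCII plus the precomposed Hangul syllables U+AC00 to U+D7A3) are assumptions made for this sketch; it is not the translate-callback mechanism the patch proposes.

```python
class HangulAsciiOnly:
    """Hypothetical translate table admitting only ASCII and Hangul syllables."""

    def __getitem__(self, codepoint: int) -> int:
        if codepoint < 0x80 or 0xAC00 <= codepoint <= 0xD7A3:
            return codepoint  # allowed: map the character to itself
        # Any exception other than LookupError escapes from str.translate().
        raise ValueError("U+%04X is neither ASCII nor a Hangul syllable" % codepoint)


def check_hangul_ascii(text: str) -> str:
    """Return text unchanged, or raise ValueError on the first disallowed character."""
    return text.translate(HangulAsciiOnly())
```

Here translation is used purely for validation, which is the use case described for inserting None into the mapping: the table either passes a character through or stops the whole operation.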
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful, the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered, the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice.
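A sketch of why an assignable `errors` attribute matters, using Python 3's `codecs.StreamWriter`, whose `write()` passes the current `self.errors` to the encoder on every call. It follows the XML DOM use case raised in this thread (character references in text nodes, an error inside comments). The handler name `xml_comment_sketch` and the helper functions are hypothetical; `codecs.getwriter`, `codecs.register_error` and the built-in `xmlcharrefreplace` handler are standard library features.

```python
import codecs
import io


def comment_errors(exc):
    """Hypothetical handler: refuse unencodable characters inside comments."""
    raise ValueError(
        "sorry, you can't have unencodable characters inside a comment "
        "(%r at position %d)" % (exc.object[exc.start], exc.start)
    )


codecs.register_error("xml_comment_sketch", comment_errors)

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf)  # one StreamWriter for the whole document


def write_text(text: str) -> None:
    writer.errors = "xmlcharrefreplace"  # text nodes may use character references
    writer.write(text)


def write_comment(text: str) -> None:
    writer.errors = "xml_comment_sketch"  # comments may not
    writer.write("<!--" + text + "-->")
```

After `write_text("gürk")` and `write_comment(" ok ")` the buffer holds `b"g&#252;rk<!-- ok -->"`, while `write_comment("gürk")` raises `ValueError` before anything is written to the stream.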
> The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
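To make the `\uxxxx` replacement-callback idea concrete with today's API, the following sketch registers a Python 3 error handler (via `codecs.register_error`, the PEP 293 mechanism this patch evolved into) that emits backslash-u escapes while encoding to ASCII. The handler name `uxxxx_sketch` is made up, and the output is not byte-for-byte identical to the real `unicode_escape` codec, which uses `\xNN` for code points below U+0100 and also escapes backslashes and control characters.

```python
import codecs


def uxxxx_replace(exc):
    """Replace each unencodable character with a backslash-u style escape."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding
    escapes = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        # Four hex digits inside the BMP, eight above U+FFFF.
        escapes.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
    return "".join(escapes), exc.end  # resume right after the offending run


codecs.register_error("uxxxx_sketch", uxxxx_replace)
```

For example, `"gürk".encode("ascii", "uxxxx_sketch")` gives `b"g\\u00fcrk"`. Whether a handler-based encoder is as fast as a dedicated one is exactly the concern raised above; this sketch makes no performance claim.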
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value.
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > an HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n SIG will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment"). BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than an HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is, that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. 
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and Latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map the old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits:

* Please don't place more than one C statement on one line, like in:
  """
  + unicode = unicode2; unicodepos = unicode2pos;
  + unicode2 = NULL; unicode2pos = 0;
  """
* Comments should start with a capital letter and be prepended to the section they apply to.
* There should be spaces between arguments in compares: (a == b), not (a==b).
* Where does the name "...Encode121" originate?
* Module-internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code).

One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as a Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Wed Jul 24 20:04:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 12:04:45 -0700 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to pass not only 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example, replacing unencodable characters with XML character references can be done in the following way:

u"aäoöuüß".encode(
    "ascii",
    lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos])
)

---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-07-24 21:04 Message: Logged In: YES user_id=89016 Attached is a new version of the test script. But we need more tests. UTF-7 is completely untested, and codecs that pass wrong arguments to the handler, as well as handlers that return wrong or out-of-bounds results, are untested too.
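The XML character reference example from the initial comment uses the early three-argument callback. Under the PEP 293 interface that Python eventually adopted, a handler receives a `UnicodeEncodeError` and returns `(replacement, resume_position)`. A minimal Python 3 version that reproduces the hexadecimal form might look like this; the handler name is hypothetical, and the built-in `xmlcharrefreplace` handler produces decimal references instead.

```python
import codecs


def xml_hex_charref(exc):
    """Replace unencodable characters with hexadecimal XML character references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # only meaningful for encoding
    refs = "".join("&#x%x;" % ord(ch) for ch in exc.object[exc.start:exc.end])
    return refs, exc.end  # resume encoding right after the offending run


codecs.register_error("xmlhexcharref_sketch", xml_hex_charref)
```

`"aäoöuüß".encode("ascii", "xmlhexcharref_sketch")` gives `b"a&#xe4;o&#xf6;u&#xfc;&#xdf;"`, while the built-in `"xmlcharrefreplace"` gives `b"a&#228;o&#246;u&#252;&#223;"`.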
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-07-24 20:55 Message: Logged In: YES user_id=89016 diff12.txt finally implements the PEP 293 specification (i.e. using exceptions for the communication between codec and handler). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-30 18:30 Message: Logged In: YES user_id=89016 diff11.txt fixes two refcounting bugs in codecs.c. speedtest.py is a little test script that checks the speed of various string/encoding/error combinations. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-29 22:50 Message: Logged In: YES user_id=89016 This new version diff10.txt fixes a memory overwrite/reallocation bug in PyUnicode_EncodeCharmap and moves the error handling out of PyUnicode_EncodeCharmap. A new version of the test script is included too. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-16 21:06 Message: Logged In: YES user_id=89016 OK, PyUnicode_TranslateCharmap is finished too. As the errors argument is again not exposed to Python, it can't really be tested. Should we add errors as an optional argument to unicode.translate? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-05-01 19:57 Message: Logged In: YES user_id=89016 OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the errors argument can't be accessed from Python code, there's not much testing for this.
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-20 17:34 Message: Logged In: YES user_id=89016 A new idea for the interface between the codec and the callback: Maybe we could have new exception classes UnicodeEncodeError, UnicodeDecodeError and UnicodeTranslateError derived from UnicodeError. They have all the attributes that are passed as an argument tuple in the current version:

string: the original string
start: the start position of the unencodable characters/undecodable bytes
end: the end position+1 of the unencodable characters/undecodable bytes
reason: a string that explains why the encoding/decoding doesn't work

There is no data object, because when a codec wants to pass extended information to the callback it can do this via a derived class. It might be better to move these attributes to the base class UnicodeError, but this might have backwards compatibility problems. With this method we really can have one global registry for all callbacks, because for callback names that must work with encoding *and* decoding *and* translating (i.e. "strict", "replace" and "ignore"), the callback can check which type of exception was passed, so "replace" can e.g. look like this:

def replace(exc):
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    else:
        return (u"?"*(exc.end-exc.start), exc.end)

Another possibility would be to do the communication callback->codec by assigning to attributes of the exception object. The resynchronisation position could even be preassigned to end, so the callback only needs to specify the replacement in most cases:

def replace(exc):
    if isinstance(exc, UnicodeDecodeError):
        exc.replacement = "?"
    else:
        exc.replacement = u"?"*(exc.end-exc.start)

As many of the assignments can now be done on the C level without having to allocate Python objects (except for the replacement string and the reason), this version might even be faster, especially if we allow the codec to reuse the exception object for the next call to the callback. Does this make sense, or is this too fancy? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-18 21:24 Message: Logged In: YES user_id=89016 And here is the test script (test_codeccallbacks.py) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-18 21:22 Message: Logged In: YES user_id=89016 OK, here is the current version of the patch (diff7.txt). PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-04-17 22:50 Message: Logged In: YES user_id=89016 > About the difference between encoding > and decoding: you shouldn't just look > at the case where you work with Unicode > and strings, e.g. take the rot-13 codec > which works on strings only or other > codecs which translate objects into > strings and vice-versa. unicode.encode encodes to str and str.decode decodes to unicode, even for rot-13:

>>> u"gürk".encode("rot13")
't\xfcex'
>>> "gürk".decode("rot13")
u't\xfcex'
>>> u"gürk".decode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'unicode' object has no attribute 'decode'
>>> "gürk".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: ASCII decoding error: ordinal not in range(128)

Here the str is converted to unicode first, before encode is called, but the conversion to unicode fails. Is there an example where something else happens? > Error handling has to be flexible enough > to handle all these situations. Since > the codecs know best how to handle the > situations, I'd make this an implementation > detail of the codec and leave the > behaviour undefined in the general case. OK, but we should suggest that, for encoding, unencodable characters are collected and, for decoding, separate byte sequences that are considered broken by the codec are passed to the callback, i.e. for decoding the handler will never get all broken data in one call. E.g. for "\u30\Uffffffff".decode("unicode-escape") the handler will be called twice (once for "\u30" with "truncated \u escape" as the reason, and once for "\Uffffffff" with "illegal character" as the reason). > For the existing codecs, backward > compatibility should be maintained, > if at all possible. If the patch gets > overly complicated because of this, > we may have to provide a downgrade solution > for this particular problem (I don't think > replace is used in any computational context, > though, since you can never be sure how > many replacement characters get > inserted, so the case may not be > that realistic). > > Raising an exception for the charmap codec > is the right way to go, IMHO. I would > consider the current behaviour a bug. OK, this is implemented in PyUnicode_EncodeCharmap now, and collecting unencodable characters works too. I completely changed the implementation, because the stack approach would have gotten much more complicated when unencodable characters are collected.
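The exception-based interface proposed in the 2002-04-20 comment is essentially what PEP 293 standardised, so the type-dispatching `replace` handler can be written against Python 3 directly. This is a sketch with a hypothetical registration name. Note that in Python 3 a decoding handler must return `str`, so the decode branch returns `"?"` as text rather than as an 8-bit string.

```python
import codecs


def question_mark_replace(exc):
    """One handler for encoding, decoding and translating, dispatching on exc type."""
    if isinstance(exc, UnicodeDecodeError):
        return "?", exc.end  # one '?' for the whole undecodable byte sequence
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        return "?" * (exc.end - exc.start), exc.end  # one '?' per character
    raise TypeError("don't know how to handle %r" % (exc,))


codecs.register_error("qmark_sketch", question_mark_replace)
```

With it, `b"a\xffb".decode("ascii", "qmark_sketch")` returns `"a?b"` and `"aäöb".encode("ascii", "qmark_sketch")` returns `b"a??b"`.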
> For new codecs, I think we should > suggest that replace tries to collect > as much illegal data as possible before > invoking the error handler. The handler > should be aware of the fact that it > won't necessarily get all the broken > data in one call. OK for encoders; for decoders, see above. > About the codec error handling > registry: You seem to be using a > Unicode specific approach here. > I'd rather like to see a generic > approach which uses the API > we discussed earlier. > Would that be possible? The handlers in the registry are all Unicode-specific, and they are different for encoding and for decoding. I renamed the function because of your comment from 2001-06-13 10:05 (which becomes exceedingly difficult to find on this long page! ;)). > In that case, the codec API should > probably be called > codecs.register_error('myhandler', myhandler). > > Does that make sense ? We could require that unique names are used for custom handlers, but for the standard handlers we do have name collisions. To prevent them, we could either remove them from the registry and require that the codec implements the error handling for those itself, or we could do some fiddling, so that u"üöä".encode("ascii", "replace") becomes u"üöä".encode("ascii", "unicodeencodereplace") behind the scenes. But I think two Unicode-specific registries are much simpler to handle. > BTW, the patch which uses the callback > registry does not seem to be available > on this SF page (the last patch still > converts the errors argument to a > PyObject, which shouldn't be needed > anymore with the new approach). > Can you please upload your > latest version? OK, I'll upload a preliminary version tomorrow. PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing, but otherwise the patch seems to be finished. All decoders work, and the encoders collect unencodable characters and implement the handling of known callback handler names themselves.
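On the registry question, Python 3 ended up with a single registry shared by encoding, decoding and translating: `codecs.register_error(name, handler)` stores a handler, `codecs.lookup_error(name)` retrieves it, and the built-in `"strict"`, `"replace"` and `"ignore"` handlers dispatch on the exception type, so no name collisions arise. The sketch below registers a decoding handler that mirrors the earlier "emit `'?'` and skip one byte" resynchronisation idea; the name `skip_one_byte_sketch` is hypothetical.

```python
import codecs


def skip_one_byte(exc):
    """Decode handler: emit '?' and resynchronise one byte after the error start."""
    if isinstance(exc, UnicodeDecodeError):
        return "?", exc.start + 1  # absolute resume position, as suggested above
    raise exc  # encoding/translating errors are not handled by this sketch


codecs.register_error("skip_one_byte_sketch", skip_one_byte)
```

Decoding `b"a\xff\xfeb"` from ASCII with this handler yields `"a??b"`, and `codecs.lookup_error("skip_one_byte_sketch")` returns the registered function itself.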
As PyUnicode_EncodeDecimal is only used by the int, long, float, and complex constructors, I'd love to get rid of the errors argument, but for completeness' sake, I'll implement the callback functionality.

> Note that the highlighting codec would make a nice example for the new feature.

This could be part of the codec callback test script, which I've started to write. We could kill two birds with one stone here:

1. Test the implementation.
2. Document and advocate what is possible with the patch.

Another idea: as an example we could have a decoding handler that relaxes the UTF-8 minimal encoding restriction, e.g.

    def relaxedutf8(enc, uni, startpos, endpos, reason, data):
        # uni is the byte string being decoded; accept the
        # overlong two-byte encoding of U+0000
        if uni[startpos:startpos+2] == "\xc0\x80":
            return (u"\x00", startpos+2)
        else:
            raise UnicodeError(...)

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-17 21:40

Message:
Logged In: YES
user_id=38388

Sorry for the late response.

About the difference between encoding and decoding: you shouldn't just look at the case where you work with Unicode and strings. Take, e.g., the rot-13 codec, which works on strings only, or other codecs which translate objects into strings and vice versa.

Error handling has to be flexible enough to handle all these situations. Since the codecs know best how to handle the situations, I'd make this an implementation detail of the codec and leave the behaviour undefined in the general case.

For the existing codecs, backward compatibility should be maintained, if at all possible. If the patch gets overly complicated because of this, we may have to provide a downgrade solution for this particular problem (I don't think replace is used in any computational context, though, since you can never be sure how many replacement characters get inserted, so the case may not be that realistic).

Raising an exception for the charmap codec is the right way to go, IMHO. I would consider the current behaviour a bug.
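The relaxed UTF-8 idea can be written with the error handler API that later shipped in Python 3. In that API, codecs.register_error is used to register the handler, which receives the exception object and returns a (replacement, resume_position) tuple. The following is a minimal sketch. The handler name "relaxedutf8" is hypothetical. Accepting the overlong b"\xc0\x80" form is a deliberate relaxation and is not standard UTF-8.

```python
import codecs


def relaxed_utf8(exc):
    """Hypothetical decode handler accepting the overlong NUL b"\\xc0\\x80".

    Strict UTF-8 rejects 0xC0 as an invalid start byte. This handler maps
    the two-byte sequence to U+0000 and resumes after it; any other
    broken input is still reported as an error.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    if exc.object[exc.start:exc.start + 2] == b"\xc0\x80":
        # Resuming past exc.end is allowed: the position is absolute.
        return ("\x00", exc.start + 2)
    raise exc


codecs.register_error("relaxedutf8", relaxed_utf8)

print(repr(b"a\xc0\x80b".decode("utf-8", "relaxedutf8")))  # 'a\x00b'
```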
For new codecs, I think we should suggest that replace tries to collect as much illegal data as possible before invoking the error handler. The handler should be aware of the fact that it won't necessarily get all the broken data in one call.

About the codec error handling registry: You seem to be using a Unicode specific approach here. I'd rather like to see a generic approach which uses the API we discussed earlier. Would that be possible ? In that case, the codec API should probably be called codecs.register_error('myhandler', myhandler).

Does that make sense ?

BTW, the patch which uses the callback registry does not seem to be available on this SF page (the last patch still converts the errors argument to a PyObject, which shouldn't be needed anymore with the new approach). Can you please upload your latest version ?

Note that the highlighting codec would make a nice example for the new feature.

Thanks.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 12:21

Message:
Logged In: YES
user_id=89016

Another note: the patch will change the meaning of charmap encoding slightly. Currently "replace" will put a ? into the output, even if ? is not in the mapping, i.e.

    codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})

will return ('?', 1).

With the patch the above example will raise an exception.

Of course, with the patch many more replacement characters can appear, so it is vital that the mapping is applied to the replacement string as well.

Is this semantic change OK? (I guess all of the existing codecs have a mapping ord("?")->ord("?").)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:19

Message:
Logged In: YES
user_id=89016

So this means that the encoder can collect illegal characters and pass them to the callback. "replace" will replace this with (end-start)*u"?".
Decoders don't collect all illegal byte sequences, but call the callback once for every byte sequence that has been found illegal, and "replace" will replace it with u"?".

Does this make sense?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:06

Message:
Logged In: YES
user_id=89016

For encoding it's always (end-start)*u"?":

>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither one nor the other:

>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. They might have to call the callback for every single illegal byte sequence to get the old behaviour.

(It seems that this patch would be much, much simpler if we only changed the encoders.)

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 19:36

Message:
Logged In: YES
user_id=38388

Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 18:31

Message:
Logged In: YES
user_id=89016

What should replace do: return u"?" or (end-start)*u"?"?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 16:15

Message:
Logged In: YES
user_id=38388

Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive.

I like the highlighting feature !
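Two points from this part of the thread can be observed with the handler API that later shipped in Python 3. First, decoders report each broken byte sequence separately. Second, exc.start/exc.end follow Python's left-inclusive, right-exclusive slice convention. The sketch below uses a hypothetical "record" handler. The exact reason strings come from CPython's unicode-escape decoder and may differ between versions.

```python
import codecs

calls = []  # (offending bytes, reason) for every handler invocation


def record_and_replace(exc):
    """Hypothetical 'record' handler: log each broken slice, emit U+FFFD."""
    # [exc.start, exc.end) is a normal Python slice of the input.
    calls.append((exc.object[exc.start:exc.end], exc.reason))
    return ("\ufffd", exc.end)


codecs.register_error("record", record_and_replace)

# A truncated \u escape followed by an out-of-range \U escape.
result = b"\\u30\\Uffffffff".decode("unicode-escape", "record")
print(ascii(result))  # '\ufffd\ufffd'
for chunk, reason in calls:
    print(chunk, reason)
```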
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 00:09

Message:
Logged In: YES
user_id=89016

I'm thinking about extending the API a little bit:

Consider the following example:

>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape

The error message is a lie: it is not the '1' in position 2 that is the problem, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler.

For encoding this would be useful too. Suppose I want to have an encoder that colors the unencodable characters via ANSI escape sequences. Then I could do the following:

>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...     return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
...
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;))

What do you think?
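The run-collecting variant of the "color" handler proposed above can be sketched with Python 3's codecs.register_error. CPython's ASCII encoder passes a whole run of consecutive unencodable characters to the handler in one call, so the escape sequences are emitted once per run rather than once per character. The handler name and the <NNN> markup are taken from the example above. The rest of the code is an illustrative assumption.

```python
import codecs


def color(exc):
    """Hypothetical 'color' handler: wrap each unencodable run in ANSI bold."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]  # all consecutive bad characters
    marked = "".join("<%d>" % ord(ch) for ch in run)
    # The replacement must itself be encodable (here: plain ASCII).
    return ("\033[1m" + marked + "\033[0m", exc.end)


codecs.register_error("color", color)

print("aäüöo".encode("ascii", "color"))
# b'a\x1b[1m<228><252><246>\x1b[0mo'
```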
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-07 02:29

Message:
Logged In: YES
user_id=89016

I started from scratch, and the current state is this:

Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler).

For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original.

The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-05 17:49

Message:
Logged In: YES
user_id=38388

Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-09-20 12:38

Message:
Logged In: YES
user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2.

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 12:53

Message:
Logged In: YES
user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well.

I'll look into this after I'm back from vacation on the 10.09.
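In the handler registry that eventually shipped in Python 3, the built-in names discussed here work without any registration. Custom names go through codecs.register_error() and codecs.lookup_error(). The \u/\U-escaping handler called "escapereplace" in this thread appears to correspond to "backslashreplace" in released Python. The custom "xmlreplace" handler below is a hypothetical re-creation of the proposal, adapted to receive whole runs of characters.

```python
import codecs

text = "gürk"

# Built-in handler names: the encoder understands these without any
# registration by the caller.
for name in ("replace", "ignore", "xmlcharrefreplace", "backslashreplace"):
    print(name, text.encode("ascii", name))


def xmlreplace(exc):
    """Hypothetical custom handler in the spirit of the thread's xmlreplace."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    return ("".join("&#%d;" % ord(ch) for ch in run), exc.end)


# Custom names are stored in, and looked up from, the registry.
codecs.register_error("xmlreplace", xmlreplace)
assert codecs.lookup_error("xmlreplace") is xmlreplace
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```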
Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 05:55

Message:
Logged In: YES
user_id=89016

Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:

>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are: encoding, unicode, position, reason, state.

The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'

I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway.

I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx.

There are still a few spots that call the old API: e.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object.

Should we switch to the new API everywhere even if strict encoding/decoding is used?

The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 19:03

Message:
Logged In: YES
user_id=89016

New version of the patch with the error handling callback registry.

> > OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).
>
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate.

> > [...]
> > But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?
>
> Good idea !
>
> One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

OK, but these functions are specific to Unicode encoding, so now the functions are called:

    codecs.register_unicodeencodeerrorhandler
    codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in codecs.c/_PyCodecRegistry_Init, so using them is really simple:

    u"gürk".encode("ascii", "xmlcharrefreplace")

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 13:26

Message:
Logged In: YES
user_id=38388

> > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> > > >
> > > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> > >
> > > It would be a slowdown. But callbacks open many possibilities.
> >
> > True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).
>
> OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).

Great !

> > [...]
> > > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.
>
> OK! I will try to find the time to implement that in the next days.

Good.

> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> >     callback(encoding, input_data, input_position, state) ->
> >         (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same interface... what do you think ?
>
> This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it.

That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally.

> I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable character occurs, so now the UCS1 encoder should be as fast as it was without error callbacks.
>
> Do we want to enforce new_input_position>input_position, or should jumping back be allowed?
No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
>
> We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type
>     u"...".encode("...", "strict")
> instead of
>     import codecs
>     u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>     return (u"&#%d;" % ord(uni[pos]), pos+1)
>
> import codecs
>
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
>     u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.
>
> But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES
user_id=89016

> > [...]
> > so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
> >
> > I could, but first we should work out how the decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

OK. I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> >
> > It would be a slowdown. But callbacks open many possibilities.
>
> True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).
OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)).

> > For example:
> >
> >     Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
>
> If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>     callback(encoding, input_data, input_position, state) ->
>         (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable character occurs, so now the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position>input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
> > """
>
> Good point. I'll add that to the PEP 100.

OK.

Here is the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

    u"...".encode("...", "strict")

instead of

    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this:

---
def xmlreplace(encoding, uni, pos, state):
    return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---

and then the following call can be made:

    u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES
user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?',

True.

> so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

Go for it.

> And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
>
>     def FFFDreplace(enc, uni, pos):
>         if uni[pos] == u"\ufffd":
>             return u"?"
>         else:
>             raise UnicodeError(...)
>
> > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
>     Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback?
>
> Maybe the same should be added to the encoding callbacks too?
>
> Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) ->
        (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
>
> Misc/unicode.txt is not clear on that:
> """
> It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
> """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch.

Don't have much time today. I'll comment on the other things tomorrow.
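The assignable errors attribute on StreamWriter discussed above is still part of the codecs.StreamWriter interface in Python 3. Its write() passes the current value of self.errors to the encoder on every call, so reassigning the attribute mid-stream changes how later writes handle unencodable characters. The following is a small sketch using an in-memory byte stream.

```python
import codecs
import io

buffer = io.BytesIO()
# codecs.getwriter returns the StreamWriter class registered for a codec.
writer = codecs.getwriter("ascii")(buffer, errors="strict")

writer.write("abc")
# errors is a plain attribute: switching it affects subsequent writes only.
writer.errors = "xmlcharrefreplace"
writer.write("gürk")

print(buffer.getvalue())  # b'abcg&#252;rk'
```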
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder cannot deal with unencodable stuff in any way other than raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
    def FFFDreplace(enc, uni, pos):
        if uni[pos] == u"\ufffd":
            return u"?"
        else:
            raise UnicodeError(...)

> > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"?
is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler(). > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks.
In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string.
If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more.
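Walter's idea of producing unicode-escape output as ASCII encoding plus a `\uxxxx` replacement callback can be sketched with the PEP 293 handler interface that this patch eventually became (`codecs.register_error`; the sketch is written for Python 3). The handler name `"uescape-demo"` is an arbitrary assumption. Unlike the real `unicode_escape` codec it does not escape backslashes or control characters, so it is not a drop-in replacement; Python's built-in `"backslashreplace"` handler does something similar but uses `\xhh` for characters below U+0100.

```python
import codecs


def uescape_handler(exc):
    """Replace unencodable characters with \\uXXXX / \\UXXXXXXXX escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    escapes = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        # BMP characters get \uXXXX, anything above gets \UXXXXXXXX
        escapes.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
    # Replacement text plus the position at which encoding resumes
    return "".join(escapes), exc.end


# The name is arbitrary; it only has to be registered before it is used.
codecs.register_error("uescape-demo", uescape_handler)

print("g\u00fcrk \u20ac".encode("ascii", "uescape-demo"))  # b'g\\u00fcrk \\u20ac'
```

The ASCII encoder does the fast path for encodable characters and only calls back for the rest, which is where Lemburg's slowdown concern would show up: every unencodable run costs a Python-level call.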
One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used.
When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > an HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment"). BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than an HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks: PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character), or it directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation.) I have renamed the static ...121 function to all lowercase names.
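The two-entry stack loop described above can be modelled in pure Python. This is a hypothetical sketch, not the C code from the patch: `encode_with_callback` and `xml_charref` are invented names, `limit` stands in for the character set of a one-to-one encoder (128 for ASCII, 256 for Latin-1), and the callback uses the 2001 signature `callback(encoding, unicode, position)` returning a replacement string. Written for Python 3.

```python
def encode_with_callback(text, encoding, limit, callback):
    """Model of the proposed two-entry stack loop (hypothetical, pure Python).

    Characters with ord() < limit are emitted as single bytes; limit=128
    models ASCII and limit=256 models Latin-1. For an unencodable character
    in the original string, callback(encoding, text, pos) must return a
    replacement string, which is encoded in its place.
    """
    out = bytearray()
    stack = [[text, 0]]                 # entries are [string, next position]
    while stack:
        entry = stack[-1]
        s, pos = entry
        if pos >= len(s):
            stack.pop()                 # finished this string; resume the one below
            continue
        entry[1] = pos + 1              # advance first, so we resume after this char
        ch = s[pos]
        if len(stack) == 2 and ch == "\ufffd":
            out.append(ord("?"))        # U+FFFD in a replacement: encoder picks '?'
        elif ord(ch) < limit:
            out.append(ord(ch))
        elif len(stack) == 1:
            # Error in the original string: push the callback's replacement
            stack.append([callback(encoding, text, pos), 0])
        else:
            # Error inside the replacement itself: give up
            raise UnicodeError("%s: replacement contains an unencodable character"
                               % encoding)
    return bytes(out)


def xml_charref(enc, uni, pos):
    # The callback from the initial comment, in the 2001 signature
    return "&#x%x;" % ord(uni[pos])


print(encode_with_callback("a\u00e4b", "ascii", 128, xml_charref))  # b'a&#xe4;b'
```

Capping the stack at two entries is what keeps a misbehaving callback from recursing: a replacement is never itself passed back to the callback.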
BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API.) I have not touched PyUnicode_TranslateCharmap yet; should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the time). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and Latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map the old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive !. I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code). One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Wed Jul 24 21:36:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 24 Jul 2002 13:36:41 -0700 Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes Message-ID: Patches item #552438, was opened at 2002-05-05 00:26 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Postponed Priority: 5 Submitted By: Scott Gilbert (xscott) >Assigned to: Nobody/Anonymous (nobody) Summary: PyBufferObject fixes Initial Comment: This patch fixes these problems: 1) Dangling pointer problem 2) Buffer allocated by PyBuffer_New not aligned. The PyBufferObject acts differently depending on whether it allocated the memory or if it's borrowing the memory from a PyBufferProcs supporting object. In the case of allocating its own memory, I made a slight addition that adds some padding so that the ptr is on a sizeof(double) boundary. In the case of borrowing another object's PyBufferProcs memory, PyBufferObject no longer caches the pointer. This might slow things down (probably not by much), but it keeps PyBufferObject from working with a stale pointer.
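The padding Scott describes amounts to rounding the data offset up to a multiple of `sizeof(double)`. A minimal sketch of that arithmetic, assuming the data area sits directly after an object header in the same allocation (so the allocation's own alignment, which Guido notes `malloc()` already guarantees, does not automatically cover the data); `align_up` and `padded_data_offset` are hypothetical helpers, and `ctypes` is used only to read the platform's `sizeof(double)`.

```python
import ctypes

# Platform sizeof(double); 8 on common platforms
DOUBLE_ALIGN = ctypes.sizeof(ctypes.c_double)


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (offset + alignment - 1) & ~(alignment - 1)


def padded_data_offset(header_size):
    # Hypothetical layout: header first, then padding, then the data area
    return align_up(header_size, DOUBLE_ALIGN)


print(align_up(13, 8), align_up(16, 8))  # 16 16
```

The bit-mask form only works for power-of-two alignments, which is why the assertion is there; a general alignment would use `-(-offset // alignment) * alignment` instead.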
Normally I wouldn't do this, but since this patch touches pretty much every function anyway, I fixed many deviations from the Python coding style. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-24 16:36 Message: Logged In: YES user_id=31435 Since Scott is on to something else, marked this Postponed and unassigned it. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-07-23 04:03 Message: Logged In: YES user_id=38318 On top of the current patch being out of date, in private email, Guido indicated that Tim thinks the code needs more refactoring to simplify it. I'd like to hold off on resubmitting a current patch to see how the bytes object fares (PEP 296). If the bytes object makes it into the Python core, then probably the best way to simplify and fix the implementation of the buffer object is to reduce it to nothing but a "Buffer Inspector" for other objects. (Tearing out the b_ptr field and a lot of if statements at least.) The bytes object could be used to implement the following calls: PyBuffer_FromMemory(...) PyBuffer_FromReadWriteMemory(...) PyBuffer_New(...) In these cases, the bytes object would hold the actual memory, and the buffer object would just be inspecting the bytes object. I'd still stick to the strategy of having the buffer object re-request the pointer before every use (since typically the pointer is only valid while the GIL is held). I haven't figured out how to handle the case when the size specified for the buffer object gets out of whack when the inspected object resizes. Raise an exception? Even with these changes, there would still be some problems in here. For instance, the hash value is easy to invalidate.
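Scott's open question, what a buffer view should do when the inspected object resizes, is one that Python 3's buffer protocol (PEP 3118) answers by refusing the resize while any export is alive. A small illustration using `bytearray` and `memoryview`; `resize_while_viewed` is a hypothetical helper, and this shows later Python 3 behaviour, not the PyBufferObject under review.

```python
def resize_while_viewed(buf, extra):
    """Try to grow buf while a memoryview of it exists; return True if refused."""
    view = memoryview(buf)          # exports buf's memory, like a buffer object
    try:
        buf.extend(extra)           # would move the memory the view points at
        return False
    except BufferError:
        return True                 # the exporter refuses to resize
    finally:
        view.release()              # drop the export


data = bytearray(b"abc")
refused = resize_while_viewed(data, b"def")
data.extend(b"def")                 # no views left, so resizing is allowed
```

Locking the exporter sidesteps both the stale-pointer and the stale-size problem, at the cost of making the exporting object temporarily non-resizable.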
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-07-18 15:41 Message: Logged In: YES user_id=6380 Note, the patch is out of date since somebody fixed some nits with slicing, so I'm marking this as Out Of Date. You might as well upload the new version of the file. :-) Why do you think you need to fix the allocation? Since allocation is done via malloc(), and malloc() guarantees allocation for a double ("for all types"), shouldn't that be enough??? (If it's obmalloc that you're worried about, it's easy to force this to use the real malloc() and free().) I hope Tim will make some time to review this (the "not this week" comment is several months old now). Superficially it looks like a big improvement. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-05-07 14:51 Message: Logged In: YES user_id=31435 Na, assigning a bug is fine by me -- it helps to have *someone* feel guilty . Assigning it doesn't mean it goes to the top of the assignee's heap, though. I can't make time to look at it this week, so it's just as well that it got unassigned. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-07 08:55 Message: Logged In: YES user_id=38318 Apparently assigning a patch is poor form. My bad. ---------------------------------------------------------------------- Comment By: Scott Gilbert (xscott) Date: 2002-05-05 00:27 Message: Logged In: YES user_id=38318 Can I assign this to you or does it take admin privs? 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470 From noreply@sourceforge.net Thu Jul 25 13:05:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jul 2002 05:05:23 -0700 Subject: [Patches] [ python-Patches-586437 ] galeon support in webbrowser Message-ID: Patches item #586437, was opened at 2002-07-25 17:35 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Supreet Sethi (supreet) Assigned to: Nobody/Anonymous (nobody) Summary: galeon support in webbrowser Initial Comment: adds galeon support to webbrowser.py ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470 From noreply@sourceforge.net Thu Jul 25 17:21:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 25 Jul 2002 09:21:33 -0700 Subject: [Patches] [ python-Patches-586561 ] Better token-related error messages Message-ID: Patches item #586561, was opened at 2002-07-25 11:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586561&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Jeremy Hylton (jhylton) Summary: Better token-related error messages Initial Comment: There were some complaints recently on c.l.py about the rather non-informative error messages emitted as a result of the tokenizer detecting a problem. In many situations it simply returns E_TOKEN which generates a fairly benign, but often unhelpful "invalid token" message. 
This patch adds several new E_* macros to Includes/errorcode.h, returns them from the appropriate places in Parser/tokenizer.c and generates more specific messages in Python/pythonrun.c. I think the error messages are always better, though in some situations they may still not be strictly correct. Assigning to Jeremy since he's the compiler wiz. Skip ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586561&group_id=5470 From noreply@sourceforge.net Fri Jul 26 16:51:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 08:51:08 -0700 Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort Message-ID: Patches item #587076, was opened at 2002-07-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Tim Peters (tim_one) Assigned to: Nobody/Anonymous (nobody) Summary: Adaptive stable mergesort Initial Comment: This adds method list.msort([compare]). Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via

    python -O sortperf.py 15 20 1

That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead. CAUTION: To save time across many runs, sortperf saves the random floats it generates into temp files. If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it.
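The "stable" in the patch summary means that elements with equal keys keep their input order, so sorting by a secondary key and then by a primary key yields a correct two-level order. A small illustration in Python 3; it uses `sorted(key=...)`, which postdates this patch, and `sort_by_two_keys` is a hypothetical helper. (This mergesort was adopted as `list.sort()` itself in Python 2.3, which is also when sort stability became guaranteed.)

```python
def sort_by_two_keys(rows):
    """Order (primary, secondary) pairs by primary key, then secondary key."""
    # Pass 1: sort by the secondary key.
    rows = sorted(rows, key=lambda row: row[1])
    # Pass 2: sort by the primary key; stability keeps pass-1 order among ties.
    return sorted(rows, key=lambda row: row[0])


records = [("b", 2), ("a", 3), ("b", 1), ("a", 1)]
print(sort_by_two_keys(records))  # [('a', 1), ('a', 3), ('b', 1), ('b', 2)]
```

With an unstable sort such as the old samplesort, the second pass could reshuffle `("a", 3)` and `("a", 1)`, so the two-pass trick would not be reliable.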
This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 16:41:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 08:41:07 -0700
Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks
Message-ID:

Patches item #432401, was opened at 2001-06-12 15:43
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Postponed
Priority: 6
Submitted By: Walter Dörwald (doerwalter)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode encoding error callbacks

Initial Comment:
This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to pass not only 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, which will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example, replacing unencodable characters with XML character references can be done in the following way:

    u"aäoöuüß".encode(
        "ascii",
        lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos])
    )

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-26 17:41

Message:
Logged In: YES
user_id=89016

The attached new version of the test script adds tests for wrong parameters passed to the callbacks and for wrong results returned from the callback.
It also adds tests to the long string tests for copies of the builtin error handlers, so that the codec does not recognize the name and goes through the general callback machinery.

UTF-7 decoding still has a flaw inherited from the current implementation:

>>> "+xxx".decode("utf-7")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-3: unterminated shift sequence
>>> "+xxx".decode("utf-7", "ignore")
u'\uc71c'

The decoder should consider the whole sequence "+xxx" as undecodable, so "ignore" should return an empty string. Currently the correct sequence will be passed to the callback, but the faulty sequence has already been emitted to the result string.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 21:04

Message:
Logged In: YES
user_id=89016

Attached is a new version of the test script. But we need more tests. UTF-7 is completely untested, and codecs that pass wrong arguments to the handler, as well as handlers that return wrong or out-of-bounds results, are untested too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 20:55

Message:
Logged In: YES
user_id=89016

diff12.txt finally implements the PEP 293 specification (i.e. using exceptions for the communication between codec and handler).

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-30 18:30

Message:
Logged In: YES
user_id=89016

diff11.txt fixes two refcounting bugs in codecs.c. speedtest.py is a little test script that checks the speed of various string/encoding/error combinations.
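The initial comment passes a lambda directly as the errors argument. PEP 293, which diff12.txt implements and which Python 2.3 eventually shipped, takes a different route. Handlers are registered by name with codecs.register_error(). Each handler receives a UnicodeEncodeError that carries object, start and end, and it returns a (replacement, resume_position) tuple. Below is a minimal Python 3 sketch of the same XML character reference example under that interface. The handler name "xmlhexref" is hypothetical, and the built-in "xmlcharrefreplace" handler does the same job with decimal references.

```python
import codecs

def xml_hex_ref(exc):
    """Replace each unencodable character with an XML hex character reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#x%x;" % ord(ch) for ch in bad)
    # Resume encoding right after the characters we replaced.
    return replacement, exc.end

codecs.register_error("xmlhexref", xml_hex_ref)

print("aäoöuüß".encode("ascii", "xmlhexref"))
# b'a&#xe4;o&#xf6;u&#xfc;&#xdf;'
```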
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-29 22:50

Message:
Logged In: YES
user_id=89016

This new version, diff10.txt, fixes a memory overwrite/reallocation bug in PyUnicode_EncodeCharmap and moves the error handling out of PyUnicode_EncodeCharmap. A new version of the test script is included too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-16 21:06

Message:
Logged In: YES
user_id=89016

OK, PyUnicode_TranslateCharmap is finished too. As the errors argument is again not exposed to Python, it can't really be tested. Should we add errors as an optional argument to unicode.translate?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-01 19:57

Message:
Logged In: YES
user_id=89016

OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the errors argument can't be accessed from Python code, there's not much testing for this.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-20 17:34

Message:
Logged In: YES
user_id=89016

A new idea for the interface between the codec and the callback: Maybe we could have new exception classes UnicodeEncodeError, UnicodeDecodeError and UnicodeTranslateError derived from UnicodeError. They have all the attributes that are passed as an argument tuple in the current version:

    string: the original string
    start:  the start position of the unencodable characters/undecodable bytes
    end:    the end position+1 of the unencodable characters/undecodable bytes
    reason: a string that explains why the encoding/decoding doesn't work

There is no data object, because when a codec wants to pass extended information to the callback it can do this via a derived class.
It might be better to move these attributes to the base class UnicodeError, but this might have backwards compatibility problems.

With this method we really can have one global registry for all callbacks, because for callback names that must work with encoding *and* decoding *and* translating (i.e. "strict", "replace" and "ignore"), the callback can check which type of exception was passed, so "replace" could e.g. look like this:

    def replace(exc):
        if isinstance(exc, UnicodeDecodeError):
            return ("?", exc.end)
        else:
            return (u"?"*(exc.end-exc.start), exc.end)

Another possibility would be for the callback to communicate with the codec by assigning to attributes of the exception object. The resynchronisation position could even be preassigned to end, so the callback only needs to specify the replacement in most cases:

    def replace(exc):
        if isinstance(exc, UnicodeDecodeError):
            exc.replacement = "?"
        else:
            exc.replacement = u"?"*(exc.end-exc.start)

As many of the assignments can now be done on the C level without having to allocate Python objects (except for the replacement string and the reason), this version might even be faster, especially if we allow the codec to reuse the exception object for the next call to the callback.

Does this make sense, or is this too fancy?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:24

Message:
Logged In: YES
user_id=89016

And here is the test script (test_codeccallbacks.py).

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:22

Message:
Logged In: YES
user_id=89016

OK, here is the current version of the patch (diff7.txt). PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing.
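The 2002-04-20 comment argues that a single registry can work if one handler serves both directions by checking the exception type. That is how handlers registered with codecs.register_error() behave in released Pythons. Below is a minimal Python 3 sketch of the first replace() variant from that comment. The name "question_replace" is hypothetical. In practice the built-in "replace" handler covers this case, and it substitutes U+FFFD rather than "?" when decoding.

```python
import codecs

def question_replace(exc):
    """One handler for both directions, dispatching on the exception type."""
    if isinstance(exc, UnicodeDecodeError):
        # One "?" per undecodable byte sequence reported by the decoder.
        return "?", exc.end
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        # One "?" per unencodable character in the reported [start, end) range.
        return "?" * (exc.end - exc.start), exc.end
    raise TypeError("don't know how to handle %r" % (exc,))

codecs.register_error("question_replace", question_replace)

print("aäöb".encode("ascii", "question_replace"))        # b'a??b'
print(b"a\xff\xfeb".decode("ascii", "question_replace"))  # a??b
```

CPython's ASCII decoder reports each bad byte separately, so two bad bytes produce two question marks.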
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 22:50

Message:
Logged In: YES
user_id=89016

> About the difference between encoding
> and decoding: you shouldn't just look
> at the case where you work with Unicode
> and strings, e.g. take the rot-13 codec
> which works on strings only or other
> codecs which translate objects into
> strings and vice-versa.

unicode.encode encodes to str and str.decode decodes to unicode, even for rot-13:

>>> u"gürk".encode("rot13")
't\xfcex'
>>> "gürk".decode("rot13")
u't\xfcex'
>>> u"gürk".decode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'unicode' object has no attribute 'decode'
>>> "gürk".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: ASCII decoding error: ordinal not in range(128)

Here the str is converted to unicode first, before encode is called, but the conversion to unicode fails.

Is there an example where something else happens?

> Error handling has to be flexible enough
> to handle all these situations. Since
> the codecs know best how to handle the
> situations, I'd make this an implementation
> detail of the codec and leave the
> behaviour undefined in the general case.

OK, but we should suggest that for encoding, unencodable characters are collected, and for decoding, separate byte sequences that the codec considers broken are passed to the callback one at a time. That is, for decoding the handler will never get all the broken data in one call. E.g. for "\u30\Uffffffff".decode("unicode-escape") the handler will be called twice (once with "\u30" and "truncated \u escape" as the reason, and once with "\Uffffffff" and "illegal character" as the reason).
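The two-call behaviour described above for "\u30\Uffffffff" can be observed with a handler that just records what it is given. Below is a minimal Python 3 sketch under the PEP 293 interface. The handler name "record_and_skip" is hypothetical. The exact positions and reason strings are CPython implementation details, not a documented guarantee.

```python
import codecs

calls = []  # (start, end, reason) for every invocation of the handler

def record_and_skip(exc):
    """Record what the decoder reports, then drop the offending bytes."""
    calls.append((exc.start, exc.end, exc.reason))
    return "", exc.end  # resume right after the reported range

codecs.register_error("record_and_skip", record_and_skip)

data = b"\\u30\\Uffffffff"  # a truncated \u escape followed by an out-of-range \U escape
result = data.decode("unicode-escape", "record_and_skip")
print(repr(result))
for call in calls:
    print(call)
```

On current CPython this should record two calls. The first covers the truncated \u escape (bytes 0 to 4) and the second covers the out-of-range \U escape (bytes 4 to 14), and each call has its own reason.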
> For the existing codecs, backward
> compatibility should be maintained,
> if at all possible. If the patch gets
> overly complicated because of this,
> we may have to provide a downgrade solution
> for this particular problem (I don't think
> replace is used in any computational context,
> though, since you can never be sure how
> many replacement character do get
> inserted, so the case may not be
> that realistic).
>
> Raising an exception for the charmap codec
> is the right way to go, IMHO. I would
> consider the current behaviour a bug.

OK, this is implemented in PyUnicode_EncodeCharmap now, and collecting unencodable characters works too. I completely changed the implementation, because the stack approach would have gotten much more complicated when unencodable characters are collected.

> For new codecs, I think we should
> suggest that replace tries to collect
> as much illegal data as possible before
> invoking the error handler. The handler
> should be aware of the fact that it
> won't necessarily get all the broken
> data in one call.

OK for encoders; for decoders, see above.

> About the codec error handling
> registry: You seem to be using a
> Unicode specific approach here.
> I'd rather like to see a generic
> approach which uses the API
> we discussed earlier. Would that be possible?

The handlers in the registry are all Unicode specific, and they are different for encoding and for decoding.

I renamed the function because of your comment from 2001-06-13 10:05 (which becomes exceedingly difficult to find on this long page! ;)).

> In that case, the codec API should
> probably be called
> codecs.register_error('myhandler', myhandler).
>
> Does that make sense ?

We could require that unique names are used for custom handlers, but for the standard handlers we do have name collisions.
To prevent them, we could either remove them from the registry and require that the codec implements the error handling for those itself, or we could do some fiddling, so that

    u"üöä".encode("ascii", "replace")

becomes

    u"üöä".encode("ascii", "unicodeencodereplace")

behind the scenes.

But I think two unicode specific registries are much simpler to handle.

> BTW, the patch which uses the callback
> registry does not seem to be available
> on this SF page (the last patch still
> converts the errors argument to a
> PyObject, which shouldn't be needed
> anymore with the new approach).
> Can you please upload your
> latest version?

OK, I'll upload a preliminary version tomorrow. PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are still missing, but otherwise the patch seems to be finished. All decoders work, and the encoders collect unencodable characters and implement the handling of known callback handler names themselves.

As PyUnicode_EncodeDecimal is only used by the int, long, float, and complex constructors, I'd love to get rid of the errors argument, but for completeness' sake, I'll implement the callback functionality.

> Note that the highlighting codec
> would make a nice example
> for the new feature.

This could be part of the codec callback test script, which I've started to write. We could kill two birds with one stone here:

1. Test the implementation.
2. Document and advocate what is possible with the patch.

Another idea: we could have as an example a decoding handler that relaxes the UTF-8 minimal encoding restriction, e.g.

    def relaxedutf8(enc, uni, startpos, endpos, reason, data):
        # for decoding, uni is the undecoded byte string
        if uni[startpos:startpos+2] == "\xc0\x80":
            return (u"\x00", startpos+2)
        else:
            raise UnicodeError(...)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-17 21:40

Message:
Logged In: YES
user_id=38388

Sorry for the late response.
About the difference between encoding and decoding: you shouldn't just look at the case where you work with Unicode and strings, e.g. take the rot-13 codec which works on strings only or other codecs which translate objects into strings and vice-versa.

Error handling has to be flexible enough to handle all these situations. Since the codecs know best how to handle the situations, I'd make this an implementation detail of the codec and leave the behaviour undefined in the general case.

For the existing codecs, backward compatibility should be maintained, if at all possible. If the patch gets overly complicated because of this, we may have to provide a downgrade solution for this particular problem (I don't think replace is used in any computational context, though, since you can never be sure how many replacement character do get inserted, so the case may not be that realistic).

Raising an exception for the charmap codec is the right way to go, IMHO. I would consider the current behaviour a bug.

For new codecs, I think we should suggest that replace tries to collect as much illegal data as possible before invoking the error handler. The handler should be aware of the fact that it won't necessarily get all the broken data in one call.

About the codec error handling registry: You seem to be using a Unicode specific approach here. I'd rather like to see a generic approach which uses the API we discussed earlier. Would that be possible ?

In that case, the codec API should probably be called

    codecs.register_error('myhandler', myhandler).

Does that make sense ?

BTW, the patch which uses the callback registry does not seem to be available on this SF page (the last patch still converts the errors argument to a PyObject, which shouldn't be needed anymore with the new approach). Can you please upload your latest version ?

Note that the highlighting codec would make a nice example for the new feature.

Thanks.
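The "highlighting codec" Lemburg mentions is the ANSI colouring example from the 2002-03-08 comment. The interface that eventually shipped passes a start/end range, so a handler can wrap a whole run of unencodable characters in a single escape pair. That removes the redundant "\x1b[0m\x1b[1m" sequences. Below is a minimal Python 3 sketch. The handler name "highlight" is hypothetical, and the sketch assumes the codec reports consecutive unencodable characters in one call, which CPython's ASCII encoder currently does.

```python
import codecs

BOLD, RESET = "\x1b[1m", "\x1b[0m"  # ANSI "bold on" and "reset attributes"

def highlight(exc):
    """Wrap the whole run of unencodable characters in one ANSI bold span."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    marked = "".join("<%d>" % ord(ch) for ch in run)
    return BOLD + marked + RESET, exc.end

codecs.register_error("highlight", highlight)

print("aäüöo".encode("ascii", "highlight"))
# b'a\x1b[1m<228><252><246>\x1b[0mo'
```

A codec that reported each character separately would produce one bold span per character instead, so the result depends on how the codec groups errors.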
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 12:21

Message:
Logged In: YES
user_id=89016

Another note: the patch will change the meaning of charmap encoding slightly. Currently "replace" will put a ? into the output, even if ? is not in the mapping, i.e.

    codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})

will return ('?', 1).

With the patch the above example will raise an exception.

Of course, with the patch many more replacement characters can appear, so it is vital that the mapping is also applied to the replacement string.

Is this semantic change OK? (I guess all of the existing codecs have a mapping ord("?")->ord("?").)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:19

Message:
Logged In: YES
user_id=89016

So this means that the encoder can collect illegal characters and pass them to the callback. "replace" will replace this with (end-start)*u"?".

Decoders don't collect all illegal byte sequences, but call the callback once for every byte sequence that has been found illegal, and "replace" will replace it with u"?".

Does this make sense?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:06

Message:
Logged In: YES
user_id=89016

For encoding it's always (end-start)*u"?":

>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither u"?" nor (end-start)*u"?":

>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two replacement characters.

This might mean that decoders can't collect all the illegal characters and call the callback once. They might have to call the callback for every single illegal byte sequence to get the old behaviour.
(It seems that this patch would be much, much simpler if we only changed the encoders.)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 19:36

Message:
Logged In: YES
user_id=38388

Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 18:31

Message:
Logged In: YES
user_id=89016

What should replace do: return u"?" or (end-start)*u"?"?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 16:15

Message:
Logged In: YES
user_id=38388

Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive.

I like the highlighting feature !

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 00:09

Message:
Logged In: YES
user_id=89016

I'm thinking about extending the API a little bit.

Consider the following example:

>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape

The error message is a lie: the problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler.

For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable characters via ANSI escape sequences. Then I could do the following:

>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...     return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
...
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position).

This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in the case of custom error handler names. (And it makes the implementation very interesting ;))

What do you think?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-07 02:29

Message:
Logged In: YES
user_id=89016

I started from scratch, and the current state is this:

Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet), most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged), and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler).

For encoding, the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached, it might even be faster than the original.

The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-05 17:49

Message:
Logged In: YES
user_id=38388

Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-09-20 12:38

Message:
Logged In: YES
user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2.

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 12:53

Message:
Logged In: YES
user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well.

I'll look into this after I'm back from vacation on 10.09.

Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 05:55

Message:
Logged In: YES
user_id=89016

Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:

>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are: encoding, unicode, position, reason, state.

The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'

I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway.

I changed all the old functions to call the new ones, so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx.

There are still a few spots that call the old API: e.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object.

Should we switch to the new API everywhere, even if strict encoding/decoding is used?

The size of this patch begins to scare me. I guess we need an extensive test script for all the new features, and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.)
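The decoding callbacks above resynchronise by returning an absolute input position, so a handler may consume more input than the codec flagged. The relaxed UTF-8 handler floated later in this thread (2002-04-17 22:50) illustrates this well. Below is a minimal Python 3 sketch using the PEP 293 interface. The handler name "relaxed_utf8" is hypothetical, and the sketch relies on CPython's UTF-8 decoder reporting the 0xC0 byte as an invalid start byte at exc.start.

```python
import codecs

def relaxed_utf8(exc):
    """Accept the overlong two-byte NUL (0xC0 0x80) that strict UTF-8 rejects."""
    if isinstance(exc, UnicodeDecodeError):
        data = exc.object  # the undecoded bytes
        if data[exc.start:exc.start + 2] == b"\xc0\x80":
            # Consume both bytes, even though the decoder flagged only the first.
            return "\x00", exc.start + 2
    raise exc  # anything else is still an error

codecs.register_error("relaxed_utf8", relaxed_utf8)

print(repr(b"a\xc0\x80b".decode("utf-8", "relaxed_utf8")))
# 'a\x00b'
```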
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 19:03

Message:
Logged In: YES
user_id=89016

New version of the patch with the error handling callback registry.

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
>
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate.

> > [...]
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
>
> Good idea !
>
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more in line with
> the Python coding style guide.

OK, but these functions are specific to unicode encoding, so now the functions are called:

    codecs.register_unicodeencodeerrorhandler
    codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in codecs.c/_PyCodecRegistry_Init, so using them is really simple:

    u"gürk".encode("ascii", "xmlcharrefreplace")

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 13:26

Message:
Logged In: YES
user_id=38388

> > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > > with \uxxxx replacement callback.
> > > >
> > > > Hmm, wouldn't that result in a slowdown ? If so,
> > > > I'd rather leave the special encoder in place,
> > > > since it is being used a lot in Python and
> > > > probably some applications too.
> > >
> > > It would be a slowdown. But callbacks open many
> > > possibilities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
>
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !

> > [...]
> > > Should the old TranslateCharmap map to the new
> > > TranslateCharmapEx and inherit the
> > > "multicharacter replacement" feature,
> > > or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement
> > to the old API. I think it is very useful and since the
> > old APIs work on raw buffers it would be a benefit to have
> > the functionality in the old implementation too.
>
> OK! I will try to find the time to implement that in the
> next days.

Good.

> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> > callback(encoding, input_data, input_position, state) ->
> > (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the
> > position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same
> > interface... what do you think ?
>
> This would make the callback feature hypergeneric and a
> little slower, because tuples have to be created, but it
> (almost) unifies the encoding and decoding API. ("almost"
> because, for the encoder output_to_be_appended will be
> reencoded, for the decoder it will simply be appended.),
> so I'm for it.

That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally.
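The \x/\u/\U escaping handler agreed on here corresponds to the built-in "backslashreplace" error handler in released Pythons. Re-implementing it in Python 3 shows the (replacement, absolute resume position) return convention discussed above. This is a minimal sketch. The handler name "escape_replace" is hypothetical, and real code should simply use "backslashreplace".

```python
import codecs

def escape_replace(exc):
    """Replace unencodable characters with Python-style backslash escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if cp <= 0xFF:
            pieces.append("\\x%02x" % cp)
        elif cp <= 0xFFFF:
            pieces.append("\\u%04x" % cp)
        else:
            pieces.append("\\U%08x" % cp)
    # Absolute resume position, as in the (output, new_input_position) design.
    return "".join(pieces), exc.end

codecs.register_error("escape_replace", escape_replace)

text = "gürk \u20ac \U0001f600"
print(text.encode("ascii", "escape_replace"))
# b'g\\xfcrk \\u20ac \\U0001f600'
```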
> I implemented this and changed the encoders to only
> lookup the error handler on the first error. The UCS1
> encoder now no longer uses the two-item stack strategy.
> (This strategy only makes sense for those encoders where
> the encoding itself is much more complicated than the
> looping/callback etc.) So now memory overflow tests are
> only done, when an unencodable error occurs, so now the
> UCS1 encoder should be as fast as it was without
> error callbacks.
>
> Do we want to enforce new_input_position>input_position,
> or should jumping back be allowed?

No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
>
> We could have an error handler registry, which maps names
> to error handlers, then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because
> it's always more convenient to type
>     u"...".encode("...", "strict")
> instead of
>     import codecs
>     u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>     return (u"&#%d;" % ord(uni[pos]), pos+1)
>
> import codecs
>
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
>     u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed common error
> handling names could even be implemented in the encoder
> itself.
>
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES
user_id=89016

> > [...]
> > so I guess we could change the replace handler
> > to always return u'?'. This would make the
> > implementation a little bit simpler, but the
> > explanation of the callback feature *a lot*
> > simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt
> > > file ? I will eventually take that file and turn
> > > it into a PEP which will then serve as general
> > > documentation for these things.
> >
> > I could, but first we should work out how the
> > decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually
> go there.

OK.
I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with a \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done: now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x > 0xffff, with a wide build of Python). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str(), which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook. > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API.
> I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next few days. > [Decoding error callbacks] > > About the return value: > > I'd suggest always using the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it. I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable character occurs, so the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point.
> I'll add that to PEP 100.

OK. Here is the current todo list:

1. Implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html). We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

u"...".encode("...", "strict")

instead of

import codecs
u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this:
---
def xmlreplace(encoding, uni, pos, state):
    return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---
and then the following call can be made:
u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES
user_id=38388

Ok, here we go... > > > raise an exception). U+FFFD characters in the replacement > > > string will be replaced with a character that the encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g.
>
> def FFFDreplace(enc, uni, pos):
>     if uni[pos] == u"\ufffd":
>         return u"?"
>     else:
>         raise UnicodeError(...)
>
> > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file? I > > will eventually take that file and turn it into a PEP which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown? If so, I'd rather > > leave the special encoder in place, since it is being used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape".
Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str(), which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet; > > > should this function also support error callbacks? Why > > > would one want to insert None into the mapping to call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in PyUnicode_TranslateCharmap") > along the way.
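The generic \uXXXX replacement handler discussed above can be sketched with the later PEP 293 registry API. The registered name "escapereplace" is hypothetical; it only mirrors codecs.escapereplace_unicodeencode_errors from this thread. It emits \uXXXX inside the BMP and \UXXXXXXXX above U+FFFF, and because it is a plain error handler it works with any codec that supports handlers, including code page codecs. Modern Python ships a similar built-in "backslashreplace", which differs in using \xNN for code points below U+0100.

```python
import codecs


def escapereplace(exc):
    """Replace each unencodable character with a backslash-u/U escape."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        # Four hex digits inside the BMP, eight above U+FFFF.
        parts.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
    return ("".join(parts), exc.end)


codecs.register_error("escapereplace", escapereplace)

print("gürk".encode("ascii", "escapereplace"))            # b'g\\u00fcrk'
print("€Ω".encode("cp1252", "escapereplace"))             # b'\x80\\u03a9'
print("x\U0001F600".encode("latin-1", "escapereplace"))   # b'x\\U0001f600'
```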
I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new TranslateCharmapEx > and inherit the "multicharacter replacement" feature, or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same callback > > > for encoding and decoding (like codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > > APIs for this. We'd then have to augment the base classes of > > the StreamCodecs to provide two attributes for .errors with > > a fallback solution for the string case (i.e. "strict" can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type) that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) means > emit a '?' and skip the offending character.
But to make > the offset really useful the callback has to know something > about the encoding; perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest always using the same tuple interface, e.g.

callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to PEP 100.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separate from the current patch. I don't have much time today. I'll comment on the other things tomorrow.
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES
user_id=89016

Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES
user_id=89016

> > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character.
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == u"\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok.
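Under the later PEP 293 API, the FFFDreplace callback above might look roughly like this: an unencodable U+FFFD becomes "?", and every other error is re-raised, i.e. the strict behavior. The registered name "fffdreplace" is made up for the example, and the handler deliberately replaces only one character per call, like the original per-position callback.

```python
import codecs


def fffdreplace(exc):
    """Turn an unencodable U+FFFD into '?', keep strict behavior otherwise."""
    if isinstance(exc, UnicodeEncodeError) and exc.object[exc.start] == "\ufffd":
        # Replace only this character and resume at the next position;
        # if the next character is also unencodable, the handler runs again.
        return ("?", exc.start + 1)
    raise exc


codecs.register_error("fffdreplace", fffdreplace)

print("a\ufffd\ufffdb".encode("ascii", "fffdreplace"))  # b'a??b'
```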
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler(). > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet; > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.
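To make the "copy the callback output directly" option concrete, here is a small pure-Python sketch. translate_with_callback is a hypothetical helper, not a CPython API. Following the question discussed above, it treats a code point mapped to None as an error (unlike Python's str.translate, which deletes such characters), accepts multi-character replacements as proposed in patch #403100, and never feeds the callback's output back through the mapping, so a callback cannot trigger itself recursively. The callback uses the tuple interface proposed in this thread, callback(text, pos) -> (replacement, new_pos).

```python
def translate_with_callback(text, mapping, callback):
    """Translate text through mapping (code point -> str or None).

    Unmapped code points are copied unchanged. A code point mapped to
    None is an error: callback(text, pos) must return
    (replacement, new_pos) or raise. The replacement is copied to the
    output as-is and is *not* translated again.
    """
    out = []
    pos = 0
    while pos < len(text):
        cp = ord(text[pos])
        if cp not in mapping:
            out.append(text[pos])      # unmapped: copy unchanged
            pos += 1
        elif mapping[cp] is None:
            replacement, pos = callback(text, pos)
            out.append(replacement)    # copied directly, no re-mapping
        else:
            out.append(mapping[cp])    # may be several characters long
            pos += 1
    return "".join(out)


table = {ord("\u00df"): "ss", ord("\t"): None}
print(translate_with_callback("Stra\u00dfe\tx", table, lambda s, i: ("    ", i + 1)))
# Strasse    x
```

A callback that simply raises gives the "restrict the input" use case mentioned above.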
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES
user_id=38388

> How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered, the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice.
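The XML use case for an assignable errors attribute can be sketched with today's codecs module: StreamWriter.write() passes self.errors to the encoder on every call, so changing the attribute between writes takes effect immediately. The built-in "xmlcharrefreplace" handler stands in for the codecs.xmlreplace_encode_errors callback of this thread, and the document fragments are invented for the example.

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf)  # errors defaults to "strict"

# Text node: unencodable characters may become character references.
writer.errors = "xmlcharrefreplace"
writer.write("<p>gürk</p>")

# Comment: character references are not interpreted there, so be strict.
writer.errors = "strict"
writer.write("<!-- ")
try:
    writer.write("gürk")
except UnicodeEncodeError as exc:
    print("cannot write %r inside a comment" % exc.object[exc.start:exc.end])
writer.write(" -->")

print(buf.getvalue())  # b'<p>g&#252;rk</p><!--  -->'
```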
> The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
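The two-entry stack described above can be modelled in a few lines of Python. This only illustrates the control flow, not the patch's C code: encode_with_stack is a hypothetical name, the "charset" is simply every code point below limit (128 for ASCII, 256 for Latin-1), and the callback uses the original callback(string, pos) -> replacement form rather than the later tuple interface.

```python
def encode_with_stack(text, limit, callback):
    """Encode text to bytes, accepting code points below `limit`.

    The bottom stack entry is the original string; the optional top
    entry is a replacement returned by the callback. An unencodable
    character inside a replacement raises instead of recursing.
    """
    out = bytearray()
    stack = [[text, 0]]                 # entries are [string, next position]
    while stack:
        string, pos = stack[-1]
        if pos == len(string):
            stack.pop()                 # done with this string; resume below
            continue
        stack[-1][1] = pos + 1
        cp = ord(string[pos])
        if cp < limit:
            out.append(cp)
        elif len(stack) == 1:
            # Error in the original string: encode the replacement next.
            stack.append([callback(text, pos), 0])
        else:
            # Error inside a replacement: a normal exception is raised.
            raise UnicodeEncodeError("sketch", string, pos, pos + 1,
                                     "unencodable character in replacement")
    return bytes(out)


print(encode_with_stack("gürk", 128, lambda s, i: "&#%d;" % ord(s[i])))
# b'g&#252;rk'
```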
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet; > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value.
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > an HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n SIG will make it very hard to follow up on it. (This HTML area is indeed damn small ;-)
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES
user_id=89016

One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment"). BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than an HTML textarea! ;)
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES
user_id=89016

How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet; should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (as codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 20:00

Message:
Logged In: YES
user_id=38388

About the Py_UNICODE *data, int size APIs: OK, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful.

BTW, could you summarize how the callback works in a few lines?

About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-)

About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower-case names (since that's how the Python internals work too -- most of the time).

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:56

Message:
Logged In: YES
user_id=89016

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

Another problem is that the callback requires a Python object, so in the PyObject * version the refcount is incremented and the object is passed to the callback. The Py_UNICODE */int version would have to create a new Unicode object from the data.
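The two-entry stack Walter describes in his 2001-06-12 20:59 comment can be modelled in a few lines of Python. This is a sketch, not the patch's C code: encode_one_to_one, EncodeError and the four callbacks are hypothetical names, the callback signature (encoding name, original string, error position) follows the description above, and a single `limit` parameter stands in for the ASCII (128) and Latin-1 (256) variants of the "121" encoder.

```python
from typing import Callable, List

# Hypothetical callback signature modelled on the proposal:
# (encoding name, original string, error position) -> replacement string
ErrorCallback = Callable[[str, str, int], str]


class EncodeError(Exception):
    """Raised for unencodable input that no callback handled."""


def raise_errors(encoding: str, s: str, pos: int) -> str:
    raise EncodeError(f"{encoding} can't encode {s[pos]!r} at position {pos}")


def ignore_errors(encoding: str, s: str, pos: int) -> str:
    return ""  # empty replacement: the offending character is dropped


def replace_errors(encoding: str, s: str, pos: int) -> str:
    return "\ufffd"  # asks the encoder to substitute its own replacement


def escape_errors(encoding: str, s: str, pos: int) -> str:
    cp = ord(s[pos])
    return f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}"


def encode_one_to_one(text: str, limit: int, encoding: str,
                      callback: ErrorCallback) -> bytes:
    """Encode text where every code point below `limit` maps to one byte
    (limit=128 for ASCII, 256 for Latin-1)."""
    out = bytearray()
    # Each entry is [string, next position]; the stack never exceeds two.
    stack: List[list] = [[text, 0]]
    while stack:
        s, pos = stack[-1]
        if pos == len(s):
            # End of the current string: popping the replacement resumes
            # the original; popping the original finishes encoding.
            stack.pop()
            continue
        ch = s[pos]
        stack[-1][1] = pos + 1  # advance past this character first
        if ord(ch) < limit:
            out.append(ord(ch))
        elif len(stack) == 2 and ch == "\ufffd":
            out.append(ord("?"))  # the encoder's chosen substitute
        elif len(stack) == 1:
            # Error in the original string: encode the replacement next.
            stack.append([callback(encoding, s, pos), 0])
        else:
            # The replacement itself is unencodable: plain exception.
            raise EncodeError(
                f"{encoding}: replacement contains unencodable {ch!r}")
    return bytes(out)


if __name__ == "__main__":
    print(encode_one_to_one("a\u20acb", 128, "ascii", escape_errors))
```

The escape callback illustrates the idea, raised in the same comment, of expressing a \uxxxx escape encoder as an ASCII encoder plus a replacement callback.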
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:32

Message:
Logged In: YES
user_id=89016

> * please don't place more than one C statement on one line
> like in:
> """
> + unicode = unicode2; unicodepos = unicode2pos;
> + unicode2 = NULL; unicode2pos = 0;
> """

OK, done!

> * Comments should start with a capital letter and be prepended
> to the section they apply to

Fixed!

> * There should be spaces between arguments in compares
> (a == b) not (a==b)

Fixed!

> * Where does the name "...Encode121" originate ?

Encode one-to-one; it implements both ASCII and Latin-1 encoding.

> * module internal APIs should use lower case names (you
> converted some of these to PyUnicode_...() -- this is
> normally reserved for APIs which are either marked as
> potential candidates for the public API or are very
> prominent in the code)

Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h. Of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map the old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.c so that only the encoders implemented there have access to it.

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of Py_UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that.

> Please separate the errors.c patch from this patch -- it
> seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it.

I'll upload a revised patch as soon as it's done.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 16:29

Message:
Logged In: YES
user_id=38388

Thanks for the patch -- it looks very impressive! I'll give it a try later this week.

Some first cosmetic tidbits:
* please don't place more than one C statement on one line, like in:
"""
+ unicode = unicode2; unicodepos = unicode2pos;
+ unicode2 = NULL; unicode2pos = 0;
"""
* Comments should start with a capital letter and be prepended to the section they apply to
* There should be spaces between arguments in compares: (a == b), not (a==b)
* Where does the name "...Encode121" originate?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code)

One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object.

Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode.

Thanks.
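The text-node versus comment distinction raised at the top of this section maps naturally onto per-call error handlers. The sketch below uses the handler registry that later Python releases provide (codecs.register_error and the built-in "xmlcharrefreplace" handler), not the codecs.xmlreplace_encode_errors / codecs.raise_encode_errors names discussed in this thread; the handler name "xmlcomment" and the two helper functions are hypothetical.

```python
import codecs


def comment_encode_error(exc):
    # Registered handlers receive the exception and must either return a
    # (replacement, resume_position) tuple or raise. A character reference
    # inside an XML comment would not be interpreted, so refuse instead.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    raise ValueError(
        "sorry, you can't have unencodable characters inside a comment "
        f"(found {bad!r} at position {exc.start})")


# Hypothetical handler name chosen for this example.
codecs.register_error("xmlcomment", comment_encode_error)


def encode_text_node(text: str, encoding: str) -> bytes:
    # Text content may fall back to numeric character references.
    return text.encode(encoding, "xmlcharrefreplace")


def encode_comment(text: str, encoding: str) -> bytes:
    # Comment content must be representable as-is in the target encoding.
    return text.encode(encoding, "xmlcomment")
```

Encoding a comment that contains characters outside the target encoding then fails loudly, instead of silently producing a character reference that an XML parser would leave as literal text inside the comment.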
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470

From noreply@sourceforge.net Fri Jul 26 14:21:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 06:21:29 -0700
Subject: [Patches] [ python-Patches-586999 ] error in example in smtplib.py
Message-ID:

Patches item #586999, was opened at 2002-07-26 17:21
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: error in example in smtplib.py

Initial Comment:
I found this while looking for errors that could appear if PEP 295 is approved ;-)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470

From noreply@sourceforge.net Fri Jul 26 17:23:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:23:15 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 15:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via

    python -O sortperf.py 15 20 1

That will time the current samplesort (even after this patch).
After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead.

CAUTION: To save time across many runs, sortperf saves the random floats it generates into temp files. If those temp files already exist when sortperf starts, it reads them in instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was in when I used it.

This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get.

----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 16:23

Message:
Logged In: YES
user_id=35752

AMD 1.4 GHz Athlon CPU
L1 I cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
L2 cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16    65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17   131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18   262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19   524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20  1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16    65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17   131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18   262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19   524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20  1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 17:30:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:30:32 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES
user_id=31435

Wow! Thanks, Neil! That's impressive, even if I say so myself.
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 17:54:13 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:54:13 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 10:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 11:54

Message:
Logged In: YES
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc 3.1 20020205

samplesort:
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16    65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17   131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18   262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19   524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20  1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16    65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17   131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18   262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19   524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20  1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 18:52:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 10:52:52 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".
samplesort
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16    65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17   131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18   262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19   524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20  1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16    65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17   131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18   262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19   524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20  1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16    65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17   131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18   262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19   524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20  1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 19:54:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:54:48 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort
----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16    65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17   131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18   262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19   524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20  1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i     2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15    32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16    65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17   131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18   262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19   524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20  1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Fri Jul 26 19:54:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:54:59 -0700
Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py
Message-ID:

Patches item #585913, was opened at 2002-07-24 15:27
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Copeland (oracle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Galeon support to webbrowser.py

Initial Comment:
Simple context diff against the current CVS tree to add support for Galeon to webbrowser.py.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-26 20:54

Message:
Logged In: YES
user_id=21627

How does this relate to https://sourceforge.net/tracker/index.php?func=detail&aid=586437&group_id=5470&atid=305470 ?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

From noreply@sourceforge.net Fri Jul 26 19:55:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:55:07 -0700
Subject: [Patches] [ python-Patches-586437 ] galeon support in webbrowser
Message-ID:

Patches item #586437, was opened at 2002-07-25 14:05
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Supreet Sethi (supreet)
Assigned to: Nobody/Anonymous (nobody)
Summary: galeon support in webbrowser

Initial Comment:
Adds Galeon support to webbrowser.py.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-26 20:55

Message:
Logged In: YES
user_id=21627

How does this relate to https://sourceforge.net/tracker/index.php?func=detail&aid=585913&group_id=5470&atid=305470 ?
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470

From noreply@sourceforge.net Fri Jul 26 20:14:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 12:14:53 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID:

Patches item #581705, was opened at 2002-07-15 16:34
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
>Assigned to: Martin v. Löwis (loewis)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, id 581698, called 'pty.spawn - wrong error caught'.

System: RedHat Linux 7.3, using Python 2.

About a year ago, the final 'except' statement was changed to catch IOError rather than just error. However, at least on my system, the os.read call raises an OSError, not an IOError, so the wrong error type is now caught.

Patch attached.

Rasjid.
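The mismatch Rasjid reports is easy to reproduce outside pty: os.read signals failure by raising OSError, which in Python 2.x was a different class from IOError, so an `except IOError` clause lets it escape. Below is a minimal sketch of the corrected pattern; read_chunk is a hypothetical helper, not the code in pty.py, and treating every OSError as end of data is a simplification (on Linux, reading the master side of a pty typically fails with EIO once the child has exited).

```python
import os


def read_chunk(fd: int, size: int = 1024) -> bytes:
    """Read up to `size` bytes from fd, returning b"" on EOF or read error."""
    try:
        return os.read(fd, size)
    except OSError:
        # os.read reports failures as OSError; catching IOError alone
        # missed this in Python 2.x, where the two classes were distinct.
        return b""
```

In Python 3.3 and later IOError is an alias of OSError, so catching OSError covers both spellings.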
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

From noreply@sourceforge.net Fri Jul 26 20:50:20 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 12:50:20 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-26 10:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort
---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-07-26 14:50 Message: Logged In: YES user_id=44345 Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 13:54 Message: Logged In: YES user_id=31435 Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73
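A rough way to reproduce the flavor of these columns without a 2002 source tree is to time the built-in sort on similar inputs; in CPython 2.3 and later, list.sort() is descended from this msort. The sketch assumes the column meanings given in sortperf.py's docstring (*sort random, \sort descending, /sort ascending, 3sort ascending with 3 random exchanges, +sort ascending with the last 10 values random, ~sort many duplicates, =sort all equal). !sort is omitted, the generators are approximations rather than sortperf's own code, make_inputs and time_sorts are hypothetical names, and n should be at least 10.

```python
import random
import time


def make_inputs(n, seed=1):
    """Approximate sortperf.py's input patterns (hypothetical; needs n >= 10)."""
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)

    three = asc[:]
    for _ in range(3):  # ascending, then 3 random exchanges
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]
    plus[-10:] = [rng.random() for _ in range(10)]  # last 10 values random

    values = base[:4]
    return {
        "*sort": base,                                    # random
        "\\sort": asc[::-1],                              # descending
        "/sort": asc,                                     # ascending
        "3sort": three,
        "+sort": plus,
        "~sort": [rng.choice(values) for _ in range(n)],  # many duplicates
        "=sort": [base[0]] * n,                           # all equal
    }


def time_sorts(n, seed=1):
    """Seconds taken by list.sort() on a fresh copy of each input pattern."""
    timings = {}
    for name, data in make_inputs(n, seed).items():
        work = data[:]
        start = time.perf_counter()
        work.sort()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    print(" i    2**i" + "".join(f" {name:>6}" for name in make_inputs(16)))
    for i in range(15, 21):
        row = time_sorts(2 ** i)
        print(f"{i:2d} {2 ** i:7d}" + "".join(f" {t:6.2f}" for t in row.values()))
```

Absolute numbers will not match the 2002 hardware above; only the relative pattern between columns is worth comparing.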
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 12:52 Message: Logged In: YES user_id=31435 Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

---------------------------------------------------------------------- Comment By: Kevin Jacobs (jacobs99) Date: 2002-07-26 11:54 Message: Logged In: YES user_id=459565 Intel 1266 MHz Penguin III x2 (Dual processor) 512KB cache Linux 2.4.19-pre1-ac2 gcc 3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 11:30 Message: Logged In: YES user_id=31435 Wow! Thanks, Neil! That's impressive, even if I say so myself. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-07-26 11:23 Message: Logged In: YES user_id=35752 AMD 1.4 GHz Athlon CPU L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) L2 Cache: 256K (64 bytes/line) Linux 2.4.19-pre10-ac1 gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 From noreply@sourceforge.net Fri Jul 26 20:56:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 12:56:44 -0700 Subject: [Patches] [ python-Patches-585913 ]
Adds Galeon support to webbrowser.py Message-ID: Patches item #585913, was opened at 2002-07-24 08:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Copeland (oracle) Assigned to: Nobody/Anonymous (nobody) Summary: Adds Galeon support to webbrowser.py Initial Comment: Simple context diff against current CVS tree to add support for Galeon to webbrowser.py ---------------------------------------------------------------------- Comment By: Greg Copeland (oracle) Date: 2002-07-26 14:56 Message: Logged In: YES user_id=40173 Not really sure. I assume it's just a second patch by another author. What can I say, day late and a dollar short. ;) Having looked at the other patch, it appears mine is a little more well rounded/complete/feature rich, if only slightly. I invite you to take a look for yourself. I'm also not sure what version of webbrowser.py the other patch is against. My patch is against the CVS version so it will be a breeze to apply. Enjoy! ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-26 13:54 Message: Logged In: YES user_id=21627 How does this relate to https://sourceforge.net/tracker/index.php?func=detail&aid=586437&group_id=5470&atid=305470 ?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470 From noreply@sourceforge.net Fri Jul 26 21:38:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 13:38:13 -0700 Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort Message-ID: Patches item #587076, was opened at 2002-07-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Tim Peters (tim_one) Assigned to: Nobody/Anonymous (nobody) Summary: Adaptive stable mergesort Initial Comment: This adds method list.msort([compare]). Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via python -O sortperf.py 15 20 1 That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead. CAUTION: To save time across many runs, sortperf saves the random floats it generates, into temp files. If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it. This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get.
---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-26 16:38 Message: Logged In: YES user_id=31435 Intrigued by a comment of McIlroy, I tried catenating all the .c files in Objects and Modules, into one giant file, and sorted that. msort got a 22% speedup there, suggesting there's *some* kind of significant pre-existing lexicographic order (and/or reverse order) in C source files that msort is able to exploit. Trying it again on about 1.33 million lines of Python-Dev archive (including assorted uuencoded attachments), msort got a 32% speedup. I'm not sure what to make of that, but we needed some real-life data here. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-07-26 15:50 Message: Logged In: YES user_id=44345 Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 14:54 Message: Logged In: YES user_id=31435 Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 13:52 Message: Logged In: YES user_id=31435 Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84
---------------------------------------------------------------------- Comment By: Kevin Jacobs (jacobs99) Date: 2002-07-26 12:54 Message: Logged In: YES user_id=459565 Intel 1266 MHz Penguin III x2 (Dual processor) 512KB cache Linux 2.4.19-pre1-ac2 gcc 3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 12:30 Message: Logged In: YES user_id=31435 Wow! Thanks, Neil! That's impressive, even if I say so myself.
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-07-26 12:23 Message: Logged In: YES user_id=35752 AMD 1.4 GHz Athlon CPU L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) L2 Cache: 256K (64 bytes/line) Linux 2.4.19-pre10-ac1 gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 From noreply@sourceforge.net Sat Jul 27 01:58:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 17:58:59 -0700 Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix) Message-ID: Patches item #584245, was opened at 2002-07-20 12:49 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: get python to link on OSF1 (Dec Unix) Initial Comment: Attached is a patch to fix the linking of python (makedev not found) on Dec OSF/1 Unix 5.1.
This patch has also been tested on Linux (RedHat 7.2). ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-26 20:58 Message: Logged In: YES user_id=33168 This patch uses AC_TRY_LINK instead of AC_TRY_RUN. It tries makedev according to Martin's suggestion. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-23 17:43 Message: Logged In: YES user_id=21627 That patch doesn't really test whether defining OSF_SOURCE helps in getting makedev, does it? In particular, if makedev is not available at all, or requires a different define, the test will still conclude that OSF_SOURCE should be defined, right? I think the sequence should be: - is makedev already available? - if not, is it with OSF_SOURCE defined? - if not, arrange to exclude makedev from posixmodule.c Also, is it necessary to run the test program? autoconf is always worried that cross-compilation would fail, since you cannot run tests (although it is reasonable to link test programs in a cross-compilation environment). 
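Martin's suggested sequence means configure may end up excluding makedev from posixmodule.c, so Python code that packs device numbers should feature-test rather than assume the function exists. A minimal sketch, assuming a modern Python 3 in which os.makedev(), os.major() and os.minor() are exposed only on platforms that provide them; dev_roundtrip is a hypothetical helper.

```python
import os


def dev_roundtrip(major, minor):
    """Pack a device number and split it again; None if unsupported here."""
    if not hasattr(os, "makedev"):
        # e.g. Windows, or a build where configure could not link makedev()
        return None
    dev = os.makedev(major, minor)
    return os.major(dev), os.minor(dev)


if __name__ == "__main__":
    print(dev_roundtrip(8, 1))  # (8, 1) on Linux
```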
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470 From noreply@sourceforge.net Sat Jul 27 02:04:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 18:04:47 -0700 Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS Message-ID: Patches item #577031, was opened at 2002-07-03 11:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Remove PyArg_Parse() and METH_OLDARGS Initial Comment: This patch removes more uses of PyArg_Parse() and METH_OLDARGS, which are deprecated. I've tested in select and string, but want to make sure there's nothing else I'm missing. I also have a huge change to glmodule, but I can't test that. The diff is attached. Let me know if I should check in glmodule or leave it alone. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-26 21:04 Message: Logged In: YES user_id=33168 All the "s" / PyString_Check() changes are in fmmodule. I suggest not patching fmmodule now. Are all the other changes ok? Should I bother fixing glmodule at all? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 01:45 Message: Logged In: YES user_id=21627 The changes look good, except for the ones that change parsing of "s" to PyString_Check: that means losing support for Unicode. For some of these methods, that may be acceptable, but that would need documentation.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 From noreply@sourceforge.net Sat Jul 27 02:24:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 26 Jul 2002 18:24:10 -0700 Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort Message-ID: Patches item #587076, was opened at 2002-07-26 11:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Tim Peters (tim_one) Assigned to: Nobody/Anonymous (nobody) Summary: Adaptive stable mergesort Initial Comment: This adds method list.msort([compare]). Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via python -O sortperf.py 15 20 1 That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead. CAUTION: To save time across many runs, sortperf saves the random floats it generates, into temp files. If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it. This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-07-26 21:24 Message: Logged In: YES user_id=31435 I attached timsort.txt, a plain-text detailed description of the algorithm.
After I die, it's the only clue that will remain. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 16:38 Message: Logged In: YES user_id=31435 Intrigued by a comment of McIlroy, I tried catenating all the .c files in Objects and Modules, into one giant file, and sorted that. msort got a 22% speedup there, suggesting there's *some* kind of significant pre-existing lexicographic order (and/or reverse order) in C source files that msort is able to exploit. Trying it again on about 1.33 million lines of Python-Dev archive (including assorted uuencoded attachments), msort got a 32% speedup. I'm not sure what to make of that, but we needed some real-life data here. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-07-26 15:50 Message: Logged In: YES user_id=44345 Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 14:54 Message: Logged In: YES user_id=31435 Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 13:52 Message: Logged In: YES user_id=31435 Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

---------------------------------------------------------------------- Comment By: Kevin Jacobs (jacobs99) Date: 2002-07-26 12:54 Message: Logged In: YES user_id=459565 Intel 1266 MHz Penguin III x2 (Dual processor) 512KB cache Linux 2.4.19-pre1-ac2 gcc 3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-26 12:30 Message: Logged In: YES user_id=31435 Wow! Thanks, Neil! That's impressive, even if I say so myself.
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-07-26 12:23 Message: Logged In: YES user_id=35752 AMD 1.4 GHz Athlon CPU L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) L2 Cache: 256K (64 bytes/line) Linux 2.4.19-pre10-ac1 gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 From noreply@sourceforge.net Sat Jul 27 08:54:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 27 Jul 2002 00:54:02 -0700 Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux Message-ID: Patches item #581705, was opened at 2002-07-16 00:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Rasjid Wilcox (rasjidw) Assigned to: Martin v. Löwis (loewis) Summary: fix to pty.spawn error on Linux Initial Comment: I submitted a bug report, id 581698 called 'pty.spawn - wrong error caught'. System: RedHat Linux 7.3, using Python2.
About a year ago, the final 'except' statement was changed to catch IOError rather than just error. However, at least on my system, the os.read call raises an OSError, not an IOError. Therefore, the wrong error type is now caught. Patch attached. Rasjid. ---------------------------------------------------------------------- >Comment By: Rasjid Wilcox (rasjidw) Date: 2002-07-27 17:54 Message: Logged In: YES user_id=39640 Actually, a bit more testing revealed some more errors when the main process had its standard input and output something other than a tty. I attach my second version of the patch. Rasjid. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470 From noreply@sourceforge.net Sat Jul 27 09:20:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 27 Jul 2002 01:20:17 -0700 Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort Message-ID: Patches item #587076, was opened at 2002-07-27 01:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Tim Peters (tim_one) Assigned to: Nobody/Anonymous (nobody) Summary: Adaptive stable mergesort Initial Comment: This adds method list.msort([compare]). Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via python -O sortperf.py 15 20 1 That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead. CAUTION: To save time across many runs, sortperf saves the random floats it generates, into temp files. If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers. 
As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it. This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2002-07-27 18:20 Message: Logged In: YES user_id=29957 Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04

(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82

---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-27 11:24 Message: Logged In: YES user_id=31435 I attached timsort.txt, a plain-text detailed description of the algorithm. After I die, it's the only clue that will remain. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-07-27 06:38 Message: Logged In: YES user_id=31435 Intrigued by a comment of McIlroy, I tried catenating all the .c files in Objects and Modules, into one giant file, and sorted that.
msort got a 22% speedup there, suggesting there's *some* kind of significant pre-existing lexicographic order (and/or reverse order) in C source files that msort is able to exploit. Trying it again on about 1.33 million lines of Python-Dev archive (including assorted uuencoded attachments), msort got a 32% speedup. I'm not sure what to make of that, but we needed some real life data here.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-27 05:50

Message:
Logged In: YES
user_id=44345

Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 04:54

Message:
Logged In: YES
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 03:52

Message:
Logged In: YES
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-27 02:54

Message:
Logged In: YES
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc 3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 02:30

Message:
Logged In: YES
user_id=31435

Wow! Thanks, Neil! That's impressive, even if I say so myself.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-27 02:23

Message:
Logged In: YES
user_id=35752

AMD 1.4 GHz Athlon CPU
L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70
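The column headings in these tables name different input patterns. A rough Python 3 sketch of comparable inputs is given here for readers who want to reproduce the shape of such a benchmark. The meanings used (random, descending, ascending, ascending with 3 random exchanges, ascending with 10 random values at the end, many duplicates, all equal) are assumptions read off the headings, the !sort column is not reproduced, and make_inputs()/time_sorts() are hypothetical helpers, not Lib/test/sortperf.py itself.

```python
import random
import time


def make_inputs(n, seed=1):
    """Build inputs resembling sortperf's column headings (assumed meanings).

    Hypothetical reconstruction: the real Lib/test/sortperf.py may differ,
    and the !sort column is not reproduced here.
    """
    if n < 10:
        raise ValueError("n must be at least 10")
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)

    three = asc[:]                      # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    return {
        "*sort": base[:],                                        # random
        "\\sort": asc[::-1],                                     # descending
        "/sort": asc[:],                                         # ascending
        "3sort": three,
        "+sort": asc[:-10] + [rng.random() for _ in range(10)],  # 10 random at end
        "~sort": [float(rng.randrange(4)) for _ in range(n)],    # many duplicates
        "=sort": [0.5] * n,                                      # all equal
    }


def time_sorts(n):
    """Time list.sort() on a fresh copy of each input; seconds per column."""
    timings = {}
    for name, data in make_inputs(n).items():
        work = data[:]                  # sort a copy so the input stays intact
        start = time.perf_counter()
        work.sort()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    names = list(make_inputs(10))
    print(" i    2**i" + "".join(f" {name:>6}" for name in names))
    for i in range(15, 18):
        row = time_sorts(2 ** i)
        print(f"{i:2d} {2 ** i:7d}" + "".join(f" {row[name]:6.2f}" for name in names))
```

Absolute numbers from a script like this are not comparable with the 2002 figures above; only the relative pattern across columns is of interest.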
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Sat Jul 27 12:23:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 04:23:05 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID:

Patches item #587076, was opened at 2002-07-27 01:51
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead.

CAUTION: To save time across many runs, sortperf saves the random floats it generates into temp files. If those temp files already exist when sortperf starts, it reads them instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was in when I used it.

This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get.
----------------------------------------------------------------------

>Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 21:23

Message:
Logged In: YES
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

From noreply@sourceforge.net Sat Jul 27 23:23:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 15:23:22 -0700
Subject: [Patches] [ python-Patches-544113 ]
merging sorted sequences
Message-ID:

Patches item #544113, was opened at 2002-04-15 07:42
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Sebastien Keim (s_keim)
Assigned to: Nobody/Anonymous (nobody)
Summary: merging sorted sequences

Initial Comment:
This patch is intended to add to the bisect module a function which merges several sorted sequences into an ordered list.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-27 18:23

Message:
Logged In: YES
user_id=6380

Thanks. This doesn't strike me as a "fundamental" algorithm like bisection or heap sort. I don't think I've ever needed this, except perhaps in situations where the amount of data was small enough that simply concatenating the lists and sorting them was an acceptable 3-line solution. Therefore I'm rejecting this unless you get someone of importance to plead for it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470

From noreply@sourceforge.net Sun Jul 28 10:43:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 02:43:33 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID:

Patches item #581705, was opened at 2002-07-15 16:34
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
Assigned to: Martin v. Löwis (loewis)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, id 581698, called 'pty.spawn - wrong error caught'.
System: RedHat Linux 7.3, using Python2.

About a year ago, the final 'except' statement was changed to catch IOError rather than just error. However, at least on my system, the os.read call raises an OSError, not an IOError. Therefore, the wrong error type is now caught.

Patch attached.

Rasjid.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 11:43

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed (in a slightly modified form) as pty.py 1.12.

----------------------------------------------------------------------

Comment By: Rasjid Wilcox (rasjidw)
Date: 2002-07-27 09:54

Message:
Logged In: YES
user_id=39640

Actually, a bit more testing revealed some more errors when the main process had its standard input and output set to something other than a tty. I attach my second version of the patch.

Rasjid.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

From noreply@sourceforge.net Sun Jul 28 10:58:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 02:58:18 -0700
Subject: [Patches] [ python-Patches-575827 ] SSL release GIL
Message-ID:

Patches item #575827, was opened at 2002-07-01 07:15
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Martin v. Löwis (loewis)
Summary: SSL release GIL

Initial Comment:
This is more or less a rewrite of parts of patch #475045. It releases the GIL during the SSL operations for opening an SSL socket. Currently the GIL is only released during the read and write operations on an SSL socket.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 11:58

Message:
Logged In: YES
user_id=21627

Thanks for the patch, applied as _ssl.c 1.7.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-07-01 07:15

Message:
Logged In: YES
user_id=163326

Randomly assigning to Martin, who proofread my previous patch.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:02:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:02:44 -0700
Subject: [Patches] [ python-Patches-554807 ] Add _winreg support for Cygwin
Message-ID:

Patches item #554807, was opened at 2002-05-11 14:01
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470

Category: Windows
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerald S. Williams (gsw_agere)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add _winreg support for Cygwin

Initial Comment:
This adds _winreg support to Cygwin Python without dependencies on other Windows modules.

For platforms in which MS_WINDOWS isn't defined, this reports the OSError exception instead of WindowsError. It also uses the non-MBCS versions of registry access in this case.

Some minor changes to _winreg.c were made to clean up compiler warnings from GCC. setup.py was changed to create a dynamic _winreg module under Cygwin.

There are also some earlier changes in the patch file to skip the import test (due to Cygwin fork issues), and to require libintl when building _locale under Cygwin.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:02

Message:
Logged In: YES
user_id=21627

Is any kind of tweaking forthcoming?
----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 15:30

Message:
Logged In: YES
user_id=329402

It sounds like the patches need some tweaking (my testing had passed but was certainly limited).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-15 14:57

Message:
Logged In: YES
user_id=21627

Yes, but you are wrong in assuming that the *A functions expect Latin-1. Instead, they expect char* encoded as CP_ACP, which is known as "mbcs" in Python. The *W functions do *not* expect multi-byte strings, but Unicode strings.

Notice that _winreg also calls the *A functions, even in MSVC builds.

So I think converting Unicode to Latin-1 is definitely incorrect.

----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 14:48

Message:
Logged In: YES
user_id=329402

Windows supplies two versions of the relevant functions. The Cygwin version (at least as built) uses the ANSI versions, as indicated by the A at the end of the symbol names:

$ nm _winreg.o | grep RegQueryValue
         U _RegQueryValueA@16
         U _RegQueryValueExA@24

As opposed to the "Windows Unicode/wide-char" functions, which end in W and require MBCS functions to decode.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-15 00:23

Message:
Logged In: YES
user_id=21627

Can you please explain why not using MBCS is the right thing?
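Martin's point about Latin-1 versus the ANSI code page can be made concrete in a few lines of Python 3. This sketch assumes a Western-European system whose ANSI code page is cp1252 (an assumption: CP_ACP varies by locale), and ansi_bytes() is a hypothetical helper, not part of _winreg. It relies on the documented fact that Python's "mbcs" codec, available only on Windows, encodes with CP_ACP. The euro sign shows that cp1252 and Latin-1 are different encodings, so converting Unicode to Latin-1 before calling an *A function can yield the wrong bytes or fail outright.

```python
import sys


def ansi_bytes(text):
    """Encode text the way a *A (ANSI) Win32 call would expect.

    Hypothetical helper. On Windows the "mbcs" codec uses the active ANSI
    code page (CP_ACP). Elsewhere "mbcs" does not exist, so cp1252 is used
    as an assumed stand-in for a Western-European ANSI code page.
    """
    codec = "mbcs" if sys.platform == "win32" else "cp1252"
    return text.encode(codec)


euro = "\u20ac"                       # EURO SIGN
print(euro.encode("cp1252"))          # b'\x80'
try:
    euro.encode("latin-1")
except UnicodeEncodeError:
    print("EURO SIGN has no Latin-1 encoding")
```

Characters such as "\u00e9" encode identically in both code pages, which is why a Latin-1 shortcut can appear to work in limited testing.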
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:03:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:03:42 -0700
Subject: [Patches] [ python-Patches-554718 ] OpenBSD updates for build process
Message-ID:

Patches item #554718, was opened at 2002-05-11 02:20
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554718&group_id=5470

Category: Build
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Behrens (mattbehrens)
Assigned to: Nobody/Anonymous (nobody)
Summary: OpenBSD updates for build process

Initial Comment:
The following patches are currently in our packaging system. A brief summary:

- Use 'cc -shared' to build shared libraries, as is strictly correct on OpenBSD.
- Use -fPIC instead of -fpic.
- Use OpenBSD threads.
- Fix the test_fcntl test.

Another patch item will be posted shortly for Python 2.2, for similar items.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:03

Message:
Logged In: YES
user_id=21627

Are you still interested in this patch? If so, what are the answers to these questions?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-21 11:25

Message:
Logged In: YES
user_id=21627

What architectures have been using ELF before OpenBSD 2.8? I'd still like to simplify this logic, perhaps by removing support for systems that nobody uses anymore.

As for -pthread: the test for OpenBSD specifically should go. Instead, I propose to integrate this with the -Kpthread logic: there should be a sequence of options tested, and the first one shown to enable pthreads should be used.
The set of options should be -Kpthread (for SysV), -pthread (for BSD and Linux), -pthreads (for gcc on Solaris).

I'd be willing to accept a test-for-system for 2.1, since it does not have the -Kpthread test, but for 2.2 and 2.3, we should remove the set of tests used.

Also, why does it AC_DEFINE _REENTRANT and _POSIX_THREADS? Those two should be implied by -pthread.

Also, what OpenBSD releases could be deprecated without losing users?

----------------------------------------------------------------------

Comment By: Matt Behrens (mattbehrens)
Date: 2002-05-21 00:47

Message:
Logged In: YES
user_id=240525

>From brad@:

> There isn't a test for -pthread option so Python will not correctly
> compile with threads support. Testing for libc_r is NOT correct.

So, the answer is no, the standard POSIX threads test does not work.

----------------------------------------------------------------------

Comment By: Matt Behrens (mattbehrens)
Date: 2002-05-20 14:17

Message:
Logged In: YES
user_id=240525

Okay, well let's comment in this bug then. Changing the subject and closing out 554719. I'll put all patches on this bug.

I am trying to verify most of this with brad@openbsd.org, who has contributed some parts of these patches.

On cc -shared, this is my understanding:

- All OpenBSD ELF architectures have always used cc -shared.
- Before OpenBSD 2.8, a.out architectures used ld -Bshareable.
- As of OpenBSD 2.8, cc -shared worked on a.out architectures as well, and ld -Bshareable became deprecated.

On -fPIC: -fPIC has always worked. The difference between -fpic and -fPIC is simply that -fpic is less efficient.

On threads, I am still waiting for an answer from brad@; this is his change. I'll ask him again today.

Thanks.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-12 18:49

Message:
Logged In: YES
user_id=21627

The -shared chunk looks frightening.
What is the first BSD release where ld -Bshareable stops working? Could you rearrange this to integrate the version numbers into the OpenBSD* match? Also, what releases need the ELF test? Could that be restricted to the older releases, too? Would it be acceptable to stop supporting OpenBSD 0 and 1?

Is usage of -fPIC correct on OpenBSD 0.x? If not, what is the first release that supports -fPIC?

It looks like 'OpenBSD threads' are 'POSIX threads'? Why does the existing test for POSIX threads fail to detect their presence?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554718&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:24:12 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:24:12 -0700
Subject: [Patches] [ python-Patches-554716 ] __va_copy patches
Message-ID:

Patches item #554716, was opened at 2002-05-11 02:08
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554716&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Matt Behrens (mattbehrens)
Assigned to: Nobody/Anonymous (nobody)
Summary: __va_copy patches

Initial Comment:
This issue was discovered when preparing for OpenBSD 3.1, and compiling on our non-i386 arches. Let me quote a mail from drahn@openbsd.org:

> [Tell the Python guys] the vararg handling is poor, and that this is a possible solution, but not a great solution. If possible, it would be best to not parse the varargs argument twice.

> Different architectures deal with varargs differently; __va_copy is a way that some architectures use to do a deep copy. __va_copy is present in Solaris and powerpc (*BSD and Linux) as far as I know.

Attached are the patches we are using to build our Python package; without them we cannot build Python 2.2 on arches like powerpc, as the built python dumps core.
Python 2.1 does not need these patches.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:24

Message:
Logged In: YES
user_id=21627

Thanks for the patch, applied as

abstract.c 2.93.6.5
stringobject.c 2.147.6.6
getargs.c 2.90.6.1
modsupport.c 2.58.16.2
abstract.c 2.104
stringobject.c 2.171
getargs.c 2.93
modsupport.c 2.61

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554716&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:30:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:30:51 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID:

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to mimetypes.py. This function returns all known extensions for a given type, not just the first one found in the types_map dictionary. guess_extension is still present and returns the first from the list.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES
user_id=21627

What is the role of add_type in this patch?
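A function with this name and behaviour is available in later Python versions as mimetypes.guess_all_extensions(). A minimal Python 3 sketch of how it relates to guess_extension() and add_type(); the MIME type application/x-hypothetical and the extensions .hyp/.hyp2 are made up for illustration, and add_type() is used here only to register them, which says nothing about its role in this particular patch.

```python
import mimetypes

# Register a made-up type under two extensions (names are hypothetical).
mimetypes.add_type("application/x-hypothetical", ".hyp")
mimetypes.add_type("application/x-hypothetical", ".hyp2")

# Every known extension for the type, in registration order.
print(mimetypes.guess_all_extensions("application/x-hypothetical"))  # ['.hyp', '.hyp2']

# Only the first entry of that list.
print(mimetypes.guess_extension("application/x-hypothetical"))       # .hyp
```

Keeping guess_extension() as "first of guess_all_extensions()" preserves the old single-answer API while exposing the full list to callers who need it.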
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:34:49 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:34:49 -0700
Subject: [Patches] [ python-Patches-552812 ] Better description in "python -h" for -u
Message-ID:

Patches item #552812, was opened at 2002-05-06 11:42
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552812&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Sean Reifschneider (jafo)
Assigned to: Nobody/Anonymous (nobody)
>Summary: Better description in "python -h" for -u

Initial Comment:
A new user was confused by the fact that "python -u" in combination with "sys.stdin.xreadlines()" was not doing what he expected. I believe that this modification makes it a bit clearer that there is internal buffering which "-u" does not influence. Also included is a man-page modification of a similar nature (though more detailed).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:34

Message:
Logged In: YES
user_id=21627

Thanks for the patch, committed as
python.man 1.25
main.c 1.65

----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2002-05-08 12:26

Message:
Logged In: YES
user_id=81797

Ok, I've converted it to a single-line note referencing the man page.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 11:17

Message:
Logged In: YES
user_id=21627

I dislike the change to add many new lines to the -h output. Can you squeeze this into fewer lines, e.g. by referring to the documentation?
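The buffering discussed here is read-ahead inside the file-reading machinery, which "-u" does not switch off. The usual advice for Python 2 was to call readline() in a loop instead of iterating over the file. Below is a minimal sketch of that pattern in modern Python syntax; copy_lines is a hypothetical helper name, and whether plain iteration still reads ahead depends on the Python version and stream type, so treat this as an illustration of the pattern rather than a guarantee.

```python
import io

def copy_lines(src, dst):
    """Copy src to dst one line at a time, flushing after every line.

    iter(src.readline, "") keeps calling readline() until it returns ""
    (end of file), so each line is written out as soon as readline()
    returns it instead of relying on an iterator's read-ahead.
    """
    count = 0
    for line in iter(src.readline, ""):
        dst.write(line)
        dst.flush()
        count += 1
    return count

# Demonstration with in-memory streams; with real pipes you would pass
# sys.stdin and sys.stdout instead.
out = io.StringIO()
n = copy_lines(io.StringIO("first\nsecond\n"), out)
print(n, repr(out.getvalue()))
```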
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552812&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:36:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:36:38 -0700
Subject: [Patches] [ python-Patches-550192 ] Set softspace to 0 in raw_input()
Message-ID:

Patches item #550192, was opened at 2002-04-29 16:39
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550192&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
>Assigned to: Martin v. Löwis (loewis)
Summary: Set softspace to 0 in raw_input()

Initial Comment:
Setting softspace to 0 in raw_input() makes it behave as expected when a "print 'something'," precedes the raw_input() call, with or without a prompt argument.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 21:07

Message:
Logged In: YES
user_id=7887

Ok.. now it outputs an extra space if softspace was true, as expected after a "print 'something',". Thanks again.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 20:45

Message:
Logged In: YES
user_id=7887

Please, don't apply it yet. I'm testing some aspects of the patch.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 19:53

Message:
Logged In: YES
user_id=7887

Sure! Here's a fixed patch including those cleanups. Thank you!

----------------------------------------------------------------------

Comment By: Martin v.
Löwis (loewis)
Date: 2002-05-03 08:04

Message:
Logged In: YES
user_id=21627

The checking logic for a lost stdout appears to be broken: it should already check for an exception right when verifying whether stdout isatty. Can you incorporate such cleanup in your patch?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550192&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:50:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:50:38 -0700
Subject: [Patches] [ python-Patches-543498 ] s/Copyright/License/ in bdist_rpm.py
Message-ID:

Patches item #543498, was opened at 2002-04-14 00:07
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=543498&group_id=5470

Category: Distutils and setup.py
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Nobody/Anonymous (nobody)
Summary: s/Copyright/License/ in bdist_rpm.py

Initial Comment:
The "Copyright" field in RPM spec files is obsolete. "License" should be used instead.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:50

Message:
Logged In: YES
user_id=21627

It appears that you need rpm 3.x, which was released in 1999. I think it is safe enough to accept this patch; applied as bdist_rpm.py 1.30.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-18 08:06

Message:
Logged In: YES
user_id=21627

So what is the minimum version of the RPM software that accepts the License: field? It is my understanding that rpm(1) may blow up if it does not recognize a field.
----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-04-14 18:46

Message:
Logged In: YES
user_id=7887

The rpm.org site is much more obsolete than this tag. Here is an excerpt from a message by Jeff Johnson on rpm-list (subject is "Re: three questions about building rpms"):

----
[...] This is historical legacy. Originally rpm had Copyright: GPL but everyone said GPL is not a copyright. So, rpm changed the tag name to License:, and, for backward compatibility, used the same numeric value as RPMTAG_COPYRIGHT. Now, everyone gets to ask the next question Which is it Copyright: or License:? and the answer is :-)
----

Every distribution working with rpms, including Red Hat, has changed (or is changing) the tag to License. Copyright, as Jeff himself said, is a misleading name for that field.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-14 11:58

Message:
Logged In: YES
user_id=21627

Can you provide a pointer that documents this obsolescence? http://www.rpm.org/RPM-HOWTO/build.html#SPEC-FILE still says Copyright.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=543498&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:52:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:52:11 -0700
Subject: [Patches] [ python-Patches-470607 ] HTML version of the Idle "documentation"
Message-ID:

Patches item #470607, was opened at 2001-10-12 17:13
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=470607&group_id=5470

Category: IDLE
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Internet Discovery (idiscovery)
Assigned to: Nobody/Anonymous (nobody)
>Summary: HTML version of the Idle "documentation"

Initial Comment:
Idle Help

Features

IDLE has the following features:
  • coded in 100% pure Python, using the Tkinter GUI toolkit (i.e. Tcl/Tk)
  • cross-platform: works on Windows and Unix (on the Mac, there are currently problems with Tcl/Tk)
  • multi-window text editor with multiple undo, Python colorizing and many other features, e.g. smart indent and call tips
  • Python shell window (a.k.a. interactive interpreter)
  • debugger (not complete, but you can set breakpoints, view and step)

Menus

File menu:

New window
create a new editing window
Open...
open an existing file
Open module...
open an existing module (searches sys.path)
Class browser
show classes and methods in current file
Path browser
show sys.path directories, modules, classes and methods
Save
save current window to the associated file (unsaved windows have a * before and after the window title)
Save As...
save current window to new file, which becomes the associated file
Save Copy As...
save current window to different file without changing the associated file
Close
close current window (asks to save if unsaved)
Exit
close all windows and quit IDLE (asks to save if unsaved)

Edit menu:

Undo
Undo last change to current window (max 1000 changes)
Redo
Redo last undone change to current window
Cut
Copy selection into system-wide clipboard; then delete selection
Copy
Copy selection into system-wide clipboard
Paste
Insert system-wide clipboard into window
Select All
Select the entire contents of the edit buffer
Find...
Open a search dialog box with many options
Find again
Repeat last search
Find selection
Search for the string in the selection
Find in Files...
Open a search dialog box for searching files
Replace...
Open a search-and-replace dialog box
Go to line
Ask for a line number and show that line
Indent region
Shift selected lines right 4 spaces
Dedent region
Shift selected lines left 4 spaces
Comment out region
Insert ## in front of selected lines
Uncomment region
Remove leading # or ## from selected lines
Tabify region
Turn leading stretches of spaces into tabs
Untabify region
Turn all tabs into the right number of spaces
Expand word
Expand the word you have typed to match another word in the same buffer; repeat to get a different expansion
Format Paragraph
Reformat the current blank-line-separated paragraph
Import module
Import or reload the current module
Run script
Execute the current file in the __main__ namespace
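The Comment out region and Uncomment region entries describe simple per-line text transformations. A minimal sketch, assuming the ## marker is inserted at column 0 and that uncommenting strips at most one leading ## (or else one leading #) per line; comment_region and uncomment_region are hypothetical names, not IDLE's own functions.

```python
def comment_region(lines):
    """Prefix every selected line with '##' (Comment out region)."""
    return ["##" + line for line in lines]

def uncomment_region(lines):
    """Strip one leading '##', or else one leading '#', from each line."""
    result = []
    for line in lines:
        if line.startswith("##"):
            line = line[2:]
        elif line.startswith("#"):
            line = line[1:]
        result.append(line)
    return result

selected = ["x = 1", "    y = 2"]
commented = comment_region(selected)
print(commented)                                # ['##x = 1', '##    y = 2']
print(uncomment_region(commented) == selected)  # True
```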

Windows menu:

Zoom Height
toggles the window between normal size (24x80) and maximum height.
The rest of this menu lists the names of all open windows; select one to bring it to the foreground (deiconifying it if necessary).

Debug menu (in the Python Shell window only):

Go to file/line
look around the insert point for a filename and line number, open the file, and show the line.
Open stack viewer
show the stack traceback of the last exception
Debugger toggle
Run commands in the shell under the debugger
JIT Stack viewer toggle
Open stack viewer on traceback

Basic editing and navigation:

  • Backspace deletes to the left; DEL deletes to the right
  • Arrow keys and Page Up/Down to move around
  • Home/End go to begin/end of line
  • Control-Home/End go to begin/end of file
  • Some Emacs bindings may also work, e.g. ^B/^P/^A/^E/^D/^L

Automatic indentation:

After a block-opening statement, the next line is indented by 4 spaces (in the Python Shell window by one tab). After certain keywords (break, return etc.) the next line is dedented. In leading indentation, Backspace deletes up to 4 spaces if they are there. Tab inserts 1-4 spaces (in the Python Shell window one tab). See also the indent/dedent region commands in the edit menu.
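The indentation rule above can be approximated by a small function that looks at the previous line. This is a simplified sketch, not IDLE's implementation: it assumes space-only indentation, ignores comments and continuation lines, and assumes the dedent keywords are break, continue, pass, raise and return (the help text names only "break, return etc."). next_indent is a hypothetical name.

```python
# Keywords after which the next line is dedented (assumed list).
DEDENT_KEYWORDS = {"break", "continue", "pass", "raise", "return"}

def next_indent(line, width=4):
    """Return the number of spaces to indent the line that follows `line`."""
    code = line.rstrip()
    stripped = code.lstrip(" ")
    current = len(code) - len(stripped)  # leading spaces on this line
    if stripped.endswith(":"):
        # Block-opening statement (def, if, for, ...): indent one level.
        return current + width
    words = stripped.split()
    if words and words[0] in DEDENT_KEYWORDS:
        # The block ends after this statement: dedent one level.
        return max(current - width, 0)
    return current

print(next_indent("def f():"))      # 4
print(next_indent("    return x"))  # 0
```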

Python Shell window:

  • ^C interrupts executing command
  • ^D sends end-of-file; closes window if typed at >>> prompt

Command history:

  • Alt-p retrieves previous command matching what you have typed
  • Alt-n retrieves next
  • Return while on any previous command retrieves that command
  • Alt-/ (Expand word) is also useful here
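Prefix-matching history retrieval as described for Alt-p and Alt-n can be sketched with a list and a cursor. This assumes the text typed before the first Alt-p is kept as the search prefix until a new command is entered, and that Alt-n past the newest match restores the typed text; the History class and its methods are hypothetical, not IDLE's internals.

```python
class History:
    """Command history with prefix matching, in the spirit of Alt-p / Alt-n."""

    def __init__(self):
        self.items = []      # oldest command first
        self.pos = None      # index of the entry currently shown, or None
        self.prefix = None   # text typed before the first Alt-p

    def add(self, command):
        """Record an executed command and reset any search in progress."""
        if command.strip():
            self.items.append(command)
        self.pos = None
        self.prefix = None

    def previous(self, typed):
        """Alt-p: return the next older command that starts with the typed text."""
        if self.pos is None:
            self.pos = len(self.items)
            self.prefix = typed
        for i in range(self.pos - 1, -1, -1):
            if self.items[i].startswith(self.prefix):
                self.pos = i
                return self.items[i]
        return None  # no older match; stay where we are

    def next(self):
        """Alt-n: return the next newer match, or the typed text when none is left."""
        if self.pos is None:
            return None
        for i in range(self.pos + 1, len(self.items)):
            if self.items[i].startswith(self.prefix):
                self.pos = i
                return self.items[i]
        typed, self.pos, self.prefix = self.prefix, None, None
        return typed

h = History()
for cmd in ["print(1)", "x = 2", "print(3)"]:
    h.add(cmd)
print(h.previous("pr"))  # print(3)
print(h.previous("pr"))  # print(1)
print(h.next())          # print(3)
print(h.next())          # pr
```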

Syntax colors:

The coloring is applied in a background "thread", so you may occasionally see uncolorized text. To change the color scheme, edit the [Colors] section in config.txt.
Python syntax colors:
	Keywords:       orange
	Strings:        green
	Comments:       red
	Definitions:    blue
Shell colors:
	Console output: brown
	stdout:         blue
	stderr:         dark green
	stdin:          black

Command line usage:
	idle.py [-c command] [-d] [-e] [-s] [-t title] [arg] ...

	-c command  run this command
	-d          enable debugger
	-e          edit mode; arguments are files to be edited
	-s          run $IDLESTARTUP or $PYTHONSTARTUP first
	-t title    set title of shell window

If there are arguments:

  1. If -e is used, arguments are files opened for editing and sys.argv reflects the arguments passed to IDLE itself.
  2. Otherwise, if -c is used, all arguments are placed in sys.argv[1:...], with sys.argv[0] set to '-c'.
  3. Otherwise, if neither -e nor -c is used, the first argument is a script which is executed with the remaining arguments in sys.argv[1:...] and sys.argv[0] set to the script name. If the script name is '-', no script is executed but an interactive Python session is started; the arguments are still available in sys.argv.
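The three rules above can be modelled as a small function that decides what to run and what sys.argv should contain. A minimal sketch using the standard getopt module with the option letters from the usage line; plan_startup and its returned dictionary are hypothetical, and using prog="idle.py" as sys.argv[0] in the -e case is an assumption about what "the arguments passed to IDLE itself" means. The no-argument case is outside the rules quoted above and is not modelled faithfully.

```python
import getopt

def plan_startup(args, prog="idle.py"):
    """Decide what IDLE would do with its command-line arguments.

    Returns a dict with the files to edit, the command or script to run,
    and the list that would become sys.argv.
    """
    opts, rest = getopt.getopt(args, "c:dest:")  # -c command, -d, -e, -s, -t title
    flags = dict(opts)
    plan = {"edit": [], "command": None, "script": None}
    if "-e" in flags:
        # Rule 1: positional arguments are files to edit; sys.argv mirrors
        # the command line IDLE itself received.
        plan["edit"] = rest
        plan["argv"] = [prog] + list(args)
    elif "-c" in flags:
        # Rule 2: run the command; sys.argv[0] is '-c'.
        plan["command"] = flags["-c"]
        plan["argv"] = ["-c"] + rest
    else:
        # Rule 3: the first argument names a script, unless it is '-',
        # in which case no script runs and an interactive session starts.
        if rest and rest[0] != "-":
            plan["script"] = rest[0]
        plan["argv"] = rest
    return plan

print(plan_startup(["-c", "print(42)", "a", "b"])["argv"])  # ['-c', 'a', 'b']
print(plan_startup(["spam.py", "x"])["script"])             # spam.py
```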
----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:52

Message:
Logged In: YES
user_id=21627

Since no further explanation was forthcoming, I reject this patch.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-11 22:02

Message:
Logged In: NO

hello

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-12 17:54

Message:
Logged In: YES
user_id=6380

What do you want us to do with this? Note that IDLE development is going on in the idlefork.sf.net project. You might want to submit it there. And please use the file upload feature.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=470607&group_id=5470

From noreply@sourceforge.net Sun Jul 28 11:54:04 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:54:04 -0700
Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive
Message-ID:

Patches item #492105, was opened at 2001-12-12 18:21
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: James C. Ahlstrom (ahlstromjc)
Assigned to: Nobody/Anonymous (nobody)
Summary: Import from Zip archive

Initial Comment:
This is the "final" patch to support imports from zip archives, and directory caching using os.listdir(). It replaces patches 483466 and 476047. It is a separate patch since I can't delete file attachments. It adds support for importing from "" and from relative paths.

----------------------------------------------------------------------

>Comment By: Martin v.
Löwis (loewis)
Date: 2002-07-28 12:54

Message:
Logged In: YES
user_id=21627

Is this patch ready to be applied?

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-12 17:05

Message:
Logged In: YES
user_id=31392

Deleting the old diffs that Jim couldn't delete.

----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2002-03-15 18:27

Message:
Logged In: YES
user_id=64929

I added a diff -c version of the patch.

----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2002-03-15 18:03

Message:
Logged In: YES
user_id=64929

I still can't delete files, but I added a new file which contains all diffs as a single file, and is made from the current CVS tree (Mar 15, 2002).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470

From noreply@sourceforge.net Sun Jul 28 12:00:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:00:06 -0700
Subject: [Patches] [ python-Patches-452232 ] timestamp function for time module
Message-ID:

Patches item #452232, was opened at 2001-08-17 23:37
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Gareth Harris (garethharris)
Assigned to: Nobody/Anonymous (nobody)
Summary: timestamp function for time module

Initial Comment:
Timestamp creates timestamp strings in ISO or ODBC format in UTC or local timezones. It can also add microseconds where needed. Timestamps are often needed outside database or XML activities, so its proposed location is the time module.
timestamp(secs=None,fmt='ISO',TZ=None,fracsec=None):
    '''Make ISO or ODBC timestamp from [current] time.
    Parameters:
    secs    = float seconds, else default = time()
    fmt     = 'ISO' use ISO 8601 standard format
                = "YYYY-MM-DDTHH:MM:SS.mmmmmmZ" Zulu
               or "YYYY-MM-DDTHH:MM:SS.mmmmmm-hh:mm" local
              else "YYYY-MM-DD HH:MM:SS.mmmmmm" ODBC
    TZ      = None=GMT/UTC/Zulu, else local time zone
    fracsec = None, else add microseconds to string
    '''

Any improvement or standardization is welcome.

Gareth Harris gharris@nrao.edu 2001-08-17T21:36:00Z

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:00

Message:
Logged In: YES
user_id=21627

Since no actual patch is forthcoming, I'm rejecting this.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-09 12:40

Message:
Logged In: YES
user_id=21627

If you want to see the code included, you'd need to provide a context diff, including docs and test cases. However, notice that there may be overlap with the emerging builtin DateTime type, see http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage

----------------------------------------------------------------------

Comment By: Gareth Harris (garethharris)
Date: 2002-01-02 17:41

Message:
Logged In: YES
user_id=300900

Back from travel, other projects etc. [2002.01.02] Thanks for the comments thus far. Maybe I will finally meet some of you in Feb.
---
I propose to put this in the TIME module UNLESS someone has an idea for a better location. Who takes care of that module? Shall I provide docs and a test suite? Is a companion decode function needed? OTHERWISE I will put it on SourceForge or ActiveState; which is preferred?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-01 21:27

Message:
Logged In: YES
user_id=21627

Gareth,

Can you please propose a strategy to advance this patch or withdraw it?
If there is no action, I propose to close it by Feb 1, 2002.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-12-06 15:57

Message:
Logged In: YES
user_id=3066

Another possible alternate home for this would be the Python Snippet repository on SourceForge:

http://sourceforge.net/snippet/browse.php?by=lang&lang=6

I'm not suggesting that this doesn't belong in the standard library, however.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-19 19:46

Message:
Logged In: YES
user_id=21627

Nice patch. If you want to see this included, you should complete it: decide on the location of the function, and provide documentation and test cases. As for the location, the calendar module may provide a home, but you may ask in the newsgroup.

If you merely wanted to publish this code snippet, I suggest that you find a better home than the Python patch database, e.g. the Cookbook:

http://aspn.activestate.com/ASPN/Cookbook/Python

There are a number of other places that collect Python snippets; this is just one option.
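Gareth's implementation is not included in the messages, so here is a minimal sketch of the proposed function reimplemented with the datetime module (which Python gained in 2.3). It follows the parameter conventions from the docstring above: TZ=None means UTC, and any fracsec value other than None adds microseconds. How the local UTC offset is obtained is an assumption of this sketch, not part of the proposal.

```python
import time
from datetime import datetime, timezone

def timestamp(secs=None, fmt="ISO", TZ=None, fracsec=None):
    """Format `secs` (default: now) as an ISO 8601 or ODBC timestamp string.

    TZ=None means UTC ("Z" suffix in ISO format); any other value means the
    local time zone ("+hh:mm"/"-hh:mm" suffix). fracsec=None omits
    microseconds; any other value appends them.
    """
    if secs is None:
        secs = time.time()
    if TZ is None:
        dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    else:
        dt = datetime.fromtimestamp(secs).astimezone()  # attach the local UTC offset
    sep = "T" if fmt == "ISO" else " "
    text = dt.strftime("%Y-%m-%d" + sep + "%H:%M:%S")
    if fracsec is not None:
        text += ".%06d" % dt.microsecond
    if fmt == "ISO":
        if TZ is None:
            text += "Z"
        else:
            offset = dt.strftime("%z")  # e.g. "-0700"
            text += offset[:3] + ":" + offset[3:5]
    return text

print(timestamp(0))                 # 1970-01-01T00:00:00Z
print(timestamp(0, fmt="ODBC"))     # 1970-01-01 00:00:00
print(timestamp(1.5, fracsec=True)) # 1970-01-01T00:00:01.500000Z
```

In current Python, datetime.isoformat() covers much of this directly; the sketch keeps the original interface so it can be compared with the proposal.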
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470

From noreply@sourceforge.net Sun Jul 28 12:07:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:07:26 -0700
Subject: [Patches] [ python-Patches-458898 ] --python-build for install
Message-ID:

Patches item #458898, was opened at 2001-09-05 22:58
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=458898&group_id=5470

Category: Distutils and setup.py
Group: None
Status: Open
>Resolution: Out of Date
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Michael Hudson (mwh)
Summary: --python-build for install

Initial Comment:
Sometimes, being able to install python tools without having python installed is desirable. When building an RPM package of python, for example, one may want to build/install IDLE as well, including it in a subpackage. Indeed, we're doing this with a couple of python tools here at Conectiva.

Unfortunately, we have a chicken-and-egg problem when doing this. You need python installed in your system before you install tools. This limitation may be observed in the file Lib/distutils/sysconfig.py. It looks for Makefile in the final installation directory, for example.

This patch adds a new option to distutils' install command: --python-build. When used, python will look for these files in the python build directory specified through the option.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:07

Message:
Logged In: YES
user_id=21627

It appears that the patch is outdated; set_python_build no longer exists. Is the patch still needed?

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-01-31 15:18

Message:
Logged In: YES
user_id=7887

About..
1) Sorry.. I'll take care to add comments to the file next time. The bottom one is newer.

2) For now, a local option seems to be ok. If other commands start using it (which seems improbable right now), we may turn it into a global option without any drawbacks, since global options are acceptable anywhere.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-01-21 11:45

Message:
Logged In: YES
user_id=6656

Ta. Some random comments:

(1) it's not obvious from this page which of the two patches attached is the newer. This may be sf's fault, but...

(2) might it be better to make this a global distutils option? It seems a bit fragile at the moment -- we'd need to change things if, say, build_ext started to depend on python_build. Would, say

$ python setup.py --python-build install

be better? I dunno, I don't really understand how options chase around distutils yet...

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-01-19 01:02

Message:
Logged In: YES
user_id=7887

Here is a new patch including your suggestions. Thank you!!

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-01-17 18:49

Message:
Logged In: YES
user_id=6656

Hey! This patch is less than six months old. Virtually fresh :|

Some comments: are you sure you can get away with only honouring --python-build in install? I think build_scripts needs it too (now, anyway, maybe not when you wrote the patch).

Also, the mod to install.finalize_options() is in the wrong place wrt. the surrounding comments. Can you fix this?
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=458898&group_id=5470

From noreply@sourceforge.net Sun Jul 28 12:20:34 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:20:34 -0700
Subject: [Patches] [ python-Patches-462754 ] no '_d' ending for mingw32
Message-ID:

Patches item #462754, was opened at 2001-09-19 05:29
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470

Category: Distutils and setup.py
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Nobody/Anonymous (nobody)
Summary: no '_d' ending for mingw32

Initial Comment:
This patch prevents distutils from naming the extension modules _d.pyd when compiled with mingw32 on Windows in debug mode. Instead, the extension modules will get the normal name .pyd.

Technically, the patch doesn't prevent the behaviour for mingw32, but only adds the _d for MS Visual C++ and Borland compilers (though I don't know about the Borland case).

The reason for this? Adding "_d" doesn't make any sense for GNU compilers. I think it's just MS Visual C++ madness. If you want to debug an extension module that was compiled with gcc, you have to use gdb anyway, because the debugging symbols of MSVC++ and gcc are incompatible. So you normally use a release Python version (from the python.org binary download) and compile your extensions with mingw32.

To put it shortly: the current state is that you do a "setup.py build --compiler=mingw32 --debug" and then rename the extension modules, removing the _d. Then fire up gdb to debug your module. With this patch, the renaming isn't necessary anymore.

----------------------------------------------------------------------

>Comment By: Martin v.
Löwis (loewis)
Date: 2002-07-28 13:20

Message:
Logged In: YES
user_id=21627

This patch is wrong: whether or not _d should be added to the module name depends on whether or not Py_DEBUG is defined; this is independent of whether --debug was given, at least for Cygwin (for MSVC, --debug will define _DEBUG which will define Py_DEBUG). So the current distutils is wrong (since it always adds _d), but the patch doesn't make it better (since it never adds _d). Rejecting the patch.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-04-17 07:07

Message:
Logged In: YES
user_id=163326

If python.exe is compiled --with-pydebug, then this is true. But the point is that I want to compile debug versions of my extension modules and use them with the standard python.exe (*not* python_d.exe). So yes, the patch does work, at least it did when I submitted it.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-09 12:44

Message:
Logged In: YES
user_id=21627

Does the patch actually work? It seems to me that, if compiled --with-pydebug, import will automatically search for the _d version, and complain if it is not found.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-04 12:52

Message:
Logged In: YES
user_id=21627

The rationale for using the debugging version of MSVCRT is not the debugging information alone, but also the additional functionality, like heap consistency checks and other assertions. So it is not obvious that you do not want to use the debugging version of this library in a debug build.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-01-04 03:50

Message:
Logged In: YES
user_id=163326

mingw links with msvcrt.dll.
I've plans to add mingw32 support to the autoconf build process (hopefully soon enough for 2.3).

The GNU and MS debugger symbols are incompatible, though, so I think that mingw32 shouldn't link to the debug version of msvcrt (gdb doesn't understand the Microsoft debugger symbols; and the Visual Studio debugger has no idea what the debugging symbols of gcc are all about; isn't cross-platform and cross-compiler programming fun?).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-30 14:13

Message:
Logged In: YES
user_id=21627

How does the mingw port interact with the debugging libraries? With MSVC, the debug build will link to the debug versions of the CRT. What C library will mingw link with (I hope it won't use crtdll.dll)?

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-28 23:28

Message:
Logged In: YES
user_id=163326

Yes. But mingw32 isn't emulating Unix under Windows (that would be Cygwin). It's just a version of gcc and friends that targets native win32. It links against msvcrt (not a POSIX emulation library like Cygwin does).

This is a bit hypothetical because I haven't yet hacked the autoconf build process for native win32 with mingw32. Currently, you cannot build a complete Python with mingw32, but you *can* build extension modules against an existing Python (compiled with M$ VC++).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-09-28 22:43

Message:
Logged In: YES
user_id=31435

All else being equal, a system emulating Unix under Windows should strive to make life comfortable for Unix folks. The question is thus whether all else is in fact equal.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-28 20:37

Message:
Logged In: YES
user_id=163326

Hmm.
I don't like the _d endings at all. But if the policy on win32 is that debug executables and libraries get a "_d" ending, then I'm unsure whether this patch should be applied.

I have plans to hack the autoconf madness to build a native win32 Python with mingw32. But that won't be ready by tomorrow. And I don't think that I'll add "_d" endings there for debugging, because that would be inconsistent with the normal autoconf builds on Unix.

I'm glad that *I* don't have to decide whether this patch is a Good Thing. Being consistent with the Python win32 build or with GNU (gcc/autoconf): take your pick :-)

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-09-19 05:46

Message:
Logged In: YES
user_id=31435

FYI, MSVC never adds _d on its own -- Mark Hammond and/or Guido forced it to do that. I don't remember why, but one of them explained it to me long ago and it made good sense at the time. MSVC normally compiles debug and release builds into distinct subdirectories, and uses the same names in both. But *our* MSVC setup forces it to compile both flavors of build directly into the PCbuild directory, so it has to give the resulting DLLs and executables different names (else the second build would overwrite the results of the first build).
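Martin's objection is that the suffix must follow whether the interpreter that will import the module was built with Py_DEBUG, not whether --debug was passed when building the extension. A minimal sketch of that rule: the function names are hypothetical, and checking for sys.gettotalrefcount is a common heuristic for spotting a debug interpreter at run time (Py_DEBUG builds also enable reference-count tracking), not an official API for this purpose.

```python
import sys

def is_debug_build():
    """Heuristic: Py_DEBUG builds also enable Py_REF_DEBUG, which exposes
    sys.gettotalrefcount(); release builds do not have it."""
    return hasattr(sys, "gettotalrefcount")

def extension_filename(module, py_debug):
    """File name of a Windows extension module for `module`.

    The "_d" suffix tracks whether the importing interpreter was built with
    Py_DEBUG, independently of how the extension itself was compiled.
    """
    return module + ("_d" if py_debug else "") + ".pyd"

print(extension_filename("spam", is_debug_build()))
```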
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470

From noreply@sourceforge.net Sun Jul 28 12:29:27 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:29:27 -0700
Subject: [Patches] [ python-Patches-459381 ] Unambiguous import for encodings
Message-ID:

Patches item #459381, was opened at 2001-09-07 03:01
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=459381&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Mikhail Zabaluev (mzabaluev)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unambiguous import for encodings

Initial Comment:
The __import__ call in encodings/__init__.py does not specify the module hierarchy explicitly. This results in misleading error tracebacks (try "codecs.lookup('codecs')"). Worse, it results in an error when one is trying to look up a codec and the encoding's name matches some top-level module, e.g. 'base64', even though a codec for this encoding may actually be registered in the system.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:29

Message:
Logged In: YES
user_id=21627

It appears that this patch has been superseded by #571603; closing this one as a duplicate.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-19 11:01

Message:
Logged In: YES
user_id=21627

I'm in favour of integrating this patch, even though it means that some codecs that are currently found won't be found anymore. Authors of such codecs would need to register a search function.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=459381&group_id=5470 From noreply@sourceforge.net Sun Jul 28 12:33:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 04:33:37 -0700 Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function Message-ID: Patches item #571603, was opened at 2002-06-20 13:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Geert Jansen (geertj) Assigned to: Nobody/Anonymous (nobody) Summary: Fix bug in encodings.search_function Initial Comment: Hi, there seems to be a bug in the default encoding search function (search_function in encodings/__init__.py). The function tries to load a module with the name of the encoding, but it doesn't require that this module is in the encodings/ directory. This leads to trouble when you try to use an encoding that has the name of a module in the search path. To demonstrate, save the following line to test.py: print 'Just testing'.encode('test') and run it. This results in a CodecRegistryError exception: "module "test" (test.pyc) failed to register" The bug is present in 2.2.1 and in HEAD. In HEAD there was actually a bugfix for this but it was incomplete. Patches for 2.2.1 and HEAD attached. Greetings, Geert Jansen ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 13:33 Message: Logged In: YES user_id=21627 Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.
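The idea behind this fix, importing only from the encodings package, can be shown with a rough modern Python 3 sketch. It is not the actual code in encodings/__init__.py. The name `package_only_search` and the simplified name normalization are assumptions for illustration. Because it only ever imports `encodings.<name>`, a lookup of 'test' can no longer pick up a stray test.py from the search path.

```python
import codecs
import importlib


def package_only_search(encoding):
    # Simplified normalization (assumption): lower-case, hyphens/spaces -> '_'.
    modname = encoding.lower().replace("-", "_").replace(" ", "_")
    # Reject empty or dotted names so the lookup cannot escape the package.
    if not modname or "." in modname:
        return None
    try:
        # Always import encodings.<name>, never a top-level module <name>.
        module = importlib.import_module("encodings." + modname)
    except ImportError:
        return None
    getregentry = getattr(module, "getregentry", None)
    if getregentry is None:
        return None  # e.g. encodings.aliases is not a codec module
    return getregentry()
```

With this design, a name that matches no module in the package simply yields `None` instead of raising a confusing registration error.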
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470 From noreply@sourceforge.net Sun Jul 28 12:39:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 04:39:39 -0700 Subject: [Patches] [ python-Patches-572796 ] Executable .pyc-files with hashbang Message-ID: Patches item #572796, was opened at 2002-06-23 18:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572796&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Peter Åstrand (astrand) Assigned to: Nobody/Anonymous (nobody) Summary: Executable .pyc-files with hashbang Initial Comment: As an experiment, I've tested whether it is possible to add a hashbang (like #!/usr/bin/python) to compiled .pyc-files. The attached patch makes this possible. This can be useful when distributing applications as bytecode. Currently, on a UNIX system, it's necessary to make a wrapper script. I haven't considered portability issues with non-UNIX platforms and things like that. Also, the hash and bang may collide with the magic number. The patch is just a proof-of-concept. I won't be surprised if you all think that this is a bad idea, but I thought I should send the patch anyway. Has this been discussed before? ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 13:39 Message: Logged In: YES user_id=21627 Since nobody has spoken in favour of this patch, I'm rejecting it. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-06-30 18:23 Message: Logged In: YES user_id=21627 On Linux, you can also use

    import imp,sys,string
    magic = string.join(["\\x%.2x" % ord(c) for c in imp.get_magic()],"")
    reg = ':pyc:M::%s::%s:' % (magic, sys.executable)
    open("/proc/sys/fs/binfmt_misc/register","wb").write(reg)

to make the system recognize .pyc files (see Misc/NEWS). ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2002-06-23 18:51 Message: Logged In: NO Well, there's this: http://www.lyra.org/greg/python/#dev Does that help? (mwh, having a fight with sf's login system) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572796&group_id=5470 From noreply@sourceforge.net Sun Jul 28 12:46:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 04:46:22 -0700 Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS Message-ID: Patches item #577031, was opened at 2002-07-03 17:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: Remove PyArg_Parse() and METH_OLDARGS Initial Comment: This patch removes more uses of PyArg_Parse() and METH_OLDARGS, which are deprecated. I've tested in select and string, but want to make sure there's nothing else I'm missing. I also have a huge change to glmodule, but I can't test that. The diff is attached. Let me know if I should check in glmodule or leave it alone. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 13:46 Message: Logged In: YES user_id=21627 The other patches all look fine; please apply them.
For fmmodule, I'd recommend converting those functions to VARARGS/ParseTuple. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-27 03:04 Message: Logged In: YES user_id=33168 All the "s" / PyString_Check() changes are in fmmodule. I suggest not patching fmmodule for now. Are all the other changes ok? Should I bother fixing glmodule at all? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 07:45 Message: Logged In: YES user_id=21627 The changes look good, except for the ones that change parsing of "s" to PyString_Check: that means losing support for Unicode. For some of these methods, that may be acceptable, but that would need documentation. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 From noreply@sourceforge.net Sun Jul 28 16:26:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 08:26:46 -0700 Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS Message-ID: Patches item #577031, was opened at 2002-07-03 11:57 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: Remove PyArg_Parse() and METH_OLDARGS Initial Comment: This patch removes more uses of PyArg_Parse() and METH_OLDARGS, which are deprecated. I've tested in select and string, but want to make sure there's nothing else I'm missing. I also have a huge change to glmodule, but I can't test that. The diff is attached. Let me know if I should check in glmodule or leave it alone.
---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-28 11:26 Message: Logged In: YES user_id=33168 Closing this patch. I'll make a new patch for changing fmmodule as suggested. Checked in as: glmodule 2.10, stringobject.c 2.172, selectmodule.c 2.68. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 07:46 Message: Logged In: YES user_id=21627 The other patches all look fine; please apply them. For fmmodule, I'd recommend converting those functions to VARARGS/ParseTuple. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-07-26 21:04 Message: Logged In: YES user_id=33168 All the "s" / PyString_Check() changes are in fmmodule. I suggest not patching fmmodule for now. Are all the other changes ok? Should I bother fixing glmodule at all? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 01:45 Message: Logged In: YES user_id=21627 The changes look good, except for the ones that change parsing of "s" to PyString_Check: that means losing support for Unicode. For some of these methods, that may be acceptable, but that would need documentation.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470 From noreply@sourceforge.net Sun Jul 28 16:54:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 08:54:31 -0700 Subject: [Patches] [ python-Patches-574747 ] Make python-mode.el use jython Message-ID: Patches item #574747, was opened at 2002-06-27 21:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574747&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Submitted By: Kevin J. Butler (kevinbutler) Assigned to: Nobody/Anonymous (nobody) Summary: Make python-mode.el use jython Initial Comment: I believe it is time to default to using the "jython" interpreter rather than the "jpython" interpreter. This patch does this in a minimal way, rather than changing all 'jpython' references to 'jython', it just changes the default interpreter command to jython and notes the two names. (I still prefer the 'JPython' name...) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 17:54 Message: Logged In: YES user_id=21627 Duplicate of 574750. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574747&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:00:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:00:35 -0700 Subject: [Patches] [ python-Patches-574750 ] Make python-mode.el use "jython" interp Message-ID: Patches item #574750, was opened at 2002-06-27 21:38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574750&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Kevin J. 
Butler (kevinbutler) Assigned to: Nobody/Anonymous (nobody) >Summary: Make python-mode.el use "jython" interp Initial Comment: I believe it is time to start using the "jython" interpreter by default, rather than the "jpython" interpreter. This patch does it in a minimal way, just changing the command and acknowledging the two names. (I still prefer the JPython name, but...) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:00 Message: Logged In: YES user_id=21627 Please use context or unified diffs for patches; I'm attaching your change as a diff. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574750&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:25:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:25:11 -0700 Subject: [Patches] [ python-Patches-574707 ] makesockaddr, use addrlen with AF_UNIX Message-ID: Patches item #574707, was opened at 2002-06-27 20:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574707&group_id=5470 Category: Modules Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Donn Cave (donnc) Assigned to: Nobody/Anonymous (nobody) Summary: makesockaddr, use addrlen with AF_UNIX Initial Comment: makesockaddr(), in 2.1 source, expects a NUL terminated string in sockaddr_un.sun_path. That expectation is routinely not met on some platforms - NetBSD 1.5.2, AIX 4.3.3, probably others. This patch shows how to use addrlen to determine the correct length of the value of sun_path. Here's the diff (I have no idea what it means to "attach" a file from my web browser), against 2.1 source. 
*** socketmodule.c.dist Sun Apr 15 17:21:33 2001
--- socketmodule.c Thu Jun 27 11:09:57 2002
***************
*** 597,603 ****
      case AF_UNIX:
      {
          struct sockaddr_un *a = (struct sockaddr_un *) addr;
!         return PyString_FromString(a->sun_path);
      }
  #endif
--- 597,605 ----
      case AF_UNIX:
      {
          struct sockaddr_un *a = (struct sockaddr_un *) addr;
!         return PyString_FromStringAndSize(a->sun_path,
!             addrlen -
!             (sizeof(struct sockaddr_un) - sizeof(a->sun_path)));
      }
  #endif
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:25 Message: Logged In: YES user_id=21627 This patch does not work. On systems where a NUL is returned, this NUL is also accounted for in addrlen, and hence included in the string. Would you like to revise your patch to support both cases? Feel free to use the offsetof macro, btw. Attaching a file is done by checking "Check to Upload and Attach a File:" and adding a file name in the field below. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574707&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:34:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:34:19 -0700 Subject: [Patches] [ python-Patches-573770 ] Changing owner of symlinks Message-ID: Patches item #573770, was opened at 2002-06-25 21:15 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=573770&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: Changing owner of symlinks Initial Comment: Currently, there's no way to change the owner of a symbolic link, since chown() follows them. This patch implements the missing lchown() function in posixmodule.
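The difference between following and not following a symbolic link is easiest to see with a dangling link. The sketch below assumes a modern Python 3 on a Unix-like system where `os.lchown` is available; the helper name `compare_chown` is hypothetical. Passing -1 for both uid and gid leaves ownership unchanged, so no special privileges are needed.

```python
import os


def compare_chown(directory):
    """Return (chown_succeeded, lchown_succeeded) for a dangling symlink."""
    link = os.path.join(directory, "dangling-link")
    # The link target deliberately does not exist.
    os.symlink(os.path.join(directory, "missing-target"), link)

    # os.chown() follows the link, so it fails on the missing target.
    try:
        os.chown(link, -1, -1)
        chown_ok = True
    except FileNotFoundError:
        chown_ok = False

    # os.lchown() acts on the link itself; -1 means "leave unchanged".
    try:
        os.lchown(link, -1, -1)
        lchown_ok = True
    except OSError:
        lchown_ok = False

    return chown_ok, lchown_ok
```

Called with a fresh temporary directory, this should report that only the lchown() call succeeds. In Python 3.3 and later, `os.chown(path, uid, gid, follow_symlinks=False)` is another way to express the same non-following behaviour.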
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:34 Message: Logged In: YES user_id=21627 Thanks for the patch; applied as configure 1.325 configure.in 1.336 pyconfig.h.in 1.46 libos.tex 1.93 NEWS 1.446 posixmodule.c 2.245 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=573770&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:36:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:36:15 -0700 Subject: [Patches] [ python-Patches-574867 ] list.extend docstring fix Message-ID: Patches item #574867, was opened at 2002-06-28 01:32 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574867&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: David Abrahams (david_abrahams) Assigned to: Nobody/Anonymous (nobody) Summary: list.extend docstring fix Initial Comment: The current docstring implies that extend() can only accept list arguments. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:36 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as listobject.c 2.128. 
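The docstring fix reflects that list.extend() accepts any iterable, not only another list. A quick Python 3 illustration (the variable names are arbitrary):

```python
numbers = [1, 2]
numbers.extend((3, 4))             # a tuple
numbers.extend(range(5, 7))        # a range object
numbers.extend(n for n in (7, 8))  # a generator expression
print(numbers)                     # [1, 2, 3, 4, 5, 6, 7, 8]

letters = []
letters.extend("ab")               # a string is an iterable of characters
print(letters)                     # ['a', 'b']
```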
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574867&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:42:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:42:09 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Thomas Heller (theller) Assigned to: Nobody/Anonymous (nobody) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError. This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of exception to raise. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:42 Message: Logged In: YES user_id=21627 The patch looks good, please apply it, with the following changes: - add \versionadded marks into the documentation; - add an entry to Misc/NEWS. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-05 08:57 Message: Logged In: YES user_id=11105 Sure. Patch uploaded: docpatch.diff ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-05 07:47 Message: Logged In: YES user_id=21627 If this is meant to be used by extension modules, it should be documented.
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-02 18:13 Message: Logged In: YES user_id=11105 Patch for the header file was missing... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:42:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:42:40 -0700 Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr Message-ID: Patches item #576458, was opened at 2002-07-02 18:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Thomas Heller (theller) >Assigned to: Thomas Heller (theller) Summary: Extend PyErr_SetFromWindowsErr Initial Comment: PyErr_SetFromWindowsErr and PyErr_SetFromWindowsErrWithFilename can only raise PyExc_WindowsError. This patch introduces variants of these functions taking an additional PyObject* parameter, which allows the caller to specify the type of exception to raise. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-07-28 18:42 Message: Logged In: YES user_id=21627 The patch looks good, please apply it, with the following changes: - add \versionadded marks into the documentation; - add an entry to Misc/NEWS. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-05 08:57 Message: Logged In: YES user_id=11105 Sure. Patch uploaded: docpatch.diff ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-07-05 07:47 Message: Logged In: YES user_id=21627 If this is meant to be used by extension modules, it should be documented. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2002-07-02 18:13 Message: Logged In: YES user_id=11105 Patch for the header file was missing... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470 From noreply@sourceforge.net Sun Jul 28 17:47:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 28 Jul 2002 09:47:36 -0700 Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments Message-ID: Patches item #580670, was opened at 2002-07-12 19:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Bill Bell (wbell539) Assigned to: Nobody/Anonymous (nobody) Summary: less restrictive HTML comments Initial Comment: Current code enforces requirement that HTML comments open with '