From noreply@sourceforge.net  Mon Jul  1 06:15:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 30 Jun 2002 22:15:10 -0700
Subject: [Patches] [ python-Patches-575827 ] SSL release GIL
Message-ID: <E17OtWo-0001BD-00@usw-sf-web3.sourceforge.net>

Patches item #575827, was opened at 2002-07-01 07:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Nobody/Anonymous (nobody)
Summary: SSL release GIL

Initial Comment:
This is more or less a rewrite of parts of patch
#475045. It releases the GIL during the SSL operations
for opening an SSL socket. Currently the GIL is only
released during the read and write operations on an SSL
socket.

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 06:15:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 30 Jun 2002 22:15:51 -0700
Subject: [Patches] [ python-Patches-575827 ] SSL release GIL
Message-ID: <E17OtXT-0001Bm-00@usw-sf-web3.sourceforge.net>

Patches item #575827, was opened at 2002-07-01 07:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerhard Häring (ghaering)
>Assigned to: Martin v. Löwis (loewis)
Summary: SSL release GIL

Initial Comment:
This is more or less a rewrite of parts of patch
#475045. It releases the GIL during the SSL operations
for opening an SSL socket. Currently the GIL is only
released during the read and write operations on an SSL
socket.

----------------------------------------------------------------------

>Comment By: Gerhard Häring (ghaering)
Date: 2002-07-01 07:15

Message:
Logged In: YES 
user_id=163326

Randomly assigning to Martin, who proofread my previous patch.

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 09:59:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 01:59:26 -0700
Subject: [Patches] [ python-Patches-574532 ] Update freeze to use zlib 1.1.4
Message-ID: <E17Ox1q-000662-00@usw-sf-web5.sourceforge.net>

Patches item #574532, was opened at 2002-06-27 11:30
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470

Category: Demos and tools
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Lawrence Hudson (lhudson)
Assigned to: Nobody/Anonymous (nobody)
Summary: Update freeze to use zlib 1.1.4

Initial Comment:
freeze currently looks for zlib 1.1.3.


----------------------------------------------------------------------

>Comment By: Lawrence Hudson (lhudson)
Date: 2002-07-01 08:59

Message:
Logged In: YES 
user_id=82888

D'Oh!  Sorry about that.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-28 01:14

Message:
Logged In: YES 
user_id=14198

There is no patch attached here that I can see!

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 19:03:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 11:03:21 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17P5WD-0004ia-00@usw-sf-web3.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 07:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 20:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 05:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 07:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 19:08:23 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 11:08:23 -0700
Subject: [Patches] [ python-Patches-575224 ] dict(seqn, value)
Message-ID: <E17P5b5-0004lN-00@usw-sf-web3.sourceforge.net>

Patches item #575224, was opened at 2002-06-29 01:00
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575224&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Guido van Rossum (gvanrossum)
Summary: dict(seqn, value)

Initial Comment:
Have the dict() constructor accept a pair of 
arguments, a sequence of keys and a constant value. 
Addresses the common task of initializing dictionary 
elements to a constant value. Useful for building fast 
membership tests and for quickly (C-speed) 
eliminating duplicates in a sequence.  Is faster, more 
flexible, and clearer than:
   d = {}
   map(d.__setitem__, seqn, [])

Examples:
  uniq = dict(seqn,True).keys()  # eliminate duplicates
  termwords = dict('End Quit Stop Abort'.split(), True)
  if lexeme in termwords:  sys.exit(0)
  absences = dict('Tom Dick Harry'.split(), 0)
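For comparison, later Python versions offer this "sequence of keys plus one constant value" constructor as the classmethod dict.fromkeys(). The sketch below redoes the examples above with it; uniq and is_terminator are illustrative names, not part of the patch, and the order-preserving result of uniq assumes Python 3.7+ dict ordering.

```python
# Sketch of the proposed behaviour using dict.fromkeys(), which maps
# every key in a sequence to the same value (None unless one is given).

def uniq(seqn):
    """Eliminate duplicates, keeping first-seen order (Python 3.7+ dicts)."""
    return list(dict.fromkeys(seqn))

termwords = dict.fromkeys('End Quit Stop Abort'.split(), True)
absences = dict.fromkeys('Tom Dick Harry'.split(), 0)

def is_terminator(lexeme):
    # Average O(1) hash lookup instead of an O(n) scan of a list
    return lexeme in termwords
```

Usage: uniq([3, 1, 3, 2, 1]) gives [3, 1, 2], and is_terminator('Quit') is true.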

Patch includes source, docs, and unittest.  Also 
includes a minor change to shlex.py showing how the 
builtin can cleanly update existing code to achieve an 
order of magnitude performance boost (classifying 
characters is the most common operation in shlex).

Summary of discussion on py-dev:
At Walter and Barry's suggestion, the value was 
allowed to take any value (I initially used None). At 
Tim's suggestion, I went to an explicit two argument 
form to avoid ambiguity. If we ever get sets, Timbot 
thinks that they ought to be the tool of choice for two 
of the above use cases. Jack Jansen likes the tool 
and wants to go further and warn of inefficient 
searching when 'in' is used with sequences, giving O(n)
search speed when they could have O(1). The F/bot
and Steve Holden poked at me for proposing 
something (speed and clarity aside) that can already 
be handled using existing constructs and Dave 
Abrahams disagreed with them.


----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 20:08

Message:
Logged In: YES 
user_id=38376

The "obvious" other meaning of a 2-argument dict() would
be dict(d.keys(), d.values()).  Not sure what's more common, 
though...

(and for the record, I'd prefer a separate "set" 
type/constructor, even if it's basically just a dict without some 
of the methods)
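Both alternatives Fredrik mentions can be sketched in modern Python without a new dict() signature: zip() pairs a key sequence with a value sequence, and the built-in set type (added in a later Python version) covers the membership-test use case. The names below are illustrative only.

```python
def rebuild(d):
    # dict(zip(keys, values)): the "other" two-sequence reading of dict().
    # keys() and values() iterate in matching order for an unmodified dict.
    return dict(zip(d.keys(), d.values()))

original = {'a': 1, 'b': 2, 'c': 3}
copy = rebuild(original)

# A set answers "is x a member?" without storing a dummy value per key.
termwords = set('End Quit Stop Abort'.split())
```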

</F>

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-06-30 21:47

Message:
Logged In: YES 
user_id=80475

I'm away from the computer for the next five weeks.  Oren 
Tirosh will champion this patch from here forward.  He 
can lead the discussion and make any requested
modifications.

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 19:09:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 11:09:46 -0700
Subject: [Patches] [ python-Patches-572936 ] (?(id/name)yes|no) re implementation
Message-ID: <E17P5cQ-0004mE-00@usw-sf-web3.sourceforge.net>

Patches item #572936, was opened at 2002-06-24 03:41
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572936&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
>Assigned to: Fredrik Lundh (effbot)
Summary: (?(id/name)yes|no) re implementation

Initial Comment:
This patch implements a regular expression feature, as found in Perl,
that allows some interesting patterns.
For example, (?(1)yes|no) matches "yes" if group 1 took part in the
match, and "no" if it didn't. Without this feature, the regular expression
must be duplicated to get the same results. In addition to Perl's
feature, it will also accept a Python named group as argument.
   
Here's an example:   
   
(<)?\w+@\w+(\.\w+)+(?(1)>)   
  
This is a poor email-matching regular expression, which will match
an address wrapped in a "<>" pair or with no brackets at all.
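Python's re module later gained this conditional syntax, so the example above can be checked directly. A minimal sketch, assuming Python 3.4+ for re.fullmatch(); it uses the pattern from the comment plus a variant that conditions on a named group (the name lt is arbitrary) instead of group 1.

```python
import re

# (?(1)>) demands a closing '>' only if group 1 (the optional '<') matched.
NUMBERED = re.compile(r'(<)?\w+@\w+(\.\w+)+(?(1)>)')

# Same idea, conditioning on a named group instead of a group number.
NAMED = re.compile(r'(?P<lt><)?\w+@\w+(\.\w+)+(?(lt)>)')

def is_address(text, pattern=NUMBERED):
    # fullmatch() so a stray trailing '>' is not silently left unmatched
    return pattern.fullmatch(text) is not None
```

Both brackets or neither are accepted; a lone "<" or ">" is rejected.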
   

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 19:15:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 11:15:22 -0700
Subject: [Patches] [ python-Patches-569328 ] names in types module
Message-ID: <E17P5hq-0004qu-00@usw-sf-web3.sourceforge.net>

Patches item #569328, was opened at 2002-06-15 11:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569328&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: names in types module

Initial Comment:
Adds names to the types module so types are accessible as
'types.spam' in addition to the existing longer version
'types.SpamType'.

The short names match the type's __name__ attribute.
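A rough sketch of the proposed naming rule, assuming Python 3's types module: it maps each *Type object's __name__ to the object itself. short_type_names is a hypothetical helper for illustration; unlike the patch, it builds a separate dict rather than adding names to the module.

```python
import types

def short_type_names(module=types):
    """Hypothetical helper: {type.__name__: type} for every *Type in module."""
    names = {}
    for attr in dir(module):
        obj = getattr(module, attr)
        # Only long-form names such as FunctionType that hold a real type
        if attr.endswith('Type') and isinstance(obj, type):
            names[obj.__name__] = obj
    return names

short = short_type_names()
```

For instance, short['function'] is types.FunctionType, matching the rule that the short name equals the type's __name__.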


----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 20:15

Message:
Logged In: YES 
user_id=38376

"from * import types" is a rather common pydiom, and I'm 
pretty sure most people using that expects to get a bunch of 
[A-Z]\w+Type names, and nothing else.

-0 from me.

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-06-18 16:40

Message:
Logged In: YES 
user_id=562624

Updated patch.

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-06-15 12:58

Message:
Logged In: YES 
user_id=562624

http://mail.python.org/pipermail/python-dev/2002-June/025410.html


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-15 12:05

Message:
Logged In: YES 
user_id=21627

What is the purpose of this change?

----------------------------------------------------------------------



From noreply@sourceforge.net  Mon Jul  1 20:23:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 12:23:00 -0700
Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning
Message-ID: <E17P6lI-0005k7-00@usw-sf-web3.sourceforge.net>

Patches item #576101, was opened at 2002-07-01 19:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Alternative implementation of interning

Initial Comment:
Instead of a pointer to the interned string, an interned
string now has a flag set indicating that it is interned. The
old pointer was almost always either NULL or pointing to the
same object. The other cases were rare and ineffective 
as an optimization.  This saves an average of 3 bytes 
per string.

Interned strings are no longer immortal.  They are 
automatically destroyed when there are no more 
references to them except the global dictionary of 
interned strings.

New function (actually a macro) PyString_CheckInterned 
to check whether a string is interned.  There are no 
more references to ob_sinterned anywhere outside 
stringobject.c.


----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 02:47:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 18:47:26 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17PClK-0005cZ-00@usw-sf-web2.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 15:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 Linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-02 04:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 13:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 15:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 05:21:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 01 Jul 2002 21:21:22 -0700
Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning
Message-ID: <E17PFAI-0007bZ-00@usw-sf-web2.sourceforge.net>

Patches item #576101, was opened at 2002-07-01 14:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Alternative implementation of interning

Initial Comment:
Instead of a pointer to the interned string, an interned
string now has a flag set indicating that it is interned. The
old pointer was almost always either NULL or pointing to the
same object. The other cases were rare and ineffective 
as an optimization.  This saves an average of 3 bytes 
per string.

Interned strings are no longer immortal.  They are 
automatically destroyed when there are no more 
references to them except the global dictionary of 
interned strings.

New function (actually a macro) PyString_CheckInterned 
to check whether a string is interned.  There are no 
more references to ob_sinterned anywhere outside 
stringobject.c.


----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2002-07-01 23:21

Message:
Logged In: YES 
user_id=80475

I like the way you consolidated all of the knowledge about 
interning into one place.

Consider adding an example to the docs of an effective use 
of interning for optimization.
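As one possible answer to that suggestion, here is a small sketch of interning as an optimization, written against Python 3's sys.intern() rather than the C-level API discussed in this patch. Strings built at runtime (e.g. field names parsed from input) are normally separate objects even when equal; interning makes all equal copies share one object, which saves memory and, in CPython, lets dict lookups succeed on the identity check before comparing characters. parse_records is a hypothetical helper.

```python
import sys

def parse_records(lines):
    """Hypothetical helper: split 'key=value' lines, interning the keys."""
    records = []
    for line in lines:
        key, _, value = line.partition('=')
        # Many records repeat the same few keys; intern so they share one object.
        records.append((sys.intern(key), value))
    return records

recs = parse_records(['colour=red', 'size=9', 'colour=blue'])
```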

----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 11:16:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 03:16:29 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17PKhx-00051P-00@usw-sf-web1.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 05:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-02 10:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 01:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 Linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 18:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 03:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 05:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 12:11:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 04:11:54 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PLZa-00067C-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch, though maybe someone
else can think of something better.
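A minimal sketch of the usual workaround, assuming Python 3's binascii (whose crc32() always returns an unsigned value): mask both the computed and the stored CRC to 32 bits before comparing, so a sign-extended value and its unsigned counterpart agree. crc_matches is a hypothetical helper, not the attached patch.

```python
import binascii

MASK32 = 0xFFFFFFFF

def crc_matches(data, stored_crc):
    """Hypothetical helper: compare CRC-32 values as unsigned 32-bit patterns."""
    # '& MASK32' maps a negative (sign-extended) int to its unsigned twin
    return (binascii.crc32(data) & MASK32) == (stored_crc & MASK32)

payload = b"hello, zipfile"
good = binascii.crc32(payload)
# Simulate a header CRC that was read back as a signed 32-bit value
signed_good = good - (1 << 32) if good & 0x80000000 else good
```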


----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 15:44:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 07:44:42 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17POtW-0000hc-00@usw-sf-web2.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 07:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch, though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-02 10:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------



From noreply@sourceforge.net  Tue Jul  2 16:06:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 08:06:24 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PPEW-0001k1-00@usw-sf-web5.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch, though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to ZipFile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmunix' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.
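A quick check of the two numbers in the transcript, sketched in Python 3 (where ints do not overflow): they differ by exactly 2**32, so they are the same 32-bit pattern read once as unsigned and once as signed, which is consistent with a sign-extension mismatch. to_unsigned32 and to_signed32 are illustrative helpers.

```python
def to_unsigned32(x):
    # Keep the low 32 bits; Python's & treats negatives as infinite two's complement
    return x & 0xFFFFFFFF

def to_signed32(x):
    x &= 0xFFFFFFFF
    # Patterns with the top bit set represent negative 32-bit numbers
    return x - (1 << 32) if x & 0x80000000 else x

computed = 2226205591   # binascii.crc32() output from the transcript
stored = -2068761705    # zinfo.CRC from the transcript
```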

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 16:42:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 08:42:57 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PPnt-0003bs-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 16:47:28 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 08:47:28 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PPsG-0003gE-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Nobody/Anonymous (nobody)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size,
                                  zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).
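The "l" versus "L" question comes down to how the same four bytes are interpreted. This sketch uses Python 3's struct module. Its range checking is stricter than the 2.2 behaviour described in this thread, so treat the error case as illustrative of modern Python only.

```python
import struct

crc = 2226205591  # an unsigned CRC-32 larger than 2**31 - 1

# "<L" (unsigned, standard 4-byte size) round-trips the value unchanged.
packed = struct.pack("<L", crc)
assert struct.unpack("<L", packed)[0] == crc

# Reading the same bytes with "<l" (signed) yields the sign-extended form,
# so a writer and reader that disagree on l/L see different ints.
print(struct.unpack("<l", packed)[0])   # -2068761705

# In Python 3, packing an out-of-range value with "<l" raises struct.error.
try:
    struct.pack("<l", crc)
except struct.error as exc:
    print("struct.error:", exc)
```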

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 17:02:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 09:02:18 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17PQ6c-0004uM-00@usw-sf-web4.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter that lets the caller specify the type of 
exception to raise.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 17:13:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 09:13:36 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17PQHY-000292-00@usw-sf-web2.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter that lets the caller specify the type of 
exception to raise.

----------------------------------------------------------------------

>Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 21:20:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 13:20:51 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PU8p-0000x2-00@usw-sf-web4.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 07:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
>Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-02 16:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size,
                                  zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 10:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 22:01:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 14:01:11 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PUlr-0006Ke-00@usw-sf-web2.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 12:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size,
                                  zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 22:41:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 14:41:41 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PVP3-0000Pe-00@usw-sf-web5.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp
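For comparison, here is a minimal Python 3 sketch of the kind of write-then-read round trip that test_zipfile performs. It uses an in-memory buffer. The member name and data are hypothetical and are not taken from Lib/test/test_zipfile.py. On a build where the stored and computed CRCs agree, testzip() returns None and read() does not raise BadZipFile.

```python
import io
import zipfile
import zlib

def roundtrip(data, compression=zipfile.ZIP_STORED):
    """Write one member to an in-memory archive, then read and verify it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("junk.tmp", data)          # hypothetical member name
    buf.seek(0)
    with zipfile.ZipFile(buf, "r") as zf:
        bad = zf.testzip()                     # first bad member, or None
        stored_crc = zf.getinfo("junk.tmp").CRC
        return bad, stored_crc, zf.read("junk.tmp")  # read() checks the CRC

data = b"".join(b"Test of zipfile line %d.\n" % i for i in range(1000))
bad, stored_crc, readback = roundtrip(data)
print(bad)                               # None
print(readback == data)                  # True
print(stored_crc == zlib.crc32(data))    # True
```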

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 12:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size,
                                  zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 22:52:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 14:52:16 -0700
Subject: [Patches] [ python-Patches-553108 ] Deprecate bsddb
Message-ID: <E17PVZI-0007AN-00@usw-sf-web2.sourceforge.net>

Patches item #553108, was opened at 2002-05-07 05:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470

Category: Modules
Group: Python 2.3
>Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Garth T Kidd (gtk)
Assigned to: Skip Montanaro (montanaro)
Summary: Deprecate bsddb

Initial Comment:
Large numbers of inserts break bsddb, as first 
discovered in Python 1.5 (bug 408271). 

According to Barry Warsaw, "trying to get the bsddb 
module that comes with Python to work is a hopeless 
cause." 

If it's broken, let's discourage people from using it. 
In particular, let's ensure that people importing 
shelve or anydbm don't end up using it by default. 

The submitted patch adds a DeprecationWarning to the 
bsddb module and removes bsddb from the list of db 
module candidates in anydbm. 
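The anydbm change amounts to trying a list of backend modules in preference order and leaving bsddb off that list. Below is a minimal sketch of that selection pattern written against Python 3's dbm submodules, which succeeded anydbm. The candidate list and the function name are hypothetical and do not come from the patch.

```python
import importlib

# Hypothetical preference order; a backend that should no longer be
# chosen by default is simply omitted from the list.
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]

def pick_backend(candidates=CANDIDATES):
    """Return the first candidate module that imports successfully."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no dbm backend available among %r" % (candidates,))

backend = pick_backend()
print(backend.__name__)   # platform-dependent
```

dbm.dumb is pure Python and always importable, so it serves as the final fallback here.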

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-02 23:52

Message:
Logged In: YES 
user_id=45365

Skip,
I'm reopening this bug report: the fix breaks builds on Mac OS X, and I haven't a clue as to how to fix this so I hope you can help. MacOSX has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but it doesn't have any of the libraries (I assume everything needed is in libc).

Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 22:32

Message:
Logged In: YES 
user_id=44345

Implemented in
  setup.py 1.93
  README 1.147
  configure 1.315
  configure.in 1.325
  pyconfig.h.in 1.42
  Modules/dbmmodule 2.30


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-14 09:16

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 05:33

Message:
Logged In: YES 
user_id=44345

a couple more tweaks... I forgot to include dbmmodule.c in 
previous patches.  This version of the patch also includes a 
modified README file that adds a section about building the 
bsddb and dbm modules.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 09:35

Message:
Logged In: YES 
user_id=44345

Here's an updated patch.  It's different in a couple ways:

  * support for Berkeley DB 4.x was added.  You will need to
    configure iBerkdb with the 1.85 compatibility stuff.

  * I cleaned up the dbm build code a bit.

  * I added a diff for the configure file for people who don't
    have autoconf handy.

Skip


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-11 18:09

Message:
Logged In: YES 
user_id=44345

I think deprecating bsddb is too drastic.  In the first place, the problems
you refer to are in the underlying Berkeley DB library, not in the bsddb
code itself.  In the second place, later versions of the library fix the
problem.

The attached patch attempts to modify setup.py and configure.in to
solve the problem.  It does a couple things differently than the current
CVS version:

  1. It only searches for versions 2 and 3 of the Berkeley DB library by
   default.  People who know what they are doing can uncomment the
   information relevant to version 1.

  2. It moves all the checking code into setup.py.  The header file checks
  in configure.in were deleted.

  3. The ndbm lookalike stuff for the dbm module is done differently.  This
  has not really been tested yet.  I anticipate further changes will be
  necessary with this code.

I'm sure it's not perfect.  Please give it a try and let me know how it
works for you.

All that said, I think a better migration path is to replace the current
module with the bsddb3/pybsddb stuff.  I think that would effectively
restrict you to versions 3 or 4 of the underlying Berkeley DB library, so
it probably couldn't be done with impunity. 

Skip


----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-20 20:14

Message:
Logged In: YES 
user_id=276840

#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.

# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.

# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to 90,909 records
# (an average of 44,000 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, os, math, time

LIMIT=4000000
DISPLAY_AT_END=1

USE_RANDOM=100  # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2
##  The format of the value string is
##      count|hash|hash...|b
##  Where
##      count is an 8 byte hexadecimal count of the number of times
##          this record has been written.
##      hash is the md5 hash of the random value that created this record.
##          It is the key for this record. It is appended once for each
##          time the record is written (that is, it occurs count times).
##      b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129       # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count,
                                                  ((key+'|')*old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db["+repr(key)+"]="+
                                repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except:
                pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, "  count = %6i db[%s]=%s" % (
                        count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))
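The value layout and the read-modify-write check in the test program above can be restated as a small Python 3 sketch. It is not the original test: a plain dict stands in for the bsddb btree handle (bsddb no longer ships with Python 3), hashlib.md5 replaces the old md5 module, and the helper names expected_record, read_modify_write and run are hypothetical.

```python
import hashlib
import random

PAD = "!" * 129  # the 129-character filler called 'b' in the test


def expected_record(key, count):
    """count|hash|hash...|b, with count as 8 uppercase hex digits."""
    return "%08X|%s%s" % (count, (key + "|") * count, PAD)


def read_modify_write(db, key):
    """Verify the stored record (if any), then append one more copy of key."""
    if key in db:
        rec = db[key]
        old = int(rec[:8], 16)
        if rec != expected_record(key, old):
            raise ValueError("mismatched data for %r" % (key,))
    else:
        rec = "00000000|" + PAD
        old = 0
    # rec[8:] starts with '|', so this yields count|key|<older hashes>|b
    db[key] = "%08X|%s%s" % (old + 1, key, rec[8:])
    return old + 1


def run(db, operations, max_key, seed=None):
    """Perform `operations` random read-modify-write cycles on db."""
    rng = random.Random(seed)
    for _ in range(operations):
        r = rng.randrange(0, max_key)
        key = hashlib.md5(str(r).encode("ascii")).hexdigest()
        read_modify_write(db, key)


db = {}  # any mapping with str keys and str values would do here
run(db, operations=1000, max_key=50, seed=1)
```

Because every write passes through the same consistency check, corruption by the backing store would surface as a ValueError the next time the damaged key is read, and the per-record counts must always sum to the number of operations.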


----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-17 01:10

Message:
Logged In: YES 
user_id=276840

I am not sure there is a reason to deprecate bsddb. The 
btopen format appears to be stable enough for normal work. 
Maybe 2.3 should change dbhash to use btopen?

----------------------------------------------------------------------

Comment By: Garth T Kidd (gtk)
Date: 2002-05-09 05:12

Message:
Logged In: YES 
user_id=59803

Let's not turn a simple patch into something requiring a 
PEP, compulsory thrashing on comp.lang.python, SleepyCat 
being willing to change their distribution model, lawyers 
(to make sure the licences are compatible), and so on. 

I'd hate it if other people spent the kind of time I did 
trying to get shelve to work only to find that a known-
broken bsddb was causing all the problems, and that a patch 
was there to gently guide them to gdbm, but it got jammed 
because of scope-creep. 

Let's get this one, very simple and necessary (bsddb IS 
broken) change out of the way, and THEN start negotiating, 
thrashing, and integrating. :) 

I firmly believe bsddb3 should be one of the included 
batteries. Let's do it, but let's guide people away from 
broken code first. 

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 11:01

Message:
Logged In: YES 
user_id=21627

I'm in favour of this change, but I'd like to simultaneously
incorporate bsddb3.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 22:54:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 14:54:41 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PVbd-000172-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 07:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.
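Forcing the CRC to stay a 32-bit quantity has a simple analogue in present-day Python 3, where binascii.crc32() already returns an unsigned result: mask with 0xFFFFFFFF so that values compare equal whichever sign convention produced them. A minimal sketch; crc32_u32 is a hypothetical helper, not part of binascii.

```python
import binascii
import zlib

MASK32 = 0xFFFFFFFF


def crc32_u32(data):
    """CRC-32 of data as an unsigned value in range(0, 2**32)."""
    return binascii.crc32(data) & MASK32


payload = b"The quick brown fox jumps over the lazy dog"
crc = crc32_u32(payload)

# binascii and zlib implement the same CRC-32, so the masked values agree
assert crc == zlib.crc32(payload) & MASK32
# masking an already-unsigned value changes nothing
assert crc & MASK32 == crc
```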


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-02 17:54

Message:
Logged In: YES 
user_id=31435

So what did it get, and what did it expect?  I.e., same stuff all 
over again.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 16:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC, zinfo.compress_size,
              zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).
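Whether the header field is read with "l" or "L" only changes how the same four bytes are interpreted. This Python 3 sketch packs the stored CRC from the Tru64 session in this thread with a signed little-endian "<l" field, as write() does, then unpacks it both ways; the variable names are illustrative.

```python
import struct

stored_signed = -2068761705              # zinfo.CRC as reported on Tru64
raw = struct.pack("<l", stored_signed)   # the 4 bytes write() puts on disk

as_signed = struct.unpack("<l", raw)[0]
as_unsigned = struct.unpack("<L", raw)[0]

# Same bytes, two readings: they agree once the sign convention is ignored
assert as_signed & 0xFFFFFFFF == as_unsigned

# With standard sizes, "<l" rejects a value above 2**31 - 1 outright,
# so an unsigned CRC cannot be written back through a signed field.
try:
    struct.pack("<l", as_unsigned)
except struct.error:
    overflow = True
else:
    overflow = False
```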

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmunix' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.
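The two numbers in this transcript differ by exactly 2**32, so they are the same 32-bit pattern read once as unsigned and once as signed. A short Python 3 check; the helper names as_unsigned32 and as_signed32 are illustrative.

```python
computed = 2226205591   # binascii.crc32() result printed on Tru64
stored = -2068761705    # zinfo.CRC from the archive header


def as_unsigned32(x):
    """Interpret the low 32 bits of x as an unsigned integer."""
    return x & 0xFFFFFFFF


def as_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


assert computed - stored == 1 << 32
assert as_unsigned32(computed) == as_unsigned32(stored) == computed
assert as_signed32(computed) == stored
```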

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 10:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.
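The sign-extension hypothesis can be illustrated with ctypes, which does provide fixed-width C integer types (ctypes performs no overflow checking, so out-of-range values wrap). Storing the CRC in a signed 32-bit slot and widening it to 64 bits keeps it negative, while the unsigned route keeps it positive. This sketches the mechanism only; it is not what Modules/binascii.c actually does.

```python
import ctypes

crc = 2226205591  # a CRC with the top (sign) bit set

signed = ctypes.c_int32(crc).value        # bit pattern reinterpreted as signed
unsigned = ctypes.c_uint32(signed).value  # and back to unsigned 32-bit

# Widening to a 64-bit long: sign extension vs. zero extension
widened_signed = ctypes.c_int64(signed).value
widened_unsigned = ctypes.c_int64(unsigned).value

print(signed, unsigned, widened_signed, widened_unsigned)
```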

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 23:17:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 15:17:37 -0700
Subject: [Patches] [ python-Patches-553108 ] Deprecate bsddb
Message-ID: <E17PVxp-0002tL-00@usw-sf-web4.sourceforge.net>

Patches item #553108, was opened at 2002-05-06 22:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470

Category: Modules
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Garth T Kidd (gtk)
Assigned to: Skip Montanaro (montanaro)
Summary: Deprecate bsddb

Initial Comment:
Large numbers of inserts break bsddb, as first 
discovered in Python 1.5 (bug 408271). 

According to Barry Warsaw, "trying to get the bsddb 
module that comes with Python to work is a hopeless 
cause." 

If it's broken, let's discourage people from using it. 
In particular, let's ensure that people importing 
shelve or anydbm don't end up using it by default. 

The submitted patch adds a DeprecationWarning to the 
bsddb module and removes bsddb from the list of db 
module candidates in anydbm. 
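The anydbm half of the patch amounts to dropping one entry from an ordered list of fallback modules. The sketch below mimics that pattern in Python 3, where the backends live under the dbm package; pick_backend and the candidate order are illustrative assumptions, not anydbm's actual code.

```python
import importlib

# Illustrative preference order; Python 3's dbm package has its own logic
CANDIDATES = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def pick_backend(candidates=CANDIDATES, exclude=()):
    """Import and return the first candidate that is not excluded and
    imports cleanly; raise ImportError if none does."""
    for name in candidates:
        if name in exclude:
            continue
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # backend not compiled into this interpreter
    raise ImportError("no usable dbm backend in %r" % (candidates,))


# e.g. steer callers away from a backend known to be unreliable
backend = pick_backend(exclude=("dbm.gnu",))
print(backend.__name__)
```

dbm.dumb is pure Python and always importable, which makes it a safe last resort in such a list.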

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-02 17:17

Message:
Logged In: YES 
user_id=44345

Jack,

Sorry to hear you're having trouble.  Alas, my MacOS X system is with 
my wife at the moment, so I can't dig into the problem much.  Can you 
provide me with some background info?  If you can send me your copy 
of ndbm.h (I doubt it's using Berkeley DB) and figure out which library 
dbm_open resides in, that would be great.  Also, can you provide me 
with the output of the build process so I can see just what errors are 
being generated?

Skip


----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-02 16:52

Message:
Logged In: YES 
user_id=45365

Skip,
I'm reopening this bug report: the fix breaks builds on Mac OS X, and I haven't a clue as to how to fix this so I hope you can help. MacOSX has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but it doesn't have any of the libraries (I assume everything needed is in libc).

Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H.
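One way a build script could cover the Mac OS X case is to treat "ndbm.h present, no separate library" as meaning dbm_open comes from libc. A rough Python 3 sketch of that decision using only os.path checks; the function name, candidate library names and file suffixes are assumptions, not what setup.py actually does.

```python
import os
import tempfile


def ndbm_libraries(header_found, lib_dirs, candidates=("ndbm", "db1", "gdbm_compat")):
    """Return the libraries to link the dbm module against, [] when the
    functions are assumed to live in libc, or None if it cannot be built."""
    if not header_found:
        return None
    for name in candidates:
        for d in lib_dirs:
            for suffix in (".so", ".a", ".dylib"):
                if os.path.exists(os.path.join(d, "lib" + name + suffix)):
                    return [name]
    return []  # header only: rely on libc (the Mac OS X situation)


# Demonstrate both outcomes against a scratch library directory
with tempfile.TemporaryDirectory() as libdir:
    header_only = ndbm_libraries(True, [libdir])
    open(os.path.join(libdir, "libdb1.a"), "w").close()
    with_lib = ndbm_libraries(True, [libdir])
```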

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-14 15:32

Message:
Logged In: YES 
user_id=44345

Implemented in
  setup.py 1.93
  README 1.147
  configure 1.315
  configure.in 1.325
  pyconfig.h.in 1.42
  Modules/dbmmodule.c 2.30


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-14 02:16

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 22:33

Message:
Logged In: YES 
user_id=44345

a couple more tweaks... I forgot to include dbmmodule.c in 
previous patches.  This version of the patch also includes a 
modified README file that adds a section about building the 
bsddb and dbm modules.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-13 02:35

Message:
Logged In: YES 
user_id=44345

Here's an updated patch.  It's different in a couple ways:

  * support for Berkeley DB 4.x was added.  You will need to
    configure Berkeley DB with the 1.85 compatibility stuff.

  * I cleaned up the dbm build code a bit.

  * I added a diff for the configure file for people who don't
    have autoconf handy.

Skip


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-11 11:09

Message:
Logged In: YES 
user_id=44345

I think deprecating bsddb is too drastic.  In the first place, the problems
you refer to are in the underlying Berkeley DB library, not in the bsddb
code itself.  In the second place, later versions of the library fix the
problem.

The attached patch attempts to modify setup.py and configure.in to
solve the problem.  It does a couple things differently than the current
CVS version:

  1. It only searches for versions 2 and 3 of the Berkeley DB library by
   default.  People who know what they are doing can uncomment the
   information relevant to version 1.

  2. It moves all the checking code into setup.py.  The header file checks
  in configure.in were deleted.

  3. The ndbm lookalike stuff for the dbm module is done differently.  This
  has not really been tested yet.  I anticipate further changes will be
  necessary with this code.
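Points 1 and 2 describe a library search that lives entirely in setup.py and skips version 1 unless someone opts in. A rough Python 3 sketch of such a search; the version table, header and library names are invented for illustration and do not reproduce the patch.

```python
import os
import tempfile

# Hypothetical search table: (version, header file, library base name)
DB_SEARCH = [
    ("3", "db.h", "db3"),
    ("2", "db.h", "db2"),
    # ("1", "db_185.h", "db1"),  # opt in only if you know you need it
]


def _first_dir_with(dirs, filenames):
    for d in dirs:
        for f in filenames:
            if os.path.exists(os.path.join(d, f)):
                return d
    return None


def find_berkeley_db(inc_dirs, lib_dirs, search=DB_SEARCH):
    """Return (version, include_dir, lib_dir, libname) for the first entry
    whose header and library are both present, else None."""
    for version, header, libname in search:
        inc = _first_dir_with(inc_dirs, [header])
        lib = _first_dir_with(lib_dirs, ["lib%s%s" % (libname, s) for s in (".so", ".a")])
        if inc and lib:
            return version, inc, lib, libname
    return None


# Demonstrate with a scratch tree holding only a version 2 install
with tempfile.TemporaryDirectory() as root:
    inc_dir = os.path.join(root, "include")
    lib_dir = os.path.join(root, "lib")
    os.makedirs(inc_dir)
    os.makedirs(lib_dir)
    open(os.path.join(inc_dir, "db.h"), "w").close()
    open(os.path.join(lib_dir, "libdb2.so"), "w").close()
    found = find_berkeley_db([inc_dir], [lib_dir])
    found_version = found[0] if found else None
```

Ordering the table newest first means a machine with several installs picks the most recent one that is complete.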

I'm sure it's not perfect.  Please give it a try and let me know how it
works for you.

All that said, I think a better migration path is to replace the current
module with the bsddb3/pybsddb stuff.  I think that would effectively
restrict you to versions 3 or 4 of the underlying Berkeley DB library, so
it probably couldn't be done with impunity. 

Skip


----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-20 13:14

Message:
Logged In: YES 
user_id=276840

#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.

# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.

# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to
# 90,909 records (an average of 44 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, os, math, time

LIMIT=4000000
DISPLAY_AT_END=1

USE_RANDOM=100  # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2
##  The format of the value string is
##      count|hash|hash...|b
##  Where
##      count is an 8 byte hexadecimal count of the number of times
##          this record has been written.
##      hash is the md5 hash of the random value that created this record.
##          It is the key for this record. It is appended once for each
##          time the record is written (that is, it occurs count times).
##      b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129       # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count,
                                                  ((key+'|')*old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db["+repr(key)+"]="+
                                repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except:
                pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, "  count = %6i db[%s]=%s" % (
                        count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))


----------------------------------------------------------------------

Comment By: Martin D Katz, Ph.D. (drbits)
Date: 2002-05-16 18:10

Message:
Logged In: YES 
user_id=276840

I am not sure there is a reason to deprecate bsddb. The 
btopen format appears to be stable enough for normal work. 
Maybe 2.3 should change dbhash to use btopen?

----------------------------------------------------------------------

Comment By: Garth T Kidd (gtk)
Date: 2002-05-08 22:12

Message:
Logged In: YES 
user_id=59803

Let's not turn a simple patch into something requiring a 
PEP, compulsory thrashing on comp.lang.python, SleepyCat 
being willing to change their distribution model, lawyers 
(to make sure the licences are compatible), and so on. 

I'd hate it if other people spent the kind of time I did 
trying to get shelve to work only to find that a known-
broken bsddb was causing all the problems, and that a patch 
was there to gently guide them to gdbm, but it got jammed 
because of scope-creep. 

Let's get this one, very simple and necessary (bsddb IS 
broken) change out of the way, and THEN start negotiating, 
thrashing, and integrating. :) 

I firmly believe bsddb3 should be one of the included 
batteries. Let's do it, but let's guide people away from 
broken code first. 

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 04:01

Message:
Logged In: YES 
user_id=21627

I'm in favour of this change, but I'd like to simultaneously
incorporate bsddb3.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553108&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 23:25:09 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 15:25:09 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PW57-0001gs-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 07:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-02 18:25

Message:
Logged In: YES 
user_id=31435

Please try again.  New patch tries to force the entry 
conditions in crc32(), as well as the return value.

Modules/binascii.c; new revision: 2.36
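Revision 2.36 is described as forcing the entry conditions of crc32() as well as its return value. From the caller's side, the analogous discipline in Python 3 is to mask the running CRC both before passing it back in and after getting it out. A minimal sketch; crc32_update is a hypothetical helper, not a binascii function.

```python
import binascii

MASK32 = 0xFFFFFFFF


def crc32_update(data, running=0):
    """Continue a CRC-32 from `running`, normalizing it on entry and exit."""
    return binascii.crc32(data, running & MASK32) & MASK32


whole = crc32_update(b"hello world")
prefix = crc32_update(b"hello ")
resumed = crc32_update(b"world", prefix)

# The same low 32 bits carried around as a negative Python int
resumed_from_negative = crc32_update(b"world", prefix - (1 << 32))
```

Computing a CRC in pieces only works if each chunk is fed the previous chunk's value unchanged, which is why a sign flip on the way in matters as much as one on the way out.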

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 17:54

Message:
Logged In: YES 
user_id=31435

So what did it get, and what did it expect?  I.e., same stuff all 
over again.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 16:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC, zinfo.compress_size,
              zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmunix' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 10:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a 64-
bit box with sizeof(long)==8 -- Python doesn't have any 32-bit 
type on such a box.  So it seems most likely that some 32-
bit value either is or isn't getting sign-extended when this 
fails, but I can't tell from the report which of the disagreeing 
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Tue Jul  2 23:41:03 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 15:41:03 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PWKV-0001xh-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 14:41

Message:
Logged In: YES 
user_id=119770

Ok, hang on. I'm doing a clean build to make sure I wasn't
using anything from an old install.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 14:25

Message:
Logged In: YES 
user_id=31435

Please try again.  New patch tries to force the entry 
conditions in crc32(), as well as the return value.

Modules/binascii.c; new revision: 2.36

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 13:54

Message:
Logged In: YES 
user_id=31435

So what did it get, and what did it expect?  I.e., same stuff all 
over again.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 12:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for
central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
zinfo.compress_size,
              zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).
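A minimal Python 3 sketch of the format-code mismatch described above: the same four bytes unpack to different Python integers depending on whether "<l" (signed) or "<L"/"<I" (unsigned) is used. The CRC value is the one from the Tru64 debugging output in this thread; unlike 2.2-era struct, Python 3 rejects out-of-range values, so the negative value must be packed with "<l".

```python
import struct

# CRC value taken from the debugging output in this thread.
signed_crc = -2068761705

# Write the field as a signed 32-bit little-endian int, as write() does with "<lll".
raw = struct.pack("<l", signed_crc)

# Read the same four bytes back with different format codes.
as_signed = struct.unpack("<l", raw)[0]    # matches what was written
as_unsigned = struct.unpack("<L", raw)[0]  # same bits, read as unsigned
as_uint = struct.unpack("<I", raw)[0]      # "I" is also 4 bytes in "<" (standard size) mode

print(as_signed, as_unsigned, as_uint)
```

Either convention works on its own; the failure only appears when one side of the comparison is signed and the other unsigned.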

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.
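The two numbers in the output above are the same 32-bit pattern read two ways: 2226205591 - 2**32 = -2068761705. A minimal Python 3 sketch of converting between the unsigned and the sign-extended readings; the helper names are hypothetical and not part of zipfile or binascii.

```python
def to_unsigned32(value):
    """Keep the low 32 bits of value, read as unsigned (0 .. 2**32 - 1)."""
    return value & 0xFFFFFFFF

def to_signed32(value):
    """Keep the low 32 bits of value, read as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

# The two numbers from the debugging output above.
computed = 2226205591   # binascii.crc32() result on the 64-bit box
stored = -2068761705    # zinfo.CRC as read back from the archive

print(to_unsigned32(stored) == computed)  # True
print(to_signed32(computed) == stored)    # True
```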

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a
64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit
type on such a box.  So it seems most likely that some 32-bit
value either is or isn't getting sign-extended when this
fails, but I can't tell from the report which of the disagreeing
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.
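The fix that was eventually checked in changes what crc32() returns in C (Modules/binascii.c revisions 2.35 and 2.36) and is not shown here. As a Python-level sketch only, a caller can make a CRC comparison independent of sign extension by masking both sides to 32 bits, the `crc32(data) & 0xffffffff` idiom the Python documentation recommends for portable results. The helper names are hypothetical.

```python
import binascii
import zlib

def crc32_unsigned(data):
    """CRC-32 of data as an unsigned 32-bit value on any platform or version."""
    return binascii.crc32(data) & 0xFFFFFFFF

def crc_matches(data, stored_crc):
    """Compare a fresh CRC with one read from an archive header,
    ignoring whether the stored value came back signed or unsigned."""
    return crc32_unsigned(data) == (stored_crc & 0xFFFFFFFF)

payload = b"hello world"
crc = crc32_unsigned(payload)

# binascii and zlib compute the same CRC-32.
print(crc == (zlib.crc32(payload) & 0xFFFFFFFF))  # True

# A stored value differing by a multiple of 2**32 still compares equal.
print(crc_matches(payload, crc - 0x100000000))    # True
```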

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Wed Jul  3 02:30:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 18:30:38 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PYyc-0004TY-00@usw-sf-web1.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 03:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:30

Message:
Logged In: YES 
user_id=119770

Ok, Modules/binascii.c v2.36 works good!

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 14:41

Message:
Logged In: YES 
user_id=119770

Ok, hang on. I'm doing a clean build to make sure I wasn't
using anything from an old install.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 14:25

Message:
Logged In: YES 
user_id=31435

Please try again.  New patch tries to force the entry 
conditions in crc32(), as well as the return value.

Modules/binascii.c; new revision: 2.36

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 13:54

Message:
Logged In: YES 
user_id=31435

So what did it get, and what did it expect?  I.e., same stuff all 
over again.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 13:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 12:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size, zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 07:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 06:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a
64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit
type on such a box.  So it seems most likely that some 32-bit
value either is or isn't getting sign-extended when this
fails, but I can't tell from the report which of the disagreeing
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Wed Jul  3 02:58:09 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 18:58:09 -0700
Subject: [Patches] [ python-Patches-576327 ] zipfile when sizeof(long) == 8
Message-ID: <E17PZPF-000297-00@usw-sf-web3.sourceforge.net>

Patches item #576327, was opened at 2002-07-02 07:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: The Written Word (Albert Chin) (tww-china)
Assigned to: Tim Peters (tim_one)
Summary: zipfile when sizeof(long) == 8

Initial Comment:
This bug also applies to Python 2.0.x and 2.1.x (most
likely every version).

When sizeof (long) == 8, like on Tru64 UNIX,
zipfile.testzip () fails due to a CRC error. The
problem is that in Lib/zipfile.py:
  crc = binascii.crc32(bytes)
converts the 32-bit binascii.crc32() return value to a
64-bit value (crc). We need to force crc to remain a
32-bit value. Attached is a patch though maybe someone
else can think of something better.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-02 21:58

Message:
Logged In: YES 
user_id=31435

Thanks for your help, Albert!  While I started my ill-spent 
computer career on 64-bit Crays, you're the only 64-bit 
platform I have anymore <wink>.

This report is Closed.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 21:30

Message:
Logged In: YES 
user_id=119770

Ok, Modules/binascii.c v2.36 works good!

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 18:41

Message:
Logged In: YES 
user_id=119770

Ok, hang on. I'm doing a clean build to make sure I wasn't
using anything from an old install.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 18:25

Message:
Logged In: YES 
user_id=31435

Please try again.  New patch tries to force the entry 
conditions in crc32(), as well as the return value.

Modules/binascii.c; new revision: 2.36

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 17:54

Message:
Logged In: YES 
user_id=31435

So what did it get, and what did it expect?  I.e., same stuff all 
over again.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:41

Message:
Logged In: YES 
user_id=119770

Ok, well, testing worked fine on the test file I created but
running against Lib/test/test_zipfile.py gives:
Traceback (most recent call last):
  File "test_zipfile.py", line 35, in ?
    zipTest(file, zipfile.ZIP_STORED, writtenData)
  File "test_zipfile.py", line 16, in zipTest
    readData2 = zip.read(srcname)
  File "/opt/TWWfsw/python221/lib/python2.2/zipfile.py", line 351, in read
    raise BadZipfile, "Bad CRC-32 for file %s" % name
zipfile.BadZipfile: Bad CRC-32 for file junk9630.tmp

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 17:01

Message:
Logged In: YES 
user_id=119770

Tested the new Modules/binascii.c against 2.2.1 on Tru64
4.0D, 5.1, and HP-UX 11i and it works. Thanks!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 16:20

Message:
Logged In: YES 
user_id=31435

No, I don't have access to a 64-bit box.

Do you have access to CVS Python?  If so, please try again.  
I patched it to try to make binascii.crc32() return the same 
result across platforms.

Modules/binascii.c; new revision: 2.35

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=119770

From zipfile.py:
  ...
  structCentralDir = "<4s4B4H3l5H2l"
  ...
  def _RealGetContents(self):
    ...
            centdir = fp.read(46)
            total = total + 46
            if centdir[0:4] != stringCentralDir:
                raise BadZipfile, "Bad magic number for central directory"
            centdir = struct.unpack(structCentralDir, centdir)

When a zipfile is created, the CRC is written with:
  def write(self, filename, arcname=None, compress_type=None):
    ...
        self.fp.write(struct.pack("<lll", zinfo.CRC,
                                  zinfo.compress_size, zinfo.file_size))

Changing the "3l" to "3L" or "3I" in structCentralDir is
another workaround but as we wrote with "l", we should also
read with "l" (maybe this is the real problem).

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:42

Message:
Logged In: YES 
user_id=119770

Bug #453208 indicates a similar problem.

----------------------------------------------------------------------

Comment By: The Written Word (Albert Chin) (tww-china)
Date: 2002-07-02 11:06

Message:
Logged In: YES 
user_id=119770

Do you have access to a machine where sizeof (long) == 8?
Here's what I'm getting:

$ uname -a
OSF1 duh V4.0 878 alpha
$ python
>>> import zipfile
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'w')
>>> zip.write ('/vmuniz', 'vmunix')
>>> zip.close ()
>>> zip = zipfile.ZipFile ('/tmp/a.zip', 'r')
>>> zip.testzip()
2226205591 -2068761705

I added some debugging statements to zipfile.read(). The
first number is the output of binascii.crc32() while the
second is the output of zinfo.CRC (the CRC value in the
zipfile header for 'vmuniz' in /tmp/a.zip).

Would binascii.crc32() *ever* return a negative number or
does it return an unsigned type? Looking at the source to
Modules/binascii.c, crc is an unsigned long but the value
returned is signed long.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-02 10:44

Message:
Logged In: YES 
user_id=31435

I believe you're having a problem, but I can't tell what it is.  
Exactly how does zipfile.testzip() fail?  What did it get and 
what did it expect?

It's not possible to "force crc to remain a 32-bit value" on a
64-bit box with sizeof(long)==8 -- Python doesn't have any 32-bit
type on such a box.  So it seems most likely that some 32-bit
value either is or isn't getting sign-extended when this
fails, but I can't tell from the report which of the disagreeing
values that may be, or which it *should* be.

IOW, we need more info about how this fails.  If you're 
hacking the result of binascii.crc32() and calling that "a fix", 
chances seem high that the correct fix lies in changing what 
crc32() returns.  But not yet enough info here to say.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576327&group_id=5470


From noreply@sourceforge.net  Wed Jul  3 03:47:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 02 Jul 2002 19:47:17 -0700
Subject: [Patches] [ python-Patches-574532 ] Update freeze to use zlib 1.1.4
Message-ID: <E17PaAn-0002jA-00@usw-sf-web3.sourceforge.net>

Patches item #574532, was opened at 2002-06-27 21:30
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470

Category: Demos and tools
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Lawrence Hudson (lhudson)
Assigned to: Nobody/Anonymous (nobody)
Summary: Update freeze to use zlib 1.1.4

Initial Comment:
freeze currently looks for zlib 1.1.3.


----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 12:47

Message:
Logged In: YES 
user_id=14198

Checked in.

/cvsroot/python/python/dist/src/Tools/freeze/extensions_win32.ini,v
 <--  extensions_win32.ini
new revision: 1.7; previous revision: 1.6


----------------------------------------------------------------------

Comment By: Lawrence Hudson (lhudson)
Date: 2002-07-01 18:59

Message:
Logged In: YES 
user_id=82888

D'Oh!  Sorry about that.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-28 11:14

Message:
Logged In: YES 
user_id=14198

there is no patch attached here that I can see!

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574532&group_id=5470


From noreply@sourceforge.net  Wed Jul  3 16:57:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 03 Jul 2002 08:57:21 -0700
Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS
Message-ID: <E17PmVN-00034B-00@usw-sf-web4.sourceforge.net>

Patches item #577031, was opened at 2002-07-03 11:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Remove PyArg_Parse() and METH_OLDARGS

Initial Comment:
This patch removes more PyArg_Parse() and METH_OLDARGS
which are deprecated.
I've tested in select and string, but want to make sure
there's nothing else I'm missing.

I also have a huge change to glmodule, but I can't test
that.  The diff is attached.
Let me know if I should check in glmodule or leave it
alone.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470


From noreply@sourceforge.net  Wed Jul  3 16:59:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 03 Jul 2002 08:59:15 -0700
Subject: [Patches] [ python-Patches-561244 ] Micro optimizations
Message-ID: <E17PmXD-00036J-00@usw-sf-web4.sourceforge.net>

Patches item #561244, was opened at 2002-05-27 17:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561244&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Neal Norwitz (nnorwitz)
Summary: Micro optimizations

Initial Comment:
This is stuff I've had sitting around for a while.
I was attempting to improve performance in
some paths.

 * Most of the changes are from a loop -> memset.
 * intobject changes are to initialize small ints at startup,
   so smallints don't have to be checked for each new int
 * other misc very small clean-ups

Please review and test to see if there are any
problems.  Also feedback whether this improves
performance for various platforms (tested on Linux)
or if this patch is even worth it.

Files modified are:  Include/intobject.h
Python/{ceval,pythonrun}.c
Objects/{tuple,list,int,frame,}object.c

All changes are independent, except for the int changes
which affect:  Include/intobject.h, Python/pythonrun.c,
and Objects/intobject.c.
It may also be useful to set the number of cached small
negative ints (NSMALLNEGINTS) to 5 or so instead of 1.  There
are several uses of -2, -3, ... in the standard library.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-03 11:59

Message:
Logged In: YES 
user_id=33168

Checked in the memset()s in:
  {list,tuple}object.c and _sre.c.
object.c 2.178

Still have to do int and frame.
I've cleaned up int so that if there is an init
failure, a fatal error is raised similar to other
initializations.  I will get around to checking that in.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-05 17:43

Message:
Logged In: YES 
user_id=6380

I like all of these, even PyInt_Init(). Go for it.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-31 18:11

Message:
Logged In: YES 
user_id=80475

Wow, you plowed through a lot of code!

Two sets of optimizations look worthwhile, the memsets() 
and the XINCREFs to INCREFs.

Probably the fastlocals substitutions should be done also, 
but more for beauty and clarity than speed.

I checked those three categories of changes on my 
machine.  They compile fine, pass the standard regression 
tests and check out okay on my personal, real-code
test farm.

I don't think the PyInt_Init() addition buys us anything.  
The register and macro tweaks may cost more in review 
time and potential errors than they could ever save in 
cumulative computer time.

Recommend you get these in before someone changes the 
codebase.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561244&group_id=5470


From noreply@sourceforge.net  Thu Jul  4 02:35:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 03 Jul 2002 18:35:46 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17PvX8-0008DH-00@usw-sf-web2.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 15:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-04 11:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 20:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-02 04:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 13:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 15:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Thu Jul  4 13:28:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 05:28:14 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17Q5iY-0008F0-00@usw-sf-web5.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 05:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-04 12:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-04 01:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 10:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 01:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.
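The selection logic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the patch's actual pyport.h: the HAVE_DECLSPEC_DLL switch, the platform tests, LINKAGE_MODE, and the helper `linkage_is()` are assumptions made for this example. Without Py_ENABLE_SHARED, every macro collapses to plain C.

```c
#include <string.h>

/* Assumption: platforms whose DLLs need explicit __declspec decorations. */
#if defined(_WIN32) || defined(__CYGWIN__)
#  define HAVE_DECLSPEC_DLL
#endif

#if defined(Py_ENABLE_SHARED) && defined(HAVE_DECLSPEC_DLL)
#  if defined(Py_BUILD_CORE)
     /* Compiling the core DLL itself: export the public symbols. */
#    define PyAPI_FUNC(RTYPE) __declspec(dllexport) RTYPE
#    define PyAPI_DATA(RTYPE) extern __declspec(dllexport) RTYPE
#    define LINKAGE_MODE "export"
#  else
     /* Compiling an extension against the DLL: import them. */
#    define PyAPI_FUNC(RTYPE) __declspec(dllimport) RTYPE
#    define PyAPI_DATA(RTYPE) extern __declspec(dllimport) RTYPE
#    define LINKAGE_MODE "import"
#  endif
#else
   /* Static build, or a platform without DLL decorations: plain C. */
#  define PyAPI_FUNC(RTYPE) RTYPE
#  define PyAPI_DATA(RTYPE) extern RTYPE
#  define LINKAGE_MODE "plain"
#endif

/* Hypothetical declarations: only declared here, never defined or used. */
PyAPI_FUNC(int) Example_Function(int value);
PyAPI_DATA(int) Example_Data;

/* Reports whether the preprocessor selected the given branch. */
static int linkage_is(const char *mode)
{
    return strcmp(LINKAGE_MODE, mode) == 0;
}
```

Under these assumptions, compiling with `-DPy_ENABLE_SHARED -DPy_BUILD_CORE` on Windows or Cygwin would select the export branch, while an ordinary compile with neither macro defined selects the plain branch.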

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 18:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 03:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 05:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 06:31:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 22:31:36 -0700
Subject: [Patches] [ python-Patches-553702 ] Cygwin make install patch
Message-ID: <E17QLgu-0004g8-00@usw-sf-web4.sourceforge.net>

Patches item #553702, was opened at 2002-05-08 14:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470

Category: Build
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Jason Tishler (jlt63)
>Assigned to: Jason Tishler (jlt63)
Summary: Cygwin make install patch

Initial Comment:
This patch fixes make install for Cygwin. Specifically, it reverts
to the previous behavior:

o install libpython$(VERSION)$(SO) in $(BINDIR)
o install $(LDLIBRARY) in $(LIBPL)

It also begins to remove Cygwin's dependency on
$(DLLLIBRARY) which I hope to take advantage of
when I attempt to make Cygwin as similar as possible
to the other Unix platforms (in other patches).

I tested this patch under Red Hat Linux 7.1 without
any ill effects.

BTW, I'm not the happiest using the following
test for Cygwin:

test "$(SO)" = .dll

I'm willing to update the patch to use:

case "$(MACHDEP)" in cygwin*

instead, but IMO that will look uglier.


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:31

Message:
Logged In: YES 
user_id=21627

I think I misinterpreted your patch. It is fine; please
apply it.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-27 18:25

Message:
Logged In: YES 
user_id=86216

Sorry for sluggish response time...

Under Cygwin, my patch does the following:

make altbininstall:
/usr/bin/install -c -m 555 libpython2.3.dll /usr/bin

make libainstall:
/usr/bin/install -c -m 644 libpython2.3.dll.a /usr/lib/python2.3/config

So, I am installing the shared library during altbininstall
and the import library during libainstall. Isn't this what
you were asking for in your previous message? Or, do
you want me to install both files during altbininstall?

I'm confused.  Please clarify.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-06 11:49

Message:
Logged In: YES 
user_id=21627

On Unix, if a shared libpython is created, it is installed
as part of altbininstall, not as part of libainstall. I feel
that pythonxy.dll is not really a library, but a binary -
quite unlike libpythonxy.a (which is closer to the
import library). So I feel that this patch would better be
incorporated into altbininstall.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-04 17:17

Message:
Logged In: YES 
user_id=86216

Please review when you get a chance, thanks.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-05-22 18:30

Message:
Logged In: YES 
user_id=86216

Can I commit this one? Note that make install is
busted under Cygwin without this patch.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 06:45:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 22:45:21 -0700
Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS
Message-ID: <E17QLuD-0000pv-00@usw-sf-web5.sourceforge.net>

Patches item #577031, was opened at 2002-07-03 17:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Remove PyArg_Parse() and METH_OLDARGS

Initial Comment:
This patch removes more PyArg_Parse() and METH_OLDARGS
which are deprecated.
I've tested in select and string, but want to make sure
there's nothing else I'm missing.

I also have a huge change to glmodule, but I can't test
that.  The diff is attached.
Let me know if I should check in glmodule or leave it
alone.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:45

Message:
Logged In: YES 
user_id=21627

The changes look good, except for the ones that change
parsing of "s" to PyString_Check: that means to lose support
for Unicode.

For some of these methods, that may be acceptable, but that
would need documentation.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 06:47:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 22:47:52 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17QLwe-0004pl-00@usw-sf-web4.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter, which allows the caller to specify the type of the 
exception to raise.
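The design amounts to adding a more general function and turning the old one into a thin wrapper that supplies the default exception type. The sketch below shows only that delegation pattern, in plain C without the Python headers. `ExcType`, `set_exc_from_code`, `set_from_code`, and the `last_*` globals are hypothetical stand-ins, not the patch's actual names or signatures.

```c
#include <stddef.h>

/* Hypothetical stand-in for an exception type object. */
typedef struct {
    const char *name;
} ExcType;

/* The default type that the old entry point always used. */
static ExcType Exc_WindowsError = { "WindowsError" };

/* Hypothetical "pending error" state standing in for the interpreter's. */
static const ExcType *last_type = NULL;
static int last_code = 0;

/* New, general variant: the caller chooses which exception type to raise. */
static void *set_exc_from_code(const ExcType *type, int code)
{
    last_type = type;
    last_code = code;
    return NULL;   /* like the C API error setters, always return NULL */
}

/* Old entry point: unchanged signature, delegating with the default type. */
static void *set_from_code(int code)
{
    return set_exc_from_code(&Exc_WindowsError, code);
}
```

Keeping the old function as a wrapper means existing callers need no changes, while new callers can pass their own exception type.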

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:47

Message:
Logged In: YES 
user_id=21627

If this is meant to be used by extension modules, it should
be documented.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 07:45:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 23:45:22 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17QMqI-0007fG-00@usw-sf-web2.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 15:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.
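To make the naming concrete, here is a minimal sketch of the three macros in use. It assumes the simplest expansions, for a platform that needs no DLL decorations; the real definitions live in pyport.h and PC/pyconfig.h and may differ. `example_add`, `example_calls`, and `initexample` are hypothetical names.

```c
/* Assumed fallback expansions for a platform without DLL decorations. */
#ifndef PyAPI_FUNC
#  define PyAPI_FUNC(RTYPE) RTYPE
#endif
#ifndef PyAPI_DATA
#  define PyAPI_DATA(RTYPE) extern RTYPE
#endif
#ifndef PyMODINIT_FUNC
#  define PyMODINIT_FUNC void
#endif

/* Header side: declarations marked by purpose, not by import/export direction. */
PyAPI_FUNC(int) example_add(int a, int b);   /* public function */
PyAPI_DATA(int) example_calls;               /* public data */

/* Implementation side. */
int example_calls = 0;

int example_add(int a, int b)
{
    example_calls++;
    return a + b;
}

/* Extension module init function (Python 2.x style: returns void). */
PyMODINIT_FUNC initexample(void)
{
    example_calls = 0;   /* reset state when the module is initialized */
}
```

Under C++, the real PyMODINIT_FUNC presumably also needs to add `extern "C"` so that the init function keeps an unmangled name the interpreter can find.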

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 16:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 22:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-04 11:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 20:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-02 04:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 13:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 15:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 07:57:50 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 04 Jul 2002 23:57:50 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17QN2M-00027D-00@usw-sf-web5.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter, which allows the caller to specify the type of the 
exception to raise.

----------------------------------------------------------------------

>Comment By: Thomas Heller (theller)
Date: 2002-07-05 08:57

Message:
Logged In: YES 
user_id=11105

Sure. Patch uploaded: docpatch.diff

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:47

Message:
Logged In: YES 
user_id=21627

If this is meant to be used by extension modules, it should
be documented.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 15:36:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 05 Jul 2002 07:36:36 -0700
Subject: [Patches] [ python-Patches-553702 ] Cygwin make install patch
Message-ID: <E17QUCK-0005y9-00@usw-sf-web4.sourceforge.net>

Patches item #553702, was opened at 2002-05-08 04:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470

Category: Build
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Jason Tishler (jlt63)
Assigned to: Jason Tishler (jlt63)
Summary: Cygwin make install patch

Initial Comment:
This patch fixes make install for Cygwin. Specifically, it reverts
to the previous behavior:

o install libpython$(VERSION)$(SO) in $(BINDIR)
o install $(LDLIBRARY) in $(LIBPL)

It also begins to remove Cygwin's dependency on
$(DLLLIBRARY) which I hope to take advantage of
when I attempt to make Cygwin as similar as possible
to the other Unix platforms (in other patches).

I tested this patch under Red Hat Linux 7.1 without
any ill effects.

BTW, I'm not the happiest using the following
test for Cygwin:

test "$(SO)" = .dll

I'm willing to update the patch to use:

case "$(MACHDEP)" in cygwin*

instead, but IMO that will look uglier.


----------------------------------------------------------------------

>Comment By: Jason Tishler (jlt63)
Date: 2002-07-05 06:36

Message:
Logged In: YES 
user_id=86216

Thanks. I'm on vacation now and will check it in
when I return to work next week.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-04 21:31

Message:
Logged In: YES 
user_id=21627

I think I misinterpreted your patch. It is fine; please
apply it.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-27 08:25

Message:
Logged In: YES 
user_id=86216

Sorry for sluggish response time...

Under Cygwin, my patch does the following:

make altbininstall:
/usr/bin/install -c -m 555 libpython2.3.dll /usr/bin

make libainstall:
/usr/bin/install -c -m 644 libpython2.3.dll.a /usr/lib/python2.3/config

So, I am installing the shared library during altbininstall
and the import library during libainstall. Isn't this what
you were asking for in your previous message? Or, do
you want me to install both files during altbininstall?

I'm confused.  Please clarify.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-06 01:49

Message:
Logged In: YES 
user_id=21627

On Unix, if a shared libpython is created, it is installed
as part of altbininstall, not as part of libainstall. I feel
that pythonxy.dll is not really a library, but a binary -
quite unlike libpythonxy.a (which is closer to the
import library). So I feel that this patch would better be
incorporated into altbininstall.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-04 07:17

Message:
Logged In: YES 
user_id=86216

Please review when you get a chance, thanks.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-05-22 08:30

Message:
Logged In: YES 
user_id=86216

Can I commit this one? Note that make install is
busted under Cygwin without this patch.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 19:25:01 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 05 Jul 2002 11:25:01 -0700
Subject: [Patches] [ python-Patches-577875 ] Merge xrange() into slice()
Message-ID: <E17QXlN-00053f-00@usw-sf-web3.sourceforge.net>

Patches item #577875, was opened at 2002-07-05 18:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Merge xrange() into slice()

Initial Comment:
Changes from Raymond Hettinger's last version of this 
patch:

1. Removed #include "rangeobject.h" from Python.h

2. Changed repr to suppress None arguments so it now 
looks like the old xrange repr.

3. Added .slice(len) method that exposes the functionality 
of PySlice_GetIndicesEx.

Comment in PySlice_GetIndicesEx:
/* this is harder to get right than you might think */

:-)
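For reference, the index normalization that PySlice_GetIndicesEx performs can be sketched in plain C. This follows the usual Python slice semantics: negative indices count from the end, out-of-range values are clamped, and the defaults depend on the sign of step. The function name `slice_indices` and the use of NULL pointers to stand in for `None` are assumptions of this sketch, not the C API itself.

```c
#include <stddef.h>

/* Normalize a slice against a sequence of the given length.
   A NULL pointer stands in for None.  Returns 0 on success,
   or -1 if step is zero. */
static int slice_indices(long length,
                         const long *start_p, const long *stop_p,
                         const long *step_p,
                         long *start, long *stop, long *step,
                         long *slicelength)
{
    *step = step_p ? *step_p : 1;
    if (*step == 0)
        return -1;                      /* Python raises ValueError here */

    if (start_p == NULL) {
        *start = (*step < 0) ? length - 1 : 0;
    } else {
        *start = *start_p;
        if (*start < 0)
            *start += length;           /* count from the end */
        if (*start < 0)
            *start = (*step < 0) ? -1 : 0;
        if (*start >= length)
            *start = (*step < 0) ? length - 1 : length;
    }

    if (stop_p == NULL) {
        *stop = (*step < 0) ? -1 : length;
    } else {
        *stop = *stop_p;
        if (*stop < 0)
            *stop += length;
        if (*stop < 0)
            *stop = (*step < 0) ? -1 : 0;
        if (*stop >= length)
            *stop = (*step < 0) ? length - 1 : length;
    }

    /* Empty if the range runs the "wrong way" for the step's sign. */
    if ((*step < 0 && *stop >= *start) || (*step > 0 && *start >= *stop))
        *slicelength = 0;
    else if (*step < 0)
        *slicelength = (*stop - *start + 1) / *step + 1;
    else
        *slicelength = (*stop - *start - 1) / *step + 1;
    return 0;
}
```

For example, with length 10 and start -3 (the slice `[-3:]`), the sketch yields start 7, stop 10, step 1 and a slice length of 3.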


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470


From noreply@sourceforge.net  Fri Jul  5 19:45:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 05 Jul 2002 11:45:05 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17QY4n-0005Ss-00@usw-sf-web3.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 01:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-05 14:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 02:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 08:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 21:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 06:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-01 21:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 14:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-23 23:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 01:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Sat Jul  6 01:41:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 05 Jul 2002 17:41:19 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17QddX-0006eI-00@usw-sf-web4.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 15:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-06 10:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf-2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-06 04:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 16:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 22:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-04 11:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 20:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not when building extensions).

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-02 04:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 13:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 15:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Sat Jul  6 15:35:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 06 Jul 2002 07:35:29 -0700
Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning
Message-ID: <E17Qqen-0002de-00@usw-sf-web2.sourceforge.net>

Patches item #576101, was opened at 2002-07-01 19:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Alternative implementation of interning

Initial Comment:
Instead of a pointer to the interned string, an interned
string now has a flag set indicating that it is interned.
The pointer was almost always either NULL or pointing to
the same object; the other cases were rare and ineffective
as an optimization.  This saves an average of 3 bytes
per string.

Interned strings are no longer immortal.  They are 
automatically destroyed when there are no more 
references to them except the global dictionary of 
interned strings.

A new function (actually a macro), PyString_CheckInterned,
checks whether a string is interned.  There are no
more references to ob_sinterned anywhere outside
stringobject.c.


----------------------------------------------------------------------

>Comment By: Oren Tirosh (orenti)
Date: 2002-07-06 14:35

Message:
Logged In: YES 
user_id=562624

This implementation supports both mortal and immortal interned 
strings.

PyString_InternInPlace creates an immortal interned string for 
backward compatibility with code that relies on this behavior.

PyString_Intern creates a mortal interned string that is 
deallocated when its refcnt reaches 0.  Note that if the string 
value has been previously interned as immortal this will not 
make it mortal.

Most places in the interpreter were changed to PyString_Intern 
except those that may be required for compatibility.

This version of the patch, like the previous one, disables 
indirect interning. Is there any evidence that it is still an 
important optimization for some packages?

Make sure you rebuild everything after applying this patch 
because it modifies the size of string object headers.
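A toy Python model of the mortal/immortal distinction described above may help reviewers. It is only a sketch: InternTable, intern_mortal, intern_immortal and decref are hypothetical names, reference counts are tracked by hand, and it assumes that an immortal request upgrades an existing mortal entry (the comment only states that a mortal request never downgrades an immortal one).

```python
class InternTable:
    """Toy model of mortal vs. immortal interned strings (hypothetical, not CPython code)."""

    def __init__(self):
        # value -> [outside reference count, immortal flag]
        self._entries = {}

    def _intern(self, value, immortal):
        entry = self._entries.setdefault(value, [0, False])
        entry[1] = entry[1] or immortal   # a mortal request never downgrades an immortal entry
        entry[0] += 1                     # the caller now holds one reference
        return value

    def intern_mortal(self, value):
        """Like the proposed PyString_Intern: the entry dies with its last reference."""
        return self._intern(value, immortal=False)

    def intern_immortal(self, value):
        """Like PyString_InternInPlace in this patch: the entry is never freed."""
        return self._intern(value, immortal=True)

    def decref(self, value):
        entry = self._entries[value]
        entry[0] -= 1
        # The table's own reference is not counted, so a mortal entry
        # is dropped as soon as no outside reference remains.
        if entry[0] == 0 and not entry[1]:
            del self._entries[value]

    def __contains__(self, value):
        return value in self._entries
```

Interning "spam" as mortal and then releasing it removes it from the table, while "eggs" interned once as immortal survives any number of later mortal intern/decref pairs.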


----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-07-02 04:21

Message:
Logged In: YES 
user_id=80475

I like the way you consolidated all of the knowledge about 
interning into one place.

Consider adding to the docs an example of an effective use
of interning for optimization.
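Along the lines suggested here, a small sketch of interning used as an optimization: field names that repeat across many records are interned so that every record shares one string object, which saves memory and, in CPython, lets dictionary lookups succeed on a pointer comparison before falling back to a full string comparison. It is written for current Python, where the builtin intern() of this era is spelled sys.intern; parse_records and the "key=value;key=value" input format are hypothetical.

```python
import sys

def parse_records(lines):
    """Parse 'key=value;key=value' lines into dicts, interning the keys."""
    records = []
    for line in lines:
        record = {}
        for pair in line.split(";"):
            key, _, value = pair.partition("=")
            # Every record now shares one string object per distinct key name.
            record[sys.intern(key)] = value
        records.append(record)
    return records
```

Values are left un-interned because they are mostly unique, so interning them would only grow the interned-string table.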

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470


From noreply@sourceforge.net  Sat Jul  6 16:08:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 06 Jul 2002 08:08:14 -0700
Subject: [Patches] [ python-Patches-527518 ] urllib2.py: fix behavior with proxies
Message-ID: <E17QrAU-00046t-00@usw-sf-web3.sourceforge.net>

Patches item #527518, was opened at 2002-03-08 13:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470

Category: Library (Lib)
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Chris Lawrence (lordsutch)
Assigned to: Moshe Zadka (moshez)
Summary: urllib2.py: fix behavior with proxies

Initial Comment:
The following patch against Python 2.1 fixes some
problems with the urllib2 module when it is used with
proxies, in particular when
$http_proxy="http://user:passwd@host:port/" is set.
It also generates the correct Host header for proxy
requests (some proxies, such as oops, get confused
otherwise, despite RFC 2616 section 5.2, which says
they are to ignore it in the case of a full URL on the
request line).


----------------------------------------------------------------------

>Comment By: Chris Lawrence (lordsutch)
Date: 2002-07-06 10:08

Message:
Logged In: YES 
user_id=6757

Moshe: The updated patch seems to be A-OK and fixes the
issue in urllib2.py.  At some point I'll have to get back to
urllib.py.

Chris

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-18 02:40

Message:
Logged In: YES 
user_id=11645

I've looked at the patch, and it mixes cleanup with fixes. I
removed the cleanup parts, since I want an "obviously
correct" patch. Attached is a new patch I generated which
fixes the two problems:

* incorrect quoting of the user/password in the proxy code

* bad host headers when using proxies.

I am also curious about the logic in the latter fix. Can
"sel_host" ever be empty? When? Or can we just remove the
"or host" stuff?

Thanks.
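The "sel_host or host" choice asked about above can be sketched in Python 3 as follows. This is an illustration, not the urllib2 code: host_header is a hypothetical helper, and it assumes that a proxied request carries the absolute URL as its selector while a direct request carries only a path, in which case the parsed host is empty and the fallback applies.

```python
from urllib.parse import urlsplit

def host_header(selector, conn_host):
    """Choose the Host header value for a request (hypothetical helper).

    selector  -- absolute URL when talking to a proxy, bare path otherwise
    conn_host -- the host[:port] the connection was opened to
    """
    netloc = urlsplit(selector).netloc    # '' for a bare path such as '/index.html'
    sel_host = netloc.rpartition("@")[2]  # drop any user:pass@ prefix
    return sel_host or conn_host
```

Under these assumptions, sel_host is empty only for a bare-path selector, so the fallback matters only if the same code path also serves direct requests.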

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-13 13:04

Message:
Logged In: YES 
user_id=11645

Nope, no reason, except I need to properly test it
and check it in, and I won't have time for that until
the weekend.


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-13 12:51

Message:
Logged In: YES 
user_id=31392

This patch vs. CVS HEAD looks good to me.

Note that it would be better to get the Host header by
upgrading urllib2 to use HTTPConnection instead of HTTP, but
that's a much bigger project.  Would it be a problem to
always send HTTP/1.1 requests -- even to 1.0 servers?

Any reason not to check it in, Moshe?


----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-13 09:24

Message:
Logged In: YES 
user_id=6757

I'll try to make these changes sometime over the next few
days; of course, if someone else wants to do it sooner &
check it in, they're more than welcome.

----------------------------------------------------------------------

Comment By: Bastian Kleineidam (calvin)
Date: 2002-06-13 04:45

Message:
Logged In: YES 
user_id=9205

I tested the urllib.py patches for 2.1 and 2.2; they work.
Some minor quibbles are left:
a) the user and/or password may be empty, so your test "if
proxypass and proxyuser" is not enough. You should test
against "is None".
b) in the urllib2 patches, you use unquote() for user and
pass, but in the urllib patches you don't. You should use
unquote in both modules.
c) in the urllib2 patch, you use encodestring() without strip(),
so the trailing newline it appends ends up in the header.

Here is an example that catches the corner cases:
import base64
from urllib import splituser, unquote

# http://@host.com (empty user and password)
# http://:@host.com (empty user and password)
# http://user@host.com (empty password)
# http://user:@host.com (empty password)
# http://:pass@host.com (empty user)
# split "user:pass@host" into ("user:pass", "host")
proxyuserpass, host = splituser(host)
if proxyuserpass is not None:
    # unquote
    proxyuserpass = unquote(proxyuserpass)
    # add empty password if missing
    if ":" not in proxyuserpass: proxyuserpass += ":"
    # base64; strip() removes the newline encodestring() appends
    proxyuserpass = base64.encodestring(proxyuserpass).strip()
    req.add_header("Proxy-Authorization", "Basic " + proxyuserpass)


Greetings, Bastian
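For comparison, the same corner cases can be checked with a self-contained Python 3 sketch. split_userinfo and proxy_auth_value are hypothetical helpers, not urllib APIs; the sketch splits user from password before unquoting (so an escaped ':' in the user name is not mistaken for the separator), uses base64.b64encode because it appends no trailing newline, and assumes Latin-1 credentials.

```python
import base64
from urllib.parse import unquote

def split_userinfo(netloc):
    """Split 'user:pass@host:port' into ('user:pass', 'host:port').

    Returns (None, netloc) when there is no '@'; splits on the last '@'.
    """
    userinfo, sep, host = netloc.rpartition("@")
    if not sep:
        return None, netloc
    return userinfo, host

def proxy_auth_value(userinfo):
    """Build a 'Basic ...' Proxy-Authorization value, or None if no credentials."""
    if userinfo is None:
        return None
    # Split before unquoting; a missing password becomes an empty one.
    user, _, password = userinfo.partition(":")
    credentials = unquote(user) + ":" + unquote(password)
    token = base64.b64encode(credentials.encode("latin-1")).decode("ascii")
    return "Basic " + token
```

For example, "@host.com" and ":@host.com" both yield "Basic Og==" (the encoding of ":"), and "user@host.com" yields "Basic dXNlcjo=".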

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-12 22:17

Message:
Logged In: YES 
user_id=6757

Ok, here's the patch for urllib.py; again, one patch for
each of 2.1, 2.2 and CVS HEAD.  I also moved the Host header
to right after the GET/PUT request line; this should help
servers that have multiple virtual hosts handle requests
more efficiently.

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-12 21:39

Message:
Logged In: YES 
user_id=6757

Ok, I've cleaned up the patch a bit.  I've got versions for
2.1, 2.2 and current CVS HEAD; they're all the same
substantively, but the 2.2 -> 2.3 jump changed things enough
that the 2.2 patch won't apply cleanly to CVS.

Note that the first big chunk fixes the proxy authentication
problem, while the second chunk fixes the incorrect Host
header problem.  The changes to the import at the beginning
are necessary for either part to work.

I'll investigate urllib.py further.  It looks like the
underlying problem is fixed in CVS HEAD already, but I'll
try to confirm after setting up some test code for urllib.

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-12 19:54

Message:
Logged In: YES 
user_id=6757

Moshe, Calvin:

I'll see about reworking the patch against current CVS and
using splituser etc.  I can break it up into two bits if you
like, too; probably cleaner that way.  (Have I mentioned how
much I hate fooling with SF.net's BTS... give me debbugs any
day :-)

Chris

----------------------------------------------------------------------

Comment By: Bastian Kleineidam (calvin)
Date: 2002-06-12 11:41

Message:
Logged In: YES 
user_id=9205

Note that the proxy thing is also a bug in urllib.py.
Chris, can you supply a patch for urllib.py too?

And I don't like the attached patch because it does not use
the splituser and splitpasswd functions already in urllib. I
would suggest that you use something like
proxyuser, host = splituser(host)
if proxyuser is not None:
    proxyuser, proxypass = splitpasswd(proxyuser)
    # [base64-encode "user:pass" and add the Proxy-Authorization header]

Chris, if you are too busy, close this patch and I will open
a new bug with a revised patch.

So long, Bastian

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-11 05:34

Message:
Logged In: YES 
user_id=11645

I want to take a look at this... I'm not thrilled about the
patch, especially since it solves two unrelated problems and
all, but I do think there's a real problem, and I'll try to
fix it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470


From noreply@sourceforge.net  Sat Jul  6 17:08:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 06 Jul 2002 09:08:16 -0700
Subject: [Patches] [ python-Patches-576101 ] Alternative implementation of interning
Message-ID: <E17Qs6a-0003CA-00@usw-sf-web5.sourceforge.net>

Patches item #576101, was opened at 2002-07-01 19:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Alternative implementation of interning

Initial Comment:
Instead of a pointer to the interned string, an interned
string now has a flag set indicating that it is interned.
The pointer was almost always either NULL or pointing to
the same object; the other cases were rare and ineffective
as an optimization.  This saves an average of 3 bytes
per string.

Interned strings are no longer immortal.  They are 
automatically destroyed when there are no more 
references to them except the global dictionary of 
interned strings.

A new function (actually a macro), PyString_CheckInterned,
checks whether a string is interned.  There are no
more references to ob_sinterned anywhere outside
stringobject.c.


----------------------------------------------------------------------

>Comment By: Oren Tirosh (orenti)
Date: 2002-07-06 16:08

Message:
Logged In: YES 
user_id=562624

Oops, forgot to actually attach the patch. Here it is.


----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-07-06 14:35

Message:
Logged In: YES 
user_id=562624

This implementation supports both mortal and immortal interned 
strings.

PyString_InternInPlace creates an immortal interned string for 
backward compatibility with code that relies on this behavior.

PyString_Intern creates a mortal interned string that is 
deallocated when its refcnt reaches 0.  Note that if the string 
value has been previously interned as immortal this will not 
make it mortal.

Most places in the interpreter were changed to PyString_Intern 
except those that may be required for compatibility.

This version of the patch, like the previous one, disables 
indirect interning. Is there any evidence that it is still an 
important optimization for some packages?

Make sure you rebuild everything after applying this patch 
because it modifies the size of string object headers.


----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-07-02 04:21

Message:
Logged In: YES 
user_id=80475

I like the way you consolidated all of the knowledge about 
interning into one place.

Consider adding an example to the docs of an effective use 
of interning for optimization.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576101&group_id=5470


From noreply@sourceforge.net  Sun Jul  7 07:21:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 06 Jul 2002 23:21:17 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17R5Q5-0006iX-00@usw-sf-web2.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Jack Jansen (jackjansen)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Performance on both platforms is neutral when running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Sun Jul  7 07:41:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 06 Jul 2002 23:41:14 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17R5jO-00006A-00@usw-sf-web5.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Jack Jansen (jackjansen)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Performance on both platforms is neutral when running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD, test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Sun Jul  7 17:58:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 07 Jul 2002 09:58:42 -0700
Subject: [Patches] [ python-Patches-527518 ] urllib2.py: fix behavior with proxies
Message-ID: <E17RFMw-000386-00@usw-sf-web5.sourceforge.net>

Patches item #527518, was opened at 2002-03-08 19:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470

Category: Library (Lib)
Group: Python 2.1.2
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Chris Lawrence (lordsutch)
Assigned to: Moshe Zadka (moshez)
Summary: urllib2.py: fix behavior with proxies

Initial Comment:
The following patch against Python 2.1 fixes some
problems with the urllib2 module when it is used with
proxies, in particular when
$http_proxy="http://user:passwd@host:port/" is set.
It also generates the correct Host header for proxy
requests (some proxies, such as oops, get confused
otherwise, despite RFC 2616 section 5.2, which says
they are to ignore it in the case of a full URL on the
request line).


----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-07 16:58

Message:
Logged In: YES 
user_id=31392

fixed in rev. 1.32 of urllib2.py


----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-07-06 15:08

Message:
Logged In: YES 
user_id=6757

Moshe: The updated patch seems to be A-OK and fixes the
issue in urllib2.py.  At some point I'll have to get back to
urllib.py.

Chris

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-18 07:40

Message:
Logged In: YES 
user_id=11645

I've looked at the patch, and it mixes cleanup with fixes. I
removed the cleanup parts, since I want an "obviously
correct" patch. Attached is a new patch I generated which
fixes the two problems:

* incorrect quoting of the user/password in the proxy code

* bad host headers when using proxies.

I am also curious about the logic in the latter fix. Can
"sel_host" ever be empty? When? Or can we just remove the
"or host" stuff?

Thanks.

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-13 18:04

Message:
Logged In: YES 
user_id=11645

Nope, no reason, except I need to properly test it
and check it in, and I won't have time for that until
the weekend.


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-13 17:51

Message:
Logged In: YES 
user_id=31392

This patch vs. CVS HEAD looks good to me.

Note that it would be better to get the Host header by
upgrading urllib2 to use HTTPConnection instead of HTTP, but
that's a much bigger project.  Would it be a problem to
always send HTTP/1.1 requests -- even to 1.0 servers?

Any reason not to check it in, Moshe?


----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-13 14:24

Message:
Logged In: YES 
user_id=6757

I'll try to make these changes sometime over the next few
days; of course, if someone else wants to do it sooner &
check it in, they're more than welcome.

----------------------------------------------------------------------

Comment By: Bastian Kleineidam (calvin)
Date: 2002-06-13 09:45

Message:
Logged In: YES 
user_id=9205

I tested the urllib.py patches for 2.1 and 2.2; they work.
Some minor quibbles are left:
a) the user and/or password may be empty, so your test "if
proxypass and proxyuser" is not enough. You should test
against "is None".
b) in the urllib2 patches, you use unquote() for user and
pass, but in the urllib patches you don't. You should use
unquote in both modules.
c) in the urllib2 patch, you use encodestring() without strip(),
so the trailing newline it appends ends up in the header.

Here is an example that catches the corner cases:
import base64
from urllib import splituser, unquote

# http://@host.com (empty user and password)
# http://:@host.com (empty user and password)
# http://user@host.com (empty password)
# http://user:@host.com (empty password)
# http://:pass@host.com (empty user)
# split "user:pass@host" into ("user:pass", "host")
proxyuserpass, host = splituser(host)
if proxyuserpass is not None:
    # unquote
    proxyuserpass = unquote(proxyuserpass)
    # add empty password if missing
    if ":" not in proxyuserpass: proxyuserpass += ":"
    # base64; strip() removes the newline encodestring() appends
    proxyuserpass = base64.encodestring(proxyuserpass).strip()
    req.add_header("Proxy-Authorization", "Basic " + proxyuserpass)


Greetings, Bastian

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-13 03:17

Message:
Logged In: YES 
user_id=6757

Ok, here's the patch for urllib.py; again, one patch for
each of 2.1, 2.2 and CVS HEAD.  I also moved the Host header
to right after the GET/PUT request line; this should help
servers that have multiple virtual hosts handle requests
more efficiently.

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-13 02:39

Message:
Logged In: YES 
user_id=6757

Ok, I've cleaned up the patch a bit.  I've got versions for
2.1, 2.2 and current CVS HEAD; they're all the same
substantively, but the 2.2 -> 2.3 jump changed things enough
that the 2.2 patch won't apply cleanly to CVS.

Note that the first big chunk fixes the proxy authentication
problem, while the second chunk fixes the incorrect Host
header problem.  The changes to the import at the beginning
are necessary for either part to work.

I'll investigate urllib.py further.  It looks like the
underlying problem is fixed in CVS HEAD already, but I'll
try to confirm after setting up some test code for urllib.

----------------------------------------------------------------------

Comment By: Chris Lawrence (lordsutch)
Date: 2002-06-13 00:54

Message:
Logged In: YES 
user_id=6757

Moshe, Calvin:

I'll see about reworking the patch against current CVS and
using splituser etc.  I can break it up into two bits if you
like, too; probably cleaner that way.  (Have I mentioned how
much I hate fooling with SF.net's BTS... give me debbugs any
day :-)

Chris

----------------------------------------------------------------------

Comment By: Bastian Kleineidam (calvin)
Date: 2002-06-12 16:41

Message:
Logged In: YES 
user_id=9205

Note that the proxy thing is also a bug in urllib.py.
Chris, can you supply a patch for urllib.py too?

And I don't like the attached patch because it does not use
the splituser and splitpasswd functions already in urllib. I
would suggest that you use something like
proxyuser, host = splituser(host)
if proxyuser is not None:
    proxyuser, proxypass = splitpasswd(proxyuser)
    # [base64-encode "user:pass" and add the Proxy-Authorization header]

Chris, if you are too busy, close this patch and I will open
a new bug with a revised patch.

So long, Bastian

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2002-06-11 10:34

Message:
Logged In: YES 
user_id=11645

I want to take a look at this... I'm not thrilled about the
patch, especially since it solves two unrelated problems and
all, but I do think there's a real problem, and I'll try to
fix it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527518&group_id=5470


From noreply@sourceforge.net  Sun Jul  7 22:24:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 07 Jul 2002 14:24:54 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17RJWY-0008AX-00@usw-sf-web5.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 08:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Jack Jansen (jackjansen)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Performance on both platforms is neutral when running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-07 23:24

Message:
Logged In: YES 
user_id=45365

Unfortunately, on the Mac it doesn't help at all with the test_longexp problem, nor with the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. The addchild() calls continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc simply calls the underlying system malloc/realloc.
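A rough model of why this growth pattern hurts: if every realloc moves the block, growing a child array by a fixed number of slots copies a total amount of memory that grows quadratically with the number of children, while geometric growth keeps it linear. This Python sketch is illustrative only; realloc_cost, the 4-slot step, the 20-byte slot size and the doubling policy are assumptions, not measurements of the parser or of MacPython's allocator.

```python
def realloc_cost(n_children, slot_size, grow):
    """Count reallocs and bytes copied while appending n_children slots.

    Worst-case assumption: every realloc moves the block, copying all
    existing slots. `grow` maps the old capacity to the new capacity.
    """
    capacity = reallocs = copied = 0
    for count in range(1, n_children + 1):
        if count > capacity:
            copied += capacity * slot_size   # old contents moved to the new block
            capacity = grow(capacity)
            reallocs += 1
    return reallocs, copied

def fixed_step(capacity):
    return capacity + 4            # grow by a constant number of slots

def doubling(capacity):
    return max(4, capacity * 2)    # grow geometrically
```

With 1000 children of 20 bytes, fixed_step needs 250 reallocs copying 2,490,000 bytes in this model, while doubling needs 9 reallocs copying 20,400 bytes.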

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 08:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD, test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 01:50:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 07 Jul 2002 17:50:41 -0700
Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation
Message-ID: <E17RMjh-0007Cs-00@usw-sf-web2.sourceforge.net>

Patches item #578494, was opened at 2002-07-08 00:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Vinay Sajip (vsajip)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 282 Implementation

Initial Comment:
The attached file implements PEP 282. The file
logging-0.4.6.tar.gz is the entire distribution, including
setup/install, test/example scripts, and TeX
documentation. The file logging.py (within the .tar.gz) is
all that is needed to implement the PEP.
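The PEP 282 design became the standard library's logging module in Python 2.3. As a rough illustration of its logger/handler/formatter model, here is a sketch written against today's stdlib API; the 0.4.6 package attached to this item may differ in details, and the logger name "app.db" is hypothetical.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes 'LEVEL:name:message' lines to `stream`."""
    logger = logging.getLogger("app.db")  # hypothetical logger name
    logger.setLevel(logging.INFO)         # DEBUG records are filtered out
    logger.propagate = False              # keep records away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.handlers.clear()               # avoid duplicate handlers on repeat calls
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = make_logger(buf)
log.debug("not shown: below INFO")
log.info("connected")
log.warning("slow query")
```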

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 01:56:03 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 07 Jul 2002 17:56:03 -0700
Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation
Message-ID: <E17RMot-0007Gv-00@usw-sf-web2.sourceforge.net>

Patches item #578494, was opened at 2002-07-08 00:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Vinay Sajip (vsajip)
>Assigned to: Mark Hammond (mhammond)
Summary: PEP 282 Implementation

Initial Comment:
The attached file implements PEP 282. The file
logging-0.4.6.tar.gz is the entire distribution, including
setup/install, test/example scripts, and TeX
documentation. The file logging.py (within the .tar.gz) is
all that is needed to implement the PEP.

----------------------------------------------------------------------

>Comment By: Vinay Sajip (vsajip)
Date: 2002-07-08 00:56

Message:
Logged In: YES 
user_id=308438

Added just the logging.py file to make it easier to review.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 07:38:55 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 07 Jul 2002 23:38:55 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17RSAh-000278-00@usw-sf-web3.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 02:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Jack Jansen (jackjansen)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better-resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance-wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-08 02:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-07 17:24

Message:
Logged In: YES 
user_id=45365

Unfortunately, on the Mac it doesn't help at all for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 02:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless
a general change is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 11:09:50 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 03:09:50 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17RVSo-0000tb-00@usw-sf-web1.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 08:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
>Assigned to: Andrew I MacIntyre (aimacintyre)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better-resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance-wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 12:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods, test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. And you'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 08:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-07 23:24

Message:
Logged In: YES 
user_id=45365

Unfortunately, on the Mac it doesn't help at all for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 08:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless
a general change is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 14:40:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 06:40:59 -0700
Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library
Message-ID: <E17RYl9-0004wl-00@usw-sf-web1.sourceforge.net>

Patches item #578667, was opened at 2002-07-08 15:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470

Category: Macintosh
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jack Jansen (jackjansen)
Assigned to: Just van Rossum (jvr)
Summary: Put IDE scripts in ~/Library

Initial Comment:
Just,
here's a patch that was part of a larger set; this one was unrelated to the rest (unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX.

This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice)

You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library.
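A two-folder lookup along these lines might look roughly like the sketch below. The folder names are hypothetical, and the environment is passed in so the missing-HOME case can be exercised. On classic MacPython, where os.environ["HOME"] is not available, the per-user path would presumably come from a FindFolder() call instead, which is not shown here.

```python
import os


def ide_script_folders(python_tree, environ=None):
    """Return candidate IDE script folders: system-wide first, then per-user."""
    if environ is None:
        environ = os.environ
    # Hypothetical location of the shared scripts inside the Python tree.
    folders = [os.path.join(python_tree, "Mac", "IDE scripts")]
    home = environ.get("HOME")  # may be absent; don't fail with KeyError
    if home:
        # Hypothetical per-user location under ~/Library.
        folders.append(os.path.join(home, "Library", "Python", "IDE Scripts"))
    return folders
```

Listing the system-wide folder first lets a sysadmin ship shared scripts while each user can still add private ones.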

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 14:52:08 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 06:52:08 -0700
Subject: [Patches] [ python-Patches-560311 ] os.uname() on Darwin space in machine
Message-ID: <E17RYvw-00059x-00@usw-sf-web1.sourceforge.net>

Patches item #560311, was opened at 2002-05-24 22:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560311&group_id=5470

Category: Distutils and setup.py
Group: Python 2.2.x
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Tim Carlson (timcarlson)
>Assigned to: Jack Jansen (jackjansen)
Summary: os.uname() on Darwin space in machine

Initial Comment:
os.uname() on Darwin (Mac OS X) returns a string for
"machine" of 
"Power Macintosh", which can cause problems. Getting rid
of the space might be a good thing


----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 15:52

Message:
Logged In: YES 
user_id=45365

os.uname() is simply a wrapper around the C library function of the same name. It returns "Power Macintosh" as the machine type.

For reasons I don't understand the C interface doesn't allow you to get at the "generic processor type" that is returned by "uname -p". This would probably be more useful (as the value is "powerpc"). But then, Linux gets it wrong, and returns "i686" for machine name and "unknown" for processor type, exactly the wrong way around.
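Since os.uname() just reports what the C library returns, a caller that needs a space-free tag (for example, in a build directory name) can normalize the string itself. A minimal sketch; the helper name and the choice of replacement characters are assumptions.

```python
import os


def platform_tag(machine):
    """Make a uname() machine string safe for use in file and directory names."""
    return machine.replace(" ", "_").replace("/", "-")


if hasattr(os, "uname"):  # os.uname() is not available on every platform
    print(platform_tag(os.uname().machine))
```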

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-27 15:31

Message:
Logged In: YES 
user_id=21627

There's no uploaded file!  You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file.

Please try again.

(This is a SourceForge annoyance that we can do
nothing about. :-( )

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560311&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 14:57:04 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 06:57:04 -0700
Subject: [Patches] [ python-Patches-552161 ] Py_AddPendingCall doesn't unlock on fail
Message-ID: <E17RZ0i-0000Po-00@usw-sf-web5.sourceforge.net>

Patches item #552161, was opened at 2002-05-04 05:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552161&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Daniel Dunbar (danieldunbar)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: Py_AddPendingCall doesn't unlock on fail

Initial Comment:
ceval.c:Py_AddPendingCall doesn't unlock if it
fails because the queue is full.
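The bug pattern is a lock acquired at the top of a function and not released on an early "queue full" return. As a Python model of a fixed-size pending-call ring buffer (not the ceval.c code; the class name and queue size are hypothetical), a `with` block releases the lock on every exit path:

```python
import threading

NPENDINGCALLS = 32  # hypothetical queue size


class PendingCalls:
    """Toy model of a fixed-size pending-call ring buffer guarded by a lock."""

    def __init__(self, size=NPENDINGCALLS):
        self.lock = threading.Lock()
        self.calls = [None] * size
        self.first = 0  # index of the oldest queued call
        self.last = 0   # index where the next call will be stored

    def add(self, func, arg):
        """Queue func(arg); return 0 on success, -1 if the queue is full."""
        with self.lock:  # released on every exit path, including the early return
            j = (self.last + 1) % len(self.calls)
            if j == self.first:
                return -1  # queue full: the reported bug returned here still locked
            self.calls[self.last] = (func, arg)
            self.last = j
            return 0
```

In C, the equivalent fix is to release the lock before the early `return -1`.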

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 15:57

Message:
Logged In: YES 
user_id=45365

I came across this one when browsing through the patches; it seems to have caught no one's attention yet. Assigning it to Guido as he wrote the addpending stuff (the patch looks benign to me).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552161&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 15:25:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 07:25:19 -0700
Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme
Message-ID: <E17RZS3-00036O-00@usw-sf-web3.sourceforge.net>

Patches item #578688, was opened at 2002-07-08 18:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: incompatible, but nice strings improveme

Initial Comment:
This patch changes the interpretation of multiline strings
(no matter whether single- or double-quoted with the NL escaped
by a backslash, or triple-quoted).

With this patch applied, first: the first character
after the opening quote is ignored if it is a NL, for example:

"""
la-la-la
"""

will be equivalent of

"""la-la-la
"""

The first variant looks better, doesn't it?

Second: all spaces after a NL and before the first nonblank char,
but no more than the current indentation, are ignored, for example:

New:

def f():
    """
    This is docstring,
    mama-mama,
  apple, banana
     """

is equivalent of old:

def f():
    """This is docstring,
mama-mama,
apple, banana
"""

The patch is enabled if PyPARSE_STRIPPED_STRINGS is defined. I
suggest you apply the patch but leave
PyPARSE_STRIPPED_STRINGS undefined until python-4 ;-)

I am sure that these semantics are right. As an
alternative, I suggest adding a new modifier 'i' to
strings, like 'u' and 'r', for instance i'iddqd'.

P. S. AFAIU, parsermodule.c needs editing too.

P. P. S. I am sorry, my English sucks :-(
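To pin down the two proposed rules, here is a sketch of them as an ordinary Python function. The name `strip_string` and the explicit `indent` argument are hypothetical; the patch itself works inside the parser, where the current indentation is presumably known without being passed in.

```python
def strip_string(text, indent):
    """Apply the two proposed rules to the body of a string literal.

    Rule 1: a newline immediately after the opening quote is dropped.
    Rule 2: on every line that follows a newline, up to `indent` leading
    spaces are removed.
    """
    lines = text.split("\n")
    # lines[0] follows the opening quote; every later line follows a newline.
    out = [lines[0]]
    for line in lines[1:]:
        leading = len(line) - len(line.lstrip(" "))
        out.append(line[min(leading, indent):])
    if len(out) > 1 and out[0] == "":
        out.pop(0)  # rule 1: ignore the newline right after the opening quote
    return "\n".join(out)
```

For the docstring example above (indentation 4), `strip_string("\n    This is docstring,\n    mama-mama,\n  apple, banana\n    ", 4)` yields `"This is docstring,\nmama-mama,\napple, banana\n"`.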


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 15:47:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 07:47:46 -0700
Subject: [Patches] [ python-Patches-578667 ] Put IDE scripts in ~/Library
Message-ID: <E17RZnm-0003YM-00@usw-sf-web3.sourceforge.net>

Patches item #578667, was opened at 2002-07-08 15:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470

Category: Macintosh
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jack Jansen (jackjansen)
Assigned to: Just van Rossum (jvr)
Summary: Put IDE scripts in ~/Library

Initial Comment:
Just,
here's a patch that was part of a larger set; this one was unrelated to the rest (unfortunately I've forgotten who sent it). The patch moves the IDE scripts folder to ~/Library when running on OSX.

This is a good idea, because it allows people to have their own private set of IDE scripts, even if a sysadmin has installed Python. But: the patch as-is is probably not good enough, as there is no place for system-wide scripts anymore. (Scripts will also be shared between MacPython IDE and MachoPython IDE, which is also nice)

You may want to look at providing two scripts folders, one in the normal location (i.e. somewhere in the Python tree) and one in ~/Library.

----------------------------------------------------------------------

>Comment By: Just van Rossum (jvr)
Date: 2002-07-08 16:47

Message:
Logged In: YES 
user_id=92689

It was Tony Lownds. I'm all for the intentions of the patch, but I see it will 
fail on MacPython, which doesn't support os.environ["HOME"]. But I 
guess that statement could simply be replaced by the appropriate 
FindFolder() call.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578667&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 16:48:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 08:48:00 -0700
Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme
Message-ID: <E17Rak4-0002Oe-00@usw-sf-web5.sourceforge.net>

Patches item #578688, was opened at 2002-07-08 16:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: incompatible, but nice strings improveme

Initial Comment:
This patch changes the interpretation of multiline strings
(no matter whether single- or double-quoted with the NL escaped
by a backslash, or triple-quoted).

With this patch applied, first: the first character
after the opening quote is ignored if it is a NL, for example:

"""
la-la-la
"""

will be equivalent of

"""la-la-la
"""

The first variant looks better, doesn't it?

Second: all spaces after a NL and before the first nonblank char,
but no more than the current indentation, are ignored, for example:

New:

def f():
    """
    This is docstring,
    mama-mama,
  apple, banana
     """

is equivalent of old:

def f():
    """This is docstring,
mama-mama,
apple, banana
"""

The patch is enabled if PyPARSE_STRIPPED_STRINGS is defined. I
suggest you apply the patch but leave
PyPARSE_STRIPPED_STRINGS undefined until python-4 ;-)

I am sure that these semantics are right. As an
alternative, I suggest adding a new modifier 'i' to
strings, like 'u' and 'r', for instance i'iddqd'.

P. S. AFAIU, parsermodule.c needs editing too.

P. P. S. I am sorry, my English sucks :-(


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-08 17:48

Message:
Logged In: YES 
user_id=21627

The first part of your patch is not needed; you can just as
well write

"""\
la-la-la
"""

to escape the first newline.
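The two spellings can be compared directly. In a non-raw triple-quoted string, a backslash at the end of a line acts as a line continuation, so no newline character ends up in the value:

```python
# A backslash right after the opening quotes escapes the first newline.
escaped = """\
la-la-la
"""

# Without it, the string starts with a newline character.
plain = """
la-la-la
"""
```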

The second patch is probably not needed either, since you
can easily write library routines that deal with that kind
of stripping. In fact, pydoc already does that transformation.
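The library routine in today's stdlib is inspect.cleandoc(), which inspect.getdoc() applies to a docstring. It drops leading and trailing blank lines and removes the common indentation of the lines after the first. A minimal illustration:

```python
import inspect


def f():
    """
    This is docstring,
    mama-mama,
    apple, banana
    """


# Leading/trailing blank lines and the shared 4-space margin are removed.
cleaned = inspect.cleandoc(f.__doc__)
```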

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470


From noreply@sourceforge.net  Mon Jul  8 17:06:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 09:06:39 -0700
Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme
Message-ID: <E17Rb27-0007b0-00@usw-sf-web1.sourceforge.net>

Patches item #578688, was opened at 2002-07-08 18:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: incompatible, but nice strings improveme

Initial Comment:
This patch changes the interpretation of multiline strings
(no matter whether single- or double-quoted with the NL escaped
by a backslash, or triple-quoted).

With this patch applied, first: the first character
after the opening quote is ignored if it is a NL, for example:

"""
la-la-la
"""

will be equivalent of

"""la-la-la
"""

The first variant looks better, doesn't it?

Second: all spaces after a NL and before the first nonblank char,
but no more than the current indentation, are ignored, for example:

New:

def f():
    """
    This is docstring,
    mama-mama,
  apple, banana
     """

is equivalent of old:

def f():
    """This is docstring,
mama-mama,
apple, banana
"""

The patch is enabled if PyPARSE_STRIPPED_STRINGS is defined. I
suggest you apply the patch but leave
PyPARSE_STRIPPED_STRINGS undefined until python-4 ;-)

I am sure that these semantics are right. As an
alternative, I suggest adding a new modifier 'i' to
strings, like 'u' and 'r', for instance i'iddqd'.

P. S. AFAIU, parsermodule.c needs editing too.

P. P. S. I am sorry, my English sucks :-(


----------------------------------------------------------------------

>Comment By: Stepan Koltsov (yozh)
Date: 2002-07-08 20:06

Message:
Logged In: YES 
user_id=247706

I think the first part is still needed since
1. In r"""\
lalala
""" backslash doesn't escape NL
2. I think it looks better.

About the second part:
1. Additional library routines make program text less readable.
2. They cannot know how many spaces of indentation there were
where the string constant appeared.
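Both points can be checked in current Python. In a raw string, a backslash before a newline keeps both characters, so the `"""\` trick does not apply. inspect.cleandoc() infers the margin from the least-indented line, so an under-indented line shifts the result instead of being clipped at the surrounding code's indentation:

```python
import inspect

# Point 1: in a raw string the backslash and the newline both stay.
raw = r"""\
la-la-la
"""

# Point 2: cleandoc() uses the smallest indentation it finds (2 here),
# not the 4 spaces of the code around the literal.
uneven = "\n    This is docstring,\n    mama-mama,\n  apple, banana\n"
cleaned = inspect.cleandoc(uneven)
```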


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-08 19:48

Message:
Logged In: YES 
user_id=21627

The first part of your patch is not needed; you can just as
well write

"""\
la-la-la
"""

to escape the first newline.

The second patch is probably not needed either, since you
can easily write library routines that deal with that kind
of stripping. In fact, pydoc already does that transformation.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470


From noreply@sourceforge.net  Tue Jul  9 02:45:28 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 18:45:28 -0700
Subject: [Patches] [ python-Patches-565183 ] email Parser non-strict mode
Message-ID: <E17Rk4G-0005yw-00@usw-sf-web3.sourceforge.net>

Patches item #565183, was opened at 2002-06-06 03:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565183&group_id=5470

Category: Modules
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email Parser non-strict mode

Initial Comment:
Here's my current state of the non-strict Parser mode.
At the moment it handles most ugly stuff I see, with the
exception of the multiple-nested-multiparts-with the same
boundary tags grossness - but I think that this is actually
a pretty savage violation of the RFC, so I'm not too fussed
about it.

There's still some work to be done in the area of digests,
but I'll bring that up on mimelib-devel.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-08 21:45

Message:
Logged In: YES 
user_id=12800

I've got this integrated with my copy now and will likely
check it in.  Any possibility you can send me some unit tests?


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-06-06 04:09

Message:
Logged In: YES 
user_id=29957

Here's a newer version of the patch that gets digests right,
as I talked about on mimelib-devel. The code that gets digests
right should be split out of this in any case - I'd look
into splitting
it, but I've got too much on my plate right now. 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565183&group_id=5470


From noreply@sourceforge.net  Tue Jul  9 03:06:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 08 Jul 2002 19:06:59 -0700
Subject: [Patches] [ python-Patches-490456 ] Unicode support in email.Utils.encode
Message-ID: <E17RkP5-00060E-00@usw-sf-web2.sourceforge.net>

Patches item #490456, was opened at 2001-12-07 18:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=490456&group_id=5470

Category: Library (Lib)
Group: None
>Status: Pending
Resolution: None
Priority: 5
Submitted By: Mikhail Zabaluev (mzabaluev)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Unicode support in email.Utils.encode

Initial Comment:
It's essentially an updated version of patch 486375, this time
distinguishing on the type of the passed string: if
it's Unicode, the function encodes it to the character
set specified as the charset parameter.
The reasons:
1. The function in its current version doesn't support
Unicode, throwing an exception if any non-ASCII
characters are found within it.
2. With this patch, we reach a sort of operational
symmetry between email.Utils.encode and email.Utils.decode,
as can be seen in the tests.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-08 22:06

Message:
Logged In: YES 
user_id=12800

I'm changing the status to Pending since I think this patch
is no longer relevant given that email.Utils.encode() is
deprecated.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-28 23:37

Message:
Logged In: YES 
user_id=12800

Sigh, sorry for taking so long to get to this.

email.Utils.encode() is deprecated now, and I'd actually
like to remove it rather than patch it. ;) 

Shouldn't the Header class be used instead?
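In the 2002 email package this class lived in email.Header; today it is email.header.Header. A sketch of the suggested approach, producing an RFC 2047 encoded word from a text string. The helper name and the default charset are assumptions.

```python
from email.header import Header, decode_header


def encode_header_value(text, charset="iso-8859-1"):
    """RFC 2047-encode a header value using email.header.Header."""
    return Header(text, charset).encode()


encoded = encode_header_value("h\u00e9llo")
```

decode_header() reverses the process, returning (bytes, charset) pairs.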


----------------------------------------------------------------------

Comment By: Mikhail Zabaluev (mzabaluev)
Date: 2001-12-11 17:52

Message:
Logged In: YES 
user_id=313104

To loewis:
In a typical email application, it'd be better to display
partially encoded text than to face a hard stop when trying
to process a message, hence 'replace'. Actually, the
encoding mode could be an optional parameter, but I don't
feel like deciding on parameters for a function not
developed by me. Barry?
The isinstance part seems to be valid, I'm updating the
patch here accordingly.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-11 13:28

Message:
Logged In: YES 
user_id=21627

The patch looks good, except that I cannot really see the
value in using "replace" for .encode. Wouldn't it be better
to get an exception if the Unicode string contains an
un-encodable character?

Also, the Python 2.2 way to spell the type test is

  if isinstance(s, unicode)

This makes use of the fact that the unicode builtin is a
type now, and it supports unicode subtypes. This is a minor
change, of course.
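The strict-versus-replace question can be seen in a small helper. This is a Python 3 sketch in which `str` plays the role of 2.2's `unicode` type; `encode_text` is hypothetical and is not email.Utils.encode.

```python
def encode_text(s, charset, errors="strict"):
    """Encode a text string to bytes in `charset`.

    errors="strict" raises UnicodeEncodeError on unencodable characters;
    errors="replace" substitutes '?' for them instead.
    """
    if isinstance(s, str):  # also true for subclasses of str
        return s.encode(charset, errors)
    return s  # assume it is already bytes in `charset`
```

With "replace", `encode_text("h\u00e9llo", "ascii", "replace")` silently gives `b"h?llo"`. With the default "strict", the same call raises UnicodeEncodeError, which is the behaviour Martin argues for.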

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=490456&group_id=5470


From noreply@sourceforge.net  Tue Jul  9 09:13:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 01:13:26 -0700
Subject: [Patches] [ python-Patches-578688 ] incompatible, but nice strings improveme
Message-ID: <E17Rq7i-0003i1-00@usw-sf-web5.sourceforge.net>

Patches item #578688, was opened at 2002-07-08 16:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: incompatible, but nice strings improveme

Initial Comment:
This patch changes the interpretation of multiline strings
(no matter whether single- or double-quoted with the NL escaped
by a backslash, or triple-quoted).

With this patch applied, first: the first character
after the opening quote is ignored if it is a NL, for example:

"""
la-la-la
"""

is equivalent to

"""la-la-la
"""

The first variant looks better, doesn't it?

Second: all spaces after a newline and before the first
non-blank character, but no more than the current
indentation, are ignored. Example:

New:

def f():
    """
    This is docstring,
    mama-mama,
  apple, banana
     """

is equivalent to the old:

def f():
    """This is docstring,
mama-mama,
apple, banana
"""

The patch is enabled only if PyPARSE_STRIPPED_STRINGS is
defined. I suggest applying the patch but leaving
PyPARSE_STRIPPED_STRINGS undefined until Python 4 ;-)

I am sure these semantics are right. As an
alternative, I suggest adding a new string prefix 'i',
like 'u' and 'r', e.g. i'iddqd'.

P.S. AFAIU, parsermodule.c needs to be edited too.

P.P.S. I am sorry, my English is bad :-(


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-09 10:13

Message:
Logged In: YES 
user_id=21627

In that case, I think your proposed change will be hotly
debated. That means you will have to write a PEP first if
you want to see it implemented (even if it is only an option).

----------------------------------------------------------------------

Comment By: Stepan Koltsov (yozh)
Date: 2002-07-08 18:06

Message:
Logged In: YES 
user_id=247706

I think the first part is still needed, since:
1. In r"""\
lalala
""" the backslash doesn't escape the newline.
2. I think it looks better.

About the second part:
1. Additional library routines make program text less readable.
2. They cannot know the indentation (in spaces) at the point
where the string constant appeared.
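Stepan's point about raw strings can be checked directly in current Python. In an ordinary literal a trailing backslash is a line continuation, while in a raw literal the backslash and the newline both stay in the value:

```python
# Ordinary literal: backslash-newline is a continuation, so it vanishes.
plain = """\
lalala
"""

# Raw literal: the backslash and the newline are both kept.
raw = r"""\
lalala
"""

print(repr(plain))  # 'lalala\n'
print(repr(raw))    # '\\\nlalala\n'
```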


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-08 17:48

Message:
Logged In: YES 
user_id=21627

The first part of your patch is not needed; you can just as
well write

"""\
la-la-la
"""

to escape the first newline.

The second part is probably not needed either, since you
can easily write library routines that deal with that kind
of stripping. In fact, pydoc already does that transformation.
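Both of Martin's suggestions can be tried in modern Python 3. This sketch uses the standard `inspect.cleandoc()`, which `inspect.getdoc()` applies to docstrings, to do the dedenting that the patch proposed to build into the parser; it is not the 2.2-era pydoc code:

```python
import inspect


def f():
    """
    This is docstring,
    mama-mama,
    apple, banana
    """


# A backslash right after the opening quotes suppresses the first newline.
s = """\
la-la-la
"""
assert s == "la-la-la\n"

# cleandoc() drops leading/trailing blank lines and the common indentation.
print(inspect.cleandoc(f.__doc__))
```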

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578688&group_id=5470


From noreply@sourceforge.net  Tue Jul  9 23:43:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 15:43:53 -0700
Subject: [Patches] [ python-Patches-560379 ] Karatsuba multiplication
Message-ID: <E17S3i5-0002AD-00@usw-sf-web4.sourceforge.net>

Patches item #560379, was opened at 2002-05-24 21:07
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Christopher A. Craig (ccraig)
Assigned to: Tim Peters (tim_one)
Summary: Karatsuba multiplication

Initial Comment:
Adds Karatsuba multiplication to Python.
Patches longobject.c to use Karatsuba multiplication in
place of gradeschool math.
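For readers unfamiliar with the technique, here is a minimal pure-Python sketch of Karatsuba multiplication with a "fence" below which it falls back to ordinary multiplication. It is not the longobject.c patch. The function name and the bit-based `cutoff_bits` parameter are assumptions. Its default of 600 bits also assumes that the fence of 40 mentioned in this thread counts 15-bit digits. It splits on the larger operand, so when the smaller operand's high half is zero the extra product collapses to 0 at once:

```python
import random


def karatsuba(x, y, cutoff_bits=600):
    """Multiply non-negative ints x and y using Karatsuba's method.

    Below the fence (cutoff_bits, a hypothetical parameter) the built-in
    product stands in for grade-school multiplication.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative ints only")
    # Tiny fences would never terminate, so clamp to a safe floor.
    fence = max(cutoff_bits, 8)
    if min(x.bit_length(), y.bit_length()) <= fence:
        return x * y
    # Split on the larger operand's size.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # If y_hi is 0, this hits the base case and returns 0 immediately.
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    # One extra product replaces the two cross terms x_hi*y_lo + x_lo*y_hi.
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0


a, b = random.getrandbits(5000), random.getrandbits(1200)
assert karatsuba(a, b) == a * b
```

Three half-size products instead of four is what gives the O(n^1.585) growth; the fence exists because the bookkeeping costs more than it saves on small operands.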


----------------------------------------------------------------------

>Comment By: Christopher A. Craig (ccraig)
Date: 2002-07-09 18:43

Message:
Logged In: YES 
user_id=135050

I've brought the code into compliance with the coding
standards in PEP 7, and added some comments that I
thought were in line with the rest of the file.  If there is
something else you would like me to do, please tell me. 


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-05 17:38

Message:
Logged In: YES 
user_id=6380

Tim thinks this is cool, but the code can use cleanup and 
comments.

Also, let's not add platform specific hacks (Christian can sell 
those as an add-on :-).

----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 19:41

Message:
Logged In: YES 
user_id=135050

I made the changes needed to split on the bigger
number (basically I changed it to split on the bigger number, and
changed all of the places that need to check whether there
are any bits left), and the new version is a little bit faster,
so I'm uploading it too.

I had been thinking about fixed precision numbers when I
wrote it, so I honestly didn't consider the fact that I
could just shift the smaller number to 0 and throw it
away... :-)


----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 12:16

Message:
Logged In: YES 
user_id=135050

I just uploaded a graph with some sample timings in it.  
Red is a fence of 20. Green is a fence of 40. Blue is a
fence of 60. Black is done with unmodified Python 2.2.1.  


----------------------------------------------------------------------

Comment By: Christopher A. Craig (ccraig)
Date: 2002-05-25 01:53

Message:
Logged In: YES 
user_id=135050

I got 40 from testing.  Basically I generated 250 random
numbers for each of a series of sizes between 5 and 2990 bits
long at 15-bit intervals (i.e. the word size), and stored them
in a dictionary.  Then I timed 249 multiplies at each size for
a bunch of fence values and used gdchart to make a pretty
graph.  It certainly could be optimized better per
compiler/platform, but I don't know how much gain you'd see.

I split on the smaller number because I guessed it would be
better.  My thought was that if I split on the smaller
number I'm guaranteed to reach the fence, at which point I
can use the gradeschool method at a near linear cost (since
it's O(n*m) and one of those two is at most the fence size).
If I split on the larger number, I may run into a condition
where the smaller number is less than half the larger, but I
haven't reached the fence yet, and then gradeschool could be
much more expensive.
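The timing procedure described above (N random operands per size in 15-bit steps, N-1 multiplies per size, several fence values) can be approximated in pure Python. This is a hypothetical harness around a compact pure-Python Karatsuba. The names `time_fences` and `digit_bits` are assumptions. Its numbers can only compare fence choices for this sketch, not for the C code:

```python
import random
import timeit


def karatsuba(x, y, fence_bits):
    # Compact non-negative Karatsuba that splits on the larger operand.
    if min(x.bit_length(), y.bit_length()) <= max(fence_bits, 8):
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    xh, xl, yh, yl = x >> half, x & mask, y >> half, y & mask
    z2 = karatsuba(xh, yh, fence_bits)
    z0 = karatsuba(xl, yl, fence_bits)
    z1 = karatsuba(xh + xl, yh + yl, fence_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0


def time_fences(sizes_bits, fences_digits, count=250, digit_bits=15, seed=0):
    """Return {(size_bits, fence_digits): seconds for count-1 multiplies}."""
    rng = random.Random(seed)
    results = {}
    for size in sizes_bits:
        # Pre-generate operands of exactly `size` bits so only multiplies are timed.
        nums = [rng.getrandbits(size) | (1 << (size - 1)) for _ in range(count)]
        pairs = list(zip(nums, nums[1:]))
        for fence in fences_digits:
            fence_bits = fence * digit_bits

            def run():
                for a, b in pairs:
                    karatsuba(a, b, fence_bits)

            results[(size, fence)] = timeit.timeit(run, number=1)
    return results


# Mirrors the experiment's parameters; slow in pure Python, so try smaller first.
# timings = time_fences(range(5, 2991, 15), [20, 40, 60])
```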


----------------------------------------------------------------------

Comment By: Christian Tismer (tismer)
Date: 2002-05-24 23:23

Message:
Logged In: YES 
user_id=105700

Hmm, not bad.

Q: You set the split fence at 40. Where does this number
come from? I think this could be optimized per compiler/platform.

You say that you split based on the smaller number.
Why this? My intuitive guess would certainly be to always split
on the larger number. I just checked my Python implementation,
which does this.
Open question: what is the best way to handle a very small
number times a very long one? Probably the highschool version
is better here, and that might have led you to investigate the
smaller one. I'd say both should be checked.

good work! - cheers chris


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=560379&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 00:44:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 16:44:16 -0700
Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting
Message-ID: <E17S4eW-0002Tq-00@usw-sf-web2.sourceforge.net>

Patches item #532638, was opened at 2002-03-20 12:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Better AttributeError formatting

Initial Comment:
A user in c.l.py was confused when

  import m
  m.a

reported

  AttributeError: 'module' object has no attribute 'a'

The attached patch displays the object's name in
the error message if it has a __name__ attribute.
This is a bit tricky because of the recursive 
nature of looking up an attribute during a getattr 
operation. My solution was to pull the error 
formatting code into a separate static routine
(the same basic thing happens in three places) and
define a static variable there that breaks any 
recursion.

While this might not be thread-safe, I
think it's okay in this situation.  The worst that
should happen is that you either get an extra round of
recursion while looking up a non-existent __name__
attribute, or fail to even check for __name__ and
use the default formatting when the object
actually has a __name__ attribute.  This can only
happen if two threads both get
attribute errors at the same time, and then only
if the process of looking things up takes you back
into Python code.

Perhaps a similar technique can be provided for 
other error formatting operations in object.c.

Example for objects with and without __name__
attributes:

>>> "".foo
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: str object has no attribute 'foo'
>>> import string
>>> string.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo'

Skip


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 18:44

Message:
Logged In: YES 
user_id=44345

Closing since there seem to be no votes in favor, at least not by bots...

S


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 20:25

Message:
Logged In: YES 
user_id=31435

hasattr() is defined in terms of whether PyObject_GetAttr() 
raises an exception, and thanks to __getattr__ hooks can't 
be computed any faster than calling PyObject_GetAttr().  
Which is what the code does:

	v = PyObject_GetAttr(v, name);
	if (v == NULL) {
		PyErr_Clear();
		Py_INCREF(Py_False);
		return Py_False;
	}
	Py_DECREF(v);
	Py_INCREF(Py_True);
	return Py_True;

It's simply not going to get faster than that.
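In Python terms, the C code above amounts to the following sketch (`hasattr_22` and `Flaky` are hypothetical names). It shows that every hasattr() call pays for a full attribute lookup and, on failure, for creating an exception. Note that the C code's PyErr_Clear() discards any error. Modern Python 3's builtin hasattr() only swallows AttributeError:

```python
def hasattr_22(obj, name):
    """Hypothetical mirror of the C code above: any exception means False."""
    try:
        getattr(obj, name)
    except Exception:
        # PyErr_Clear() discards whatever was raised (approximated here by
        # catching Exception; the C code would even clear KeyboardInterrupt).
        return False
    return True


class Flaky:
    def __getattr__(self, name):
        raise ValueError("lookup failed for " + name)


print(hasattr_22(Flaky(), "x"))  # False
try:
    hasattr(Flaky(), "x")  # Python 3's builtin lets the ValueError escape
except ValueError:
    print("builtin hasattr propagated ValueError")
```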

I'm not saying you can't have a "better" message here 
(although since an object's __name__ field doesn't bear any 
necessary relationship to the variable name(s) through 
which the object is referenced, it's unclear that the 
message won't actually be worse in real non-trivial cases:  
the type name is an object invariant, but the name can be 
misleading).  I am saying the tradeoff is real and needs to 
be addressed.  That's part of "good design", Dale; doing 
what feels good in the last case you remember is arguably 
not.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-20 19:50

Message:
Logged In: YES 
user_id=44345

In theory.  Python's getattr capability is so dynamic
though I suspect there's little hasattr() can
do but call getattr() and react to the result.


----------------------------------------------------------------------

Comment By: Dale Strickland-Clark (dalesc)
Date: 2002-03-20 18:36

Message:
Logged In: YES 
user_id=457577

Surely Tim's is more an argument for fixing hasattr so it 
doesn't depend on an exception?
To limit meaningful error messages because they slow normal 
program flow screams 'bad design' to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 17:09

Message:
Logged In: YES 
user_id=31435

If it's one cycle slower than it is today when the 
exception is ignored, Zope will notice it (it uses hasattr 
for blood).  Then Guido will get fired, have to pump gas in 
Amsterdam for a living, and we'll never hear from him 
again.  How badly do you want to destroy Python <wink>?

It may be fruitful to hammer out an efficient alternative 
on PythonDev.

It's not an argument about whether more info would be 
useful, although <wink> on c.l.py Dale seemed happy enough 
as soon as someone explained what 'module' was doing in his 
msg.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-20 15:50

Message:
Logged In: YES 
user_id=44345

hmmm...  How much would I have to modify it to get you
to change your mind?  I'm pretty sure I can get rid of
the call to PyObject_HasAttrString without a lot of
effort.  I can't do much about avoiding at least one
PyObject_GetAttrString call though, which obviously
means you could wind up back in bytecode.

I jumped on this after seeing the request in c.l.py
mostly because I've wanted it from time-to-time as
well.  The extra information is useful at times.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 12:56

Message:
Logged In: YES 
user_id=31435

I'm -1 on this because of the expense:  many apps routinely 
provoke AttributeErrors that are deliberately ignored.  All 
the time that goes into making nice messages is wasted 
then.  A "lazy" exception object that produced a string 
only when actually needed would be fine (although perhaps 
an object may manage to change its computed __name__ by 
then!).
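Tim's "lazy" exception can be sketched in modern Python as an AttributeError subclass that stores the target and attribute name and builds the message only in `__str__`. The names `LazyAttributeError`, `messages_built` and `Record` are hypothetical. Code that merely tests for the error, such as hasattr(), never pays for the formatting:

```python
class LazyAttributeError(AttributeError):
    """Hypothetical: build the message only when someone asks for it."""

    messages_built = 0  # counts how often __str__ actually did the work

    def __init__(self, target, attr):
        super().__init__()
        self._target = target
        self._attr = attr

    def __str__(self):
        LazyAttributeError.messages_built += 1
        typename = type(self._target).__name__
        # Looked up now, not at raise time, so a computed __name__ may differ.
        name = getattr(self._target, "__name__", None)
        if isinstance(name, str):
            return f"'{typename}' object {name!r} has no attribute {self._attr!r}"
        return f"'{typename}' object has no attribute {self._attr!r}"


class Record:
    def __init__(self, name):
        self.__name__ = name

    def __getattr__(self, attr):
        # Only reached when normal lookup fails.
        raise LazyAttributeError(self, attr)


r = Record("config")
print(hasattr(r, "missing"), LazyAttributeError.messages_built)  # False 0
try:
    r.missing
except AttributeError as exc:
    print(exc)  # 'Record' object 'config' has no attribute 'missing'
```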

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 00:45:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 16:45:15 -0700
Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup
Message-ID: <E17S4fT-0002VV-00@usw-sf-web2.sourceforge.net>

Patches item #506436, was opened at 2002-01-21 07:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tim Peters (tim_one)
Summary: GETCONST/GETNAME/GETNAMEV speedup

Initial Comment:
The attached patch redefines the GETCONST, GETNAME &
GETNAMEV macros to do the following:

  * access the code object's consts and names through
    local variables instead of the long chain from f

  * use access macros to index the tuples and get
    the C string names

The code appears correct, and I've had no trouble
with it.  It only provides the most trivial of
improvements on pystone (around 1% when I see
anything), but it's all those little things that
add up, right?

Skip
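The tuples these macros index, `co_consts` and `co_names`, are visible from Python as well. This sketch uses modern CPython's `dis` module to show that a LOAD_CONST argument is simply an index into `co_consts`. Bytecode details differ from 2.2, and `greet` is a hypothetical example function:

```python
import dis


def greet(name):
    return "hello, " + name.upper()


code = greet.__code__
print(code.co_consts)  # constants tuple, indexed like GETCONST(i)
print(code.co_names)   # names tuple, indexed like GETNAME(i)

for instr in dis.get_instructions(greet):
    if instr.opname == "LOAD_CONST":
        # The instruction's argument is just a position in co_consts.
        assert code.co_consts[instr.arg] == instr.argval
```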


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 18:45

Message:
Logged In: YES 
user_id=44345

Looking for a vote up or down on this one...


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-01-21 07:47

Message:
Logged In: YES 
user_id=44345

Whoops...  Make the "observed" speedup 0.1%...


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 00:46:25 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 16:46:25 -0700
Subject: [Patches] [ python-Patches-534862 ] help asyncore recover from repr() probs
Message-ID: <E17S4gb-0002Wf-00@usw-sf-web2.sourceforge.net>

Patches item #534862, was opened at 2002-03-25 15:12
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Jeremy Hylton (jhylton)
Summary: help asyncore recover from repr() probs

Initial Comment:
I've had this patch in my copy of asyncore.py
for quite a while.  It works for me as a way to
recover from repr() bogosities, though I'm not
familiar enough with repr/str issues and
asyncore to know whether this is the right way to
make it more bulletproof (or whether it should even be
made more bulletproof).

Skip
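The general idea of recovering from a failing repr() can be sketched as a small wrapper that status or logging code might call. This is a hypothetical helper (`safe_repr`, `Broken`), not necessarily what the attached patch does:

```python
def safe_repr(obj):
    """Hypothetical helper: repr() that never raises."""
    try:
        return repr(obj)
    except Exception as exc:
        # Build the fallback only from the type name and id(), which do not
        # call back into the object's own (possibly broken) methods.
        return "<%s at %#x (repr failed: %s)>" % (
            type(obj).__name__, id(obj), type(exc).__name__)


class Broken:
    def __repr__(self):
        raise RuntimeError("bogus state")


print(safe_repr([1, 2]))    # [1, 2]
print(safe_repr(Broken()))  # e.g. <Broken at 0x... (repr failed: RuntimeError)>
```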


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 18:46

Message:
Logged In: YES 
user_id=44345

Looking for a vote up or down so I can get rid of the "M" when I execute
"cvs up"...

S


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-04 11:12

Message:
Logged In: YES 
user_id=6380

Jeremy, what do you think of this? Looks harmless to me...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 00:48:01 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 16:48:01 -0700
Subject: [Patches] [ python-Patches-569574 ] plain text enhancement for cgitb
Message-ID: <E17S4i9-0002YS-00@usw-sf-web2.sourceforge.net>

Patches item #569574, was opened at 2002-06-15 23:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
>Assigned to: Ka-Ping Yee (ping)
Summary: plain text enhancement for cgitb

Initial Comment:
Here's a patch to cgitb that allows you to enable plain
text output.  It adds an extra parameter to the cgitb.enable
function and corresponding underlying functions.  To get
plain text invoke it as

    import cgitb
    cgitb.enable(format="text")

(actually, any value for format other than "html" will 
enable plain text output).  The default value is "html", so 
existing usage of cgitb should be unaffected.
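The `format` switch described above can be sketched with only the `traceback` and `html` modules. cgitb itself was later removed from the standard library in Python 3.13. The function name `format_exc_report` is hypothetical, and the HTML branch is far simpler than cgitb's real output:

```python
import html
import traceback


def format_exc_report(exc, format="html"):
    """Hypothetical: render an exception as HTML or, otherwise, plain text."""
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    if format == "html":
        return "<pre>" + html.escape(text) + "</pre>"
    # Any value other than "html" selects plain text, as in the patch.
    return text


try:
    1 / 0
except ZeroDivisionError as exc:
    print(format_exc_report(exc, format="text"))
```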

I realize this isn't quite what you suggested, but it 
seemed to me worthwhile to keep such similar code 
together.

I'm not entirely certain I haven't fouled up the html
formatting.  It needs to be checked still.  Also still to come
is a doc change.

Skip


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 18:48

Message:
Logged In: YES 
user_id=44345

Ping

How about you?  As the author I think you're in the best position to
decide on the merits of the patch...

Skip


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-19 22:36

Message:
Logged In: YES 
user_id=6380

Unassigning -- I won't get to this before my vacation.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-16 00:09

Message:
Logged In: YES 
user_id=44345

Okay, here's a correction to the first patch.  It fixes the logic
bug that corrupted the HTML output.  It also adds a little bit
of extra documentation.

Writing the documentation made me think that perhaps this
should be added to the traceback module as Guido
suggested with just a stub cgitb module that provides an 
enable function that calls the enable function in the 
traceback module with format="html".  The cgitb module
could then be deprecated.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 03:22:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 19:22:51 -0700
Subject: [Patches] [ python-Patches-578494 ] PEP 282 Implementation
Message-ID: <E17S77z-0004NS-00@usw-sf-web2.sourceforge.net>

Patches item #578494, was opened at 2002-07-08 10:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Pending
Resolution: None
Priority: 5
Submitted By: Vinay Sajip (vsajip)
Assigned to: Mark Hammond (mhammond)
Summary: PEP 282 Implementation

Initial Comment:
The attached file implements PEP 282. The file
logging-0.4.6.tar.gz is the entire distribution including
setup/install, test/example scripts, and TeX
documentation. The file logging.py (within the .tar.gz) is
all that is needed to implement the PEP.
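PEP 282's design later became the standard library's `logging` package in Python 2.3. This sketch uses today's `logging` API, which may differ in detail from the 0.4.6 package attached here. It shows the dotted logger hierarchy, level filtering and a handler with a formatter. The logger names are arbitrary examples:

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Loggers form a dotted hierarchy; records propagate to ancestors' handlers.
parent = logging.getLogger("app")
parent.addHandler(handler)
parent.setLevel(logging.INFO)

child = logging.getLogger("app.db")  # inherits the INFO level from "app"
child.debug("not shown: below INFO")
child.warning("connection slow")

print(stream.getvalue())  # app.db WARNING connection slow
```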

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-10 12:22

Message:
Logged In: YES 
user_id=14198

The code seems high quality and well documented.  I have no
concerns with logging.py as such.

I have two main issues:
* Design decisions:  looking over python-dev, I cannot see
a consensus on the design decisions.  I believe that *some*
type of official acceptance of the design should be decreed
by someone.

* Source structure: while this seems quite suitable for an
extension module, the format of the patch is probably not
quite correct for a core module.  For example, the test code
should probably be integrated with the standard Python test
suite (even if in a sub-directory), the TeX docs integrated
with Python's docs etc

So while I think the patch is high quality I believe these
issues need to be addressed before I can do much more.

Setting to "pending" - but good stuff tho!  Please drive
this through!

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2002-07-08 10:56

Message:
Logged In: YES 
user_id=308438

Added just the logging.py file to make it easier to review.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578494&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 04:10:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 20:10:17 -0700
Subject: [Patches] [ python-Patches-579433 ] Solaris openpty() and forkpty() addition
Message-ID: <E17S7rt-000452-00@usw-sf-web1.sourceforge.net>

Patches item #579433, was opened at 2002-07-09 22:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470

Category: Modules
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Lance Ellinghaus (ellinghaus)
Assigned to: Nobody/Anonymous (nobody)
Summary: Solaris openpty() and forkpty() addition

Initial Comment:
This patch provides a Solaris 2.8 version of openpty()
and forkpty(), since Solaris does not provide them.
This has only been tested on Solaris 2.8.
This was posted to python-dev and I was told to post it
here, so I am.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 04:13:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 20:13:17 -0700
Subject: [Patches] [ python-Patches-579435 ] Shadow Password Support Module
Message-ID: <E17S7un-00048p-00@usw-sf-web1.sourceforge.net>

Patches item #579435, was opened at 2002-07-09 22:13
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579435&group_id=5470

Category: Modules
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Lance Ellinghaus (ellinghaus)
Assigned to: Nobody/Anonymous (nobody)
Summary: Shadow Password Support Module

Initial Comment:
Attached is the spwd module. This module provides 
support for Shadow Passwords on Solaris 2.8. This 
complements the nis and pwd modules. This is the only 
way to gain access to the encrypted passwords when 
using shadow passwords on Solaris.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579435&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 04:33:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 09 Jul 2002 20:33:11 -0700
Subject: [Patches] [ python-Patches-569574 ] plain text enhancement for cgitb
Message-ID: <E17S8E3-00045n-00@usw-sf-web3.sourceforge.net>

Patches item #569574, was opened at 2002-06-15 21:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Ka-Ping Yee (ping)
Summary: plain text enhancement for cgitb

Initial Comment:
Here's a patch to cgitb that allows you to enable plain
text output.  It adds an extra parameter to the cgitb.enable
function and corresponding underlying functions.  To get
plain text invoke it as

    import cgitb
    cgitb.enable(format="text")

(actually, any value for format other than "html" will 
enable plain text output).  The default value is "html", so 
existing usage of cgitb should be unaffected.

I realize this isn't quite what you suggested, but it 
seemed to me worthwhile to keep such similar code 
together.

I'm not entirely certain I haven't fouled up the html
formatting.  It needs to be checked still.  Also still to come
is a doc change.

Skip


----------------------------------------------------------------------

>Comment By: Ka-Ping Yee (ping)
Date: 2002-07-09 20:33

Message:
Logged In: YES 
user_id=45338

I think enhanced text tracebacks would be great.

(I even have my own hacked-up one lying around
here somewhere -- it colourized the output.  I think
a part of me was waiting for an opportunity to
make enhanced tracebacks standard. The most
important enhancement IMHO is to show argument
values.)

I don't think the functionality belongs in cgitb,
though.  The main routine probably should go
in traceback; the common routines (scanvars
and lookup) can go there too.
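Ka-Ping's main suggestion, showing argument values in a plain-text traceback, can be sketched with the standard `inspect` module. This is a hypothetical helper (`text_traceback_with_args`), not cgitb's `scanvars`/`lookup`. It prints one line per frame with the call's arguments:

```python
import inspect
import sys


def text_traceback_with_args(tb):
    """Hypothetical: one line per frame, showing the call's argument values."""
    lines = []
    while tb is not None:
        frame = tb.tb_frame
        # getargvalues() pairs the parameter names with the frame's locals.
        args = inspect.formatargvalues(*inspect.getargvalues(frame))
        lines.append("%s, line %d, in %s%s" % (
            frame.f_code.co_filename, tb.tb_lineno, frame.f_code.co_name, args))
        tb = tb.tb_next
    return "\n".join(lines)


def average(values, weight=1):
    return sum(values) * weight / len(values)


try:
    average([], weight=2)
except ZeroDivisionError:
    print(text_traceback_with_args(sys.exc_info()[2]))
```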

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 16:48

Message:
Logged In: YES 
user_id=44345

Ping

How about you?  As the author I think you're in the best position to
decide on the merits of the patch...

Skip


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-19 20:36

Message:
Logged In: YES 
user_id=6380

Unassigning -- I won't get to this before my vacation.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-15 22:09

Message:
Logged In: YES 
user_id=44345

Okay, here's a correction to the first patch.  It fixes the logic
bug that corrupted the HTML output.  It also adds a little bit
of extra documentation.

Writing the documentation made me think that perhaps this
should be added to the traceback module as Guido
suggested with just a stub cgitb module that provides an 
enable function that calls the enable function in the 
traceback module with format="html".  The cgitb module
could then be deprecated.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=569574&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 21:51:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 10 Jul 2002 13:51:14 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17SOQc-0006Vg-00@usw-sf-web2.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Skip Montanaro (montanaro)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
  works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).
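The regex-driven approach can be illustrated with a deliberately tiny parser. Each directive maps to a named regular-expression group, and swapping the month table is where localization would plug in. This is a hypothetical sketch (`mini_strptime`, `DIRECTIVES`, `MONTHS`) covering a handful of directives and returning a plain tuple. It is not Brett's module:

```python
import re

# Hypothetical English-only table; a locale would supply its own names.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DIRECTIVES = {
    "Y": r"(?P<Y>\d{4})",
    "m": r"(?P<m>1[0-2]|0?[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0?[1-9])",
    "H": r"(?P<H>2[0-3]|[01]?\d)",
    "M": r"(?P<M>[0-5]\d)",
    "S": r"(?P<S>[0-5]\d)",
    "b": r"(?P<b>" + "|".join(MONTHS) + ")",
}


def mini_strptime(data, fmt):
    """Parse data with a small subset of strptime directives via one regex."""
    pattern = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            pattern.append(DIRECTIVES[fmt[i + 1]])  # KeyError if unsupported
            i += 2
        else:
            pattern.append(re.escape(fmt[i]))  # literal text must match exactly
            i += 1
    found = re.fullmatch("".join(pattern), data, re.IGNORECASE)
    if found is None:
        raise ValueError("time data %r does not match format %r" % (data, fmt))
    g = found.groupdict()
    if "b" in g:
        month = MONTHS.index(g["b"].lower()) + 1
    else:
        month = int(g.get("m", 1))
    return (int(g.get("Y", 1900)), month, int(g.get("d", 1)),
            int(g.get("H", 0)), int(g.get("M", 0)), int(g.get("S", 0)))


print(mini_strptime("2002-07-10 13:51", "%Y-%m-%d %H:%M"))
```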

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change.
Based on discussions on python-dev, I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to import
re.compile directly, since it is the only thing I am using
from the re module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
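The default-argument pitfall described above can be reproduced in a few lines. In this sketch, `current` and LocaleTime are hypothetical stand-ins for the interpreter's locale and the patch's class. The patch itself simply dropped the parameter. The None-default idiom is a common alternative, shown here only for comparison.

```python
# Hypothetical stand-ins: `current` plays the interpreter's locale, and
# LocaleTime just records whichever locale was active when it was built.
current = {"locale": "en_US"}

class LocaleTime:
    def __init__(self):
        self.lang = current["locale"]

def parse_stale(data, locale_time=LocaleTime()):
    # This default LocaleTime() was created once, when the def statement ran.
    return locale_time.lang

def parse_fresh(data, locale_time=None):
    # A None default lets every call build a LocaleTime for the current locale.
    if locale_time is None:
        locale_time = LocaleTime()
    return locale_time.lang

current["locale"] = "de_DE"  # switch "locales" after both functions exist
print(parse_stale("x"))  # en_US
print(parse_fresh("x"))  # de_DE
```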

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale's.
And if they did, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false
value (i.e., anything that evaluates as false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
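The profiling result above suggests caching the compiled regex per format string. The sketch below does this with a plain dict. DIRECTIVES and pattern_for() are hypothetical and far simpler than the patch's TimeRE. Python's re module also keeps its own small cache of compiled patterns, so most of the saving comes from skipping the string-building step.

```python
import re

# Hypothetical, heavily simplified directive table; the real TimeRE knows far more.
DIRECTIVES = {
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "Y": r"(?P<Y>\d{4})",
}

_compiled = {}  # format string -> compiled pattern

def pattern_for(fmt):
    """Hypothetical helper: translate fmt into a regex, building it once per format."""
    cached = _compiled.get(fmt)
    if cached is not None:
        return cached
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])  # KeyError for unsupported directives
            i += 2
        else:
            parts.append(re.escape(fmt[i]))  # literal characters are matched as-is
            i += 1
    cached = _compiled[fmt] = re.compile("".join(parts))
    return cached

m = pattern_for("%Y-%m-%d").fullmatch("2002-06-19")
print(m.group("Y"), m.group("m"), m.group("d"))  # 2002 06 19
```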

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks go
out to Guido for pointing out this undocumented feature of
calendar.
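Today calendar.day_name, calendar.month_name and calendar.month_abbr are documented parts of the calendar module. The sketch below (not the patch's code) shows how LocaleTime-style name lists could be filled from them. The month lists start with a blank entry, so month numbers index them directly.

```python
import calendar

# Names follow the current LC_TIME locale ('C' unless the program changes it).
f_weekday = list(calendar.day_name)    # Monday .. Sunday
f_month = list(calendar.month_name)    # '' followed by January .. December
a_month = list(calendar.month_abbr)    # '' followed by Jan .. Dec
```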

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric day-of-month regex.
I discovered that ANSI C's ctime() displays the day of the month with
a leading space if it is a single digit.  Since at least Skip's C
library uses that format for %c, I went ahead and added it.
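The padding is easy to see with time.asctime(), which uses the fixed ctime()-style layout. The sketch below uses the time tuple from Skip's session. The regex and its group names are illustrative assumptions, not the patch's actual patterns. The day alternative accepts two digits, a space-padded digit, or a bare digit.

```python
import re
import time

# asctime()/ctime() print a one-digit day of the month padded with a space.
stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))
print(repr(stamp))  # 'Tue Jun  4 00:53:39 2002'

# Hypothetical day-of-month pattern: two-digit forms first, then " 4" or "4".
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]| [1-9]|[1-9])"
ctime_re = re.compile(r"(?P<a>\w{3}) (?P<b>\w{3}) " + DAY +
                      r" (?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d) (?P<Y>\d{4})")
m = ctime_re.fullmatch(stamp)
print(repr(m.group("d")))  # ' 4'
```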

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
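A read-only attribute in the style described above can be sketched as follows. LocaleTime here is a hypothetical, minimal stand-in. In today's Python, a property with only a getter already raises AttributeError on assignment. Passing an explicit setter, as below, lets the class raise TypeError instead.

```python
class LocaleTime:
    """Hypothetical, minimal stand-in with one read-only attribute."""

    def __init__(self):
        self._f_month = ["", "January", "February"]  # truncated for brevity

    def _set_nothing(self, value):
        # Installed as the setter of every property: refuse all assignment.
        raise TypeError("LocaleTime attributes are read-only")

    f_month = property(lambda self: self._f_month, _set_nothing)

lt = LocaleTime()
print(lt.f_month[1])  # January
```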

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.
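The two membership spellings behave the same for single characters, which is what the directive check needs. A string tests for substrings, though, so they differ on the empty string and on multi-character values. The sketch below shows both cases.

```python
for found in ("y", "Y", "d"):
    # For single characters the two spellings agree.
    assert (found in "yY") == (found in ("y", "Y"))

# A string tests for substrings, so the spellings differ here.
print("" in "yY", "" in ("y", "Y"))      # True False
print("yY" in "yY", "yY" in ("y", "Y"))  # True False
```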

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.
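A check of the kind described above might look like this. check_names() is a hypothetical helper, not the patch's code. It tests with `is None` rather than truthiness, so a value such as 0 is rejected instead of being treated as "not given". For non-sequences, len() raises TypeError by itself.

```python
def check_names(names, expected_len, what):
    """Hypothetical validator: accept None, else require exactly expected_len items."""
    if names is None:  # explicit None test, not "if not names"
        return None
    # len() itself raises TypeError for non-sequences such as 0 or 5.
    if len(names) != expected_len:
        raise TypeError("%s must contain %d items" % (what, expected_len))
    return list(names)
```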

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that can put it
at either the front or the back.  Not much different from
before, but it lets me choose front or back concatenation
easily.
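A helper of this kind could look like the sketch below. add_blank() is a hypothetical name, not the patch's method. A leading blank lets 1-based month numbers index the tuple directly, as calendar.month_name does.

```python
def add_blank(seq, front=True):
    """Hypothetical helper: return seq as a tuple with '' at the front or back."""
    seq = tuple(seq)
    return ("",) + seq if front else seq + ("",)

months = add_blank(["Jan", "Feb", "Mar"])
print(months[1])  # Jan
```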

I explained why the various magic dates were used.

I don't have to worry about leap years at all.  Since the
function does not validate the data string, it just takes
the data and uses it.  I have no reason to calculate leap years.

date_time[offset] has been replaced with current_format, and I
added the requisite two lines to assign between it and the list.

You only need __new__ when deriving from an immutable type.
Since dict is obviously mutable, I don't need to worry about it.
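Here is a sketch of the difference. Both classes are hypothetical. A dict subclass can be filled in __init__. An int subclass has to choose its value in __new__, because the object is already fixed by the time __init__ runs.

```python
class TimeRE(dict):
    """Hypothetical mutable subclass: __init__ is enough to populate it."""

    def __init__(self):
        super().__init__()
        self["d"] = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
        self["Y"] = r"(?P<Y>\d{4})"


class Year(int):
    """Hypothetical immutable subclass: the value has to be chosen in __new__."""

    def __new__(cls, text):
        # By the time __init__ would run, the int's value is already set.
        return super().__new__(cls, text.strip())
```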

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.
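Here is a sketch of the equality check and the chained assignment mentioned above. is_unset() is a hypothetical name. `==` compares values. `is` compares object identity, and for ints it only appears to work in CPython because small integers are cached.

```python
def is_unset(value):
    # == compares values; an identity test against -1 would rely on
    # CPython's small-int cache, which other implementations need not have.
    return value == -1

x = y = z = -1  # chained assignment, as suggested in the review
print(is_unset(x), is_unset(int("-1")), is_unset(0))  # True True False
```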

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)
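It is an assumption that this is the patch's reasoning, but here is one plausible use of a fixed date. 1999-03-15 fell on a Monday. Stepping forward one day at a time and formatting with %A therefore gives the locale's weekday names in Monday-first order. weekday_names() is a hypothetical helper.

```python
import datetime

def weekday_names():
    """Hypothetical helper: full weekday names, Monday first, in the current locale."""
    # 1999-03-15 was a Monday, so offsets 0..6 run Monday..Sunday.
    # The time of day (22:44:55) does not affect %A.
    start = datetime.datetime(1999, 3, 15, 22, 44, 55)
    return [(start + datetime.timedelta(days=n)).strftime("%A") for n in range(7)]

assert datetime.date(1999, 3, 15).weekday() == 0  # Monday
print(weekday_names())
```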

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
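A longest-first ordering matters because regex alternation takes the first alternative that matches. seq_to_re() below is a hypothetical helper in the spirit of __tupleToRE. Python 3 has no cmp(), so sorting with key=len and reverse=True replaces the comparator.

```python
import re

def seq_to_re(names, group):
    """Hypothetical helper: build a named alternation, trying longer names first."""
    # Drop blanks and sort longest-first (stable, so equal lengths keep their order).
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    alternatives = "|".join(re.escape(n) for n in ordered)
    return "(?P<%s>%s)" % (group, alternatives)

names = ["Mar", "March", "May"]
pattern = re.compile(seq_to_re(names, "b"))
naive = re.compile("(?P<b>%s)" % "|".join(names))
print(pattern.match("March 5").group("b"))  # March
print(naive.match("March 5").group("b"))    # Mar
```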

Note:  you can do x = y = z = -1, instead of
x = -1 ; y = -1 ; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
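Here is the failure and the proposed fix, using the simplified %d pattern quoted above. Appending \d as the last alternative lets a bare single digit match, while the two-digit forms are still tried first.

```python
import re

strict = re.compile(r"(3[0-1])|([0-2]\d)")
lenient = re.compile(r"(3[0-1])|([0-2]\d)|\d")  # '\d' tacked on as the last option

print(strict.fullmatch("4"))            # None: '4' is not two digits
print(lenient.fullmatch("4").group())   # 4
print(lenient.fullmatch("31").group())  # 31
```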

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
    range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
    (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
    comparisons, etc)
 * Long lines should be broken up
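Two of the points above, raising instead of returning a success flag and not hard-coding the tuple length, can be sketched together. check_integrity() and return_time() here are hypothetical stand-ins that only borrow the names from the review.

```python
def check_integrity(fields):
    """Hypothetical validator: raise on bad input instead of returning a flag."""
    if len(fields) != 9:
        raise ValueError("expected 9 time fields, got %d" % len(fields))
    # Returning normally (implicitly None) means the input passed.

def return_time(fields):
    """Hypothetical helper: convert the fields of a time tuple to ints."""
    check_integrity(fields)
    # range(len(fields)) instead of a hard-coded range(9) (xrange in Python 2).
    return tuple(int(fields[i]) for i in range(len(fields)))

print(return_time("2002 6 1 6 46 0 5 152 0".split()))
```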
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up); the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Wed Jul 10 22:09:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 10 Jul 2002 14:09:39 -0700
Subject: [Patches] [ python-Patches-579841 ] Build MachoPython with 2level namespace
Message-ID: <E17SOiR-0006WK-00@usw-sf-web3.sourceforge.net>

Patches item #579841, was opened at 2002-07-10 23:09
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579841&group_id=5470

Category: Macintosh
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jack Jansen (jackjansen)
Assigned to: Jack Jansen (jackjansen)
Summary: Build MachoPython with 2level namespace

Initial Comment:
This patch builds a framework-based Python on OSX without --flat_namespace. In addition the Makefile.pre.in logic for building the temporary framework is slightly reordered to make it more error-proof.

The main reason for putting this patch up here is that it was supposed to prevent extension modules built for a framework Python from being imported into a non-framework Python. Unfortunately, it does this with a core dump instead of with the expected "Python not initialized (wrong version?)" error message. I would like feedback as to why this is (as other people do get the error message in similar situations).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579841&group_id=5470


From noreply@sourceforge.net  Thu Jul 11 22:45:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 11 Jul 2002 14:45:57 -0700
Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator
Message-ID: <E17Sll7-0001OI-00@usw-sf-web1.sourceforge.net>

Patches item #580331, was opened at 2002-07-11 21:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.
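A pure-Python sketch of the behaviour described above. LineSource is a hypothetical file-like class, not the C patch. It caches its line reader, and its __iter__() returns self. Python 3 spells the iterator method __next__ (Python 2 used next()).

```python
class LineSource:
    """Hypothetical file-like object that is its own iterator."""

    def __init__(self, text):
        self._lines = text.splitlines(keepends=True)
        self._reader = None  # cached line reader, created on first use

    def xreadlines(self):
        # Repeated calls hand back the same cached reader.
        if self._reader is None:
            self._reader = iter(self._lines)
        return self._reader

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate to the cached reader; its StopIteration ends the loop.
        return next(self.xreadlines())

f = LineSource("a\nb\n")
print(next(f), list(f))
```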


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 03:11:20 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 11 Jul 2002 19:11:20 -0700
Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre
Message-ID: <E17Sptw-0005rz-00@usw-sf-web1.sourceforge.net>

Patches item #580386, was opened at 2002-07-11 22:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Fredrik Lundh (effbot)
Summary: uncaught TypeError exception in sre

Initial Comment:
From c.l.p on 9 July, Kevin Altis reported that:

    re.compile('([a-')

Produces an uncaught TypeError from compilation.

This patch catches the TypeError in _compile().
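For comparison, current Python versions report this pattern as re.error rather than TypeError. The sketch below shows how a caller can catch it. compile_or_report() is a hypothetical helper.

```python
import re

def compile_or_report(pattern):
    """Hypothetical helper: compiled pattern, or an error message if malformed."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return "bad pattern: %s" % exc

print(compile_or_report("([a-"))
```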

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 04:05:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 11 Jul 2002 20:05:41 -0700
Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre
Message-ID: <E17SqkX-0007dE-00@usw-sf-web4.sourceforge.net>

Patches item #580386, was opened at 2002-07-11 22:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Fredrik Lundh (effbot)
Summary: uncaught TypeError exception in sre

Initial Comment:
From c.l.p on 9 July, Kevin Altis reported that:

    re.compile('([a-')

Produces an uncaught TypeError from compilation.

This patch catches the TypeError in _compile().

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-11 23:05

Message:
Logged In: YES 
user_id=33168

I wonder if the same change must be made in _compile_repl().

I don't see the benefit of the try/except clause as it is:

  try:
    p = parse...
  except error, v:
    raise error, v
Isn't that just:  p = parse...

This probably also should be backported to 2.2

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 05:04:01 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 11 Jul 2002 21:04:01 -0700
Subject: [Patches] [ python-Patches-580411 ] move frame macros into ceval
Message-ID: <E17Srez-0008KG-00@usw-sf-web4.sourceforge.net>

Patches item #580411, was opened at 2002-07-12 00:04
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: move frame macros into ceval

Initial Comment:
There are some old macros in frameobject.h which are
only used in ceval.c.  These macros are not prefixed
with Py and some aren't used at all.

This patch:
 * removes all of the macros from frameobject.h
 * moves the necessary macros into ceval.c
 * gets rid of an extra level of macros
 * uses co alias instead of f->f_code

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 05:16:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 11 Jul 2002 21:16:56 -0700
Subject: [Patches] [ python-Patches-580411 ] move frame macros into ceval
Message-ID: <E17SrrU-0008Um-00@usw-sf-web4.sourceforge.net>

Patches item #580411, was opened at 2002-07-12 00:04
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: move frame macros into ceval

Initial Comment:
There are some old macros in frameobject.h which are
only used in ceval.c.  These macros are not prefixed
with Py and some aren't used at all.

This patch:
 * removes all of the macros from frameobject.h
 * moves the necessary macros into ceval.c
 * gets rid of an extra level of macros
 * uses co alias instead of f->f_code

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-12 00:16

Message:
Logged In: YES 
user_id=31435

Nice!  Accepted and back to Neal.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 12:07:58 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 04:07:58 -0700
Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre
Message-ID: <E17SyHG-0007qH-00@usw-sf-web4.sourceforge.net>

Patches item #580386, was opened at 2002-07-12 04:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
>Resolution: Duplicate
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Fredrik Lundh (effbot)
Summary: uncaught TypeError exception in sre

Initial Comment:
From c.l.p on 9 July, Kevin Altis reported that:

    re.compile('([a-')

Produces an uncaught TypeError from compilation.

This patch catches the TypeError in _compile().

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2002-07-12 13:07

Message:
Logged In: YES 
user_id=38376

this is the same as bug #545855, and should be fixed inside
the SRE parser (afaik, it has been, in the SLAB master
repository).

as for the extra try/except: this is to shield ordinary users
from 20-level tracebacks exposing irrelevant implementation
details.  if you make a mistake in an RE, you want to know
that, but you probably don't care about exactly where in the
parser or compiler internals the interpreter happens to be
when that mistake was discovered...  (this pattern, along
with the "add a comment on the raise line, to provide extra
hints for a human reader" idiom, are pretty common in
Python libraries). /F
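Here is the traceback-shortening idea in today's Python. In Python 2, `raise error, v` starts a new traceback at the raise statement. In Python 3, re-raising the caught object would keep its internal frames. This sketch therefore raises a fresh exception with `from None`. error, _parse() and compile_pattern() are hypothetical stand-ins, not sre's code.

```python
import traceback

class error(Exception):
    """Hypothetical stand-in for a module's public exception type."""

def _parse(pattern, depth=0):
    # Hypothetical internal helper that fails several frames deep.
    if depth < 5:
        return _parse(pattern, depth + 1)
    raise error("unbalanced parenthesis in %r" % pattern)

def compile_pattern(pattern):
    try:
        return _parse(pattern)
    except error as v:
        # A fresh exception raised here has a traceback that starts at this
        # frame; "from None" also suppresses the chained internal traceback.
        raise error(*v.args) from None

try:
    compile_pattern("(")
except error as exc:
    frames = [f.name for f in traceback.extract_tb(exc.__traceback__)]
    print(frames)  # no '_parse' entries
```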

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-12 05:05

Message:
Logged In: YES 
user_id=33168

I wonder if the same change must be made in _compile_repl().

I don't see the benefit of the try/except clause as it is:

  try:
    p = parse...
  except error, v:
    raise error, v
Isn't that just:  p = parse...

This probably also should be backported to 2.2

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 12:11:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 04:11:52 -0700
Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582
Message-ID: <E17SyL2-0007ue-00@usw-sf-web4.sourceforge.net>

Patches item #527371, was opened at 2002-03-08 14:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470

>Category: Modules
Group: None
Status: Open
Resolution: Accepted
>Priority: 8
Submitted By: Greg Chapman (glchapman)
Assigned to: Fredrik Lundh (effbot)
Summary: Fix for sre bug 470582

Initial Comment:
Bug report 470582 points out that nested groups can 
produce matches in sre even if the groups within
which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of 
SRE_OP_MAX_UNTIL, state->lastmark is being reduced 
(after "((\d)\:)" fails) without NULLing out the now-
invalid entries at the end of the state->mark array.  
In the other two cases where state->lastmark is 
reduced (specifically in SRE_OP_BRANCH and 
SRE_OP_REPEAT_ONE) memset is used to NULL out the 
entries at the end of the array.  The attached patch 
does the same thing for the SRE_OP_MAX_UNTIL case.  
This fixes the above case and does not break anything 
in test_re.py.
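For comparison, here is a quick check against the `re` module in a current Python 3 release. This assumes a modern interpreter; the `sre` and `pre` modules shown above were later folded into `re`. When the optional prefix does not match, the outer group and the nested group should both report `None`, which matches the `pre` behavior above.

```python
import re

PATTERN = r"^((\d)\:)?(\d\d)\.(\d\d\d)$"

# The optional "((\d)\:)" part fails here, so neither it nor the
# nested (\d) group should report a match.
m = re.search(PATTERN, "34.123")
print(m.groups())  # (None, None, '34', '123')

# When the prefix is present, both groups are filled in.
m = re.search(PATTERN, "5:34.123")
print(m.groups())  # ('5:', '5', '34', '123')
```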


----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2002-07-12 13:11

Message:
Logged In: YES 
user_id=38376

(bumped priority as a reminder to self) /F

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-08 19:28

Message:
Logged In: YES 
user_id=31435

Assigned to /F -- he's the expert here.

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2002-03-08 16:23

Message:
Logged In: YES 
user_id=86307

I'm pretty sure the memset is correct; state->lastmark is 
the index of the last mark written (not the index of the 
next potential write).

Also, it occurred to me that there is another related error 
here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last 
that matched, even though it didn't really match.  Since 
lastindex is undocumented, this probably doesn't matter too 
much.  Still, it probably should be reset if it is pointing 
to a group which gets "unmatched" when state->lastmark is 
reduced.  Perhaps a function like the following should be 
added for use in the three places where state->lastmark is 
reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = 
            (lastmark == 0) ? -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(
            state->mark + lastmark + 1, 0,
            (state->lastmark - lastmark) * sizeof(void*)
        );
    }
    state->lastmark = lastmark;
}
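The following is a hypothetical Python model of the proposed `lastmark_restore()` that makes the bookkeeping concrete. It is not the actual `_sre.c` code. It assumes, as the formula above implies, that marks `2*(g-1)` and `2*(g-1)+1` hold the start and end of group `g`, and it uses `None` where the C code stores NULL. The model also shows that the `memset` clears exactly the entries from index `lastmark+1` through the old `state->lastmark`, which bears on Neal's question about the memset below.

```python
def lastmark_restore(state, lastmark):
    """Hypothetical model of the proposed C helper.

    state: dict with 'mark' (list of positions or None), 'lastmark'
    and 'lastindex', mirroring the SRE_STATE fields of the same names.
    """
    assert lastmark >= 0
    if state["lastmark"] > lastmark:
        # Highest group whose start and end marks are both still valid.
        lastvalidindex = -1 if lastmark == 0 else (lastmark - 1) // 2 + 1
        if state["lastindex"] > lastvalidindex:
            state["lastindex"] = lastvalidindex
        # Equivalent of the memset: clear (old lastmark - lastmark)
        # entries, i.e. indices lastmark + 1 .. old lastmark inclusive.
        for i in range(lastmark + 1, state["lastmark"] + 1):
            state["mark"][i] = None
    state["lastmark"] = lastmark


# Groups 1 and 2 were both marked (indices 0..3); the match then
# backtracks so that only group 1's marks (indices 0 and 1) survive.
state = {"mark": [0, 2, 0, 1], "lastmark": 3, "lastindex": 2}
lastmark_restore(state, 1)
print(state)  # {'mark': [0, 2, None, None], 'lastmark': 1, 'lastindex': 1}
```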
 

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-03-08 14:29

Message:
Logged In: YES 
user_id=33168

Confirmed that the test w/o fix fails
and the test passes with the fix to _sre.c.

But I'm not sure if the memset can go too far:

  memset(state->mark + lastmark + 1, 0, 
         (state->lastmark - lastmark) * sizeof(void*));

I can try under purify, but that doesn't guarantee anything.

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2002-03-08 14:20

Message:
Logged In: YES 
user_id=86307

I forgot: here's a patch for re_tests.py which adds the 
case from the bug report as a test.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 16:46:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 08:46:44 -0700
Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection
Message-ID: <E17T2d2-0003nn-00@usw-sf-web1.sourceforge.net>

Patches item #515003, was opened at 2002-02-08 16:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Mihai Ibanescu (misa)
>Assigned to: Jeremy Hylton (jhylton)
Summary: Added HTTP{,S}ProxyConnection

Initial Comment:
This patch adds HTTP*Connection classes for proxy
connections. Authenticated proxies are also supported.

One could argue that urllib2 already implements this, but it
does not do HTTPS tunneling through proxies, and this patch
is intended to be lower-level than urllib2.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 11:46

Message:
Logged In: YES 
user_id=6380

Assigning to Jeremy in the hope that he can provide a review.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-23 23:03

Message:
Logged In: YES 
user_id=205865

The newer patch is generated against the latest CVS tree,
and it provides additional documentation.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-11 14:47

Message:
Logged In: YES 
user_id=205865

Sorry, I've been caught up with a zillion other things to do.
I'll try to reorganize it somehow and ask for opinions.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-11 14:42

Message:
Logged In: YES 
user_id=31392

misa-- any progress on this patch?


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 18:12

Message:
Logged In: YES 
user_id=6380

OK, thanks; I'll wait!

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-03-01 17:58

Message:
Logged In: YES 
user_id=205865

I will add documentation and show the intended usage.
urllib* doesn't deal with proxying over SSL (using CONNECT
instead of GET/POST). urllib* also uses the compatibility
classes, HTTP/HTTPS, instead of HTTPConnection (this is not
an argument by itself).
Thanks for the suggestion.
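This is how the CONNECT approach mentioned above works. To tunnel HTTPS through a proxy, the client:

1. opens a plain TCP connection to the proxy;
2. sends a CONNECT request naming the target host and port, plus a Proxy-Authorization header for authenticated proxies;
3. after a 200 reply, starts TLS over the same socket.

`build_connect_request()` below is a hypothetical sketch that only builds that request. It is not the API of this patch. Current Python versions expose the same mechanism through `http.client.HTTPSConnection.set_tunnel()`.

```python
import base64

def build_connect_request(host, port, user=None, password=None):
    """Hypothetical helper: bytes of a CONNECT request for an HTTP proxy."""
    target = f"{host}:{port}"
    lines = [f"CONNECT {target} HTTP/1.1", f"Host: {target}"]
    if user is not None:
        # Basic scheme: base64 of "user:password".
        credentials = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    # The header block is terminated by an empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_connect_request("www.python.org", 443, "u", "p"))
```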

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 17:40

Message:
Logged In: YES 
user_id=6380

This patch fails to seduce me. There's no explanation why
this would be useful, or how it should be used, and no
documentation, and a hint that urllib2 already does this.

Maybe you can get someone who's known on python-dev to
champion it, if you think it's useful?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 17:58:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 09:58:32 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17T3kW-0005fi-00@usw-sf-web3.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.

Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
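Guido's failing example comes down to the directive regexes. `%m` and `%d` must also accept a single digit ("7"), as the Linux strptime does. Below is a minimal sketch of a regex-based parser with that fix. The directive table, `format_to_regex()` and `parse_date()` are hypothetical and cover only four directives. In current Python, `time.strptime("7/12/02", "%m/%d/%y")` parses this input.

```python
import re

# Hypothetical directive table. Each numeric entry also accepts one
# digit, so "7" matches %m just as "07" does.
DIRECTIVES = {
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
    "Y": r"(?P<Y>\d\d\d\d)",
}

def format_to_regex(fmt):
    """Translate a strptime-style format string into a compiled regex."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            # "%%" is a literal percent sign; other codes must be known.
            parts.append("%" if code == "%" else DIRECTIVES[code])
            i += 2
        else:
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

def parse_date(data, fmt):
    """Return (year, month, day) or raise ValueError like time.strptime."""
    match = format_to_regex(fmt).match(data)
    if match is None:
        raise ValueError("time data did not match format")
    found = match.groupdict()
    if "Y" in found:
        year = int(found["Y"])
    elif "y" in found:
        # POSIX convention: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
        two_digit = int(found["y"])
        year = two_digit + (1900 if two_digit >= 69 else 2000)
    else:
        year = 1900
    return year, int(found.get("m", 1)), int(found.get("d", 1))


print(parse_date("7/12/02", "%m/%d/%y"))  # (2002, 7, 12)
```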

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactical change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot that default arguments are evaluated once, when the
function is defined, and not at each call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().
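The default-argument pitfall Brett describes can be shown in a few lines. `current_locale()` and the two `parse_*` functions are hypothetical stand-ins, not code from the patch. A default expression is evaluated once, when the `def` statement runs, so a later locale switch goes unseen. The fix is to recompute the value on each call, using the usual `None` sentinel idiom.

```python
_settings = {"locale": "en_US"}

def current_locale():
    # Hypothetical stand-in for querying the process locale.
    return _settings["locale"]

def parse_with_fixed_default(data, locale_name=current_locale()):
    # current_locale() ran once, when this def statement executed.
    return locale_name

def parse_with_sentinel(data, locale_name=None):
    if locale_name is None:
        locale_name = current_locale()  # re-evaluated on every call
    return locale_name


_settings["locale"] = "sv_SE"  # simulate a locale switch
print(parse_with_fixed_default("..."))  # en_US
print(parse_with_sentinel("..."))       # sv_SE
```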

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any "nothing"
value (i.e., anything that would be considered false), but I
realized that an empty string could trigger that and it would
be better to raise a TypeError than let some error come up
from trying to use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
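The same saving can be had without changing strptime()'s interface: cache the compiled regex per format string, so the expensive translation step runs once per distinct format. This is a swapped-in technique, not what the patch does (the patch returns the regex when passed False). The directive table and function names are hypothetical. `functools.lru_cache` is the standard-library decorator.

```python
import functools
import re

# Hypothetical, deliberately tiny directive table.
_DIRECTIVES = {"d": r"(?P<d>\d{1,2})", "m": r"(?P<m>\d{1,2})", "Y": r"(?P<Y>\d{4})"}

@functools.lru_cache(maxsize=32)
def compiled_format(fmt):
    """Translate fmt into a regex once; repeated formats hit the cache."""
    # With one capture group, re.split alternates literal text (even
    # indices) and directive letters (odd indices).
    pieces = re.split(r"%([dmY])", fmt)
    pattern = "".join(
        _DIRECTIVES[piece] if i % 2 else re.escape(piece)
        for i, piece in enumerate(pieces)
    )
    return re.compile(pattern + r"\Z")

def parse(data, fmt):
    match = compiled_format(fmt).match(data)
    if match is None:
        raise ValueError("time data did not match format")
    return {key: int(value) for key, value in match.groupdict().items()}


parse("12/7/2002", "%d/%m/%Y")
parse("16/6/2002", "%d/%m/%Y")
print(compiled_format.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=32, currsize=1)
```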

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks go
out to Guido for pointing out this undocumented feature of
calendar.
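This sketch combines the calendar-based approach with the longest-first ordering of the `__tupleToRE` sorter discussed further down. `calendar.day_name` and `calendar.day_abbr` supply the weekday names, which depend on the locale (the expected values below assume the default C/English locale). Sorting the alternatives by length makes the regex prefer "Monday" over its prefix "Mon". `names_to_regex()` is hypothetical. `sorted(..., key=len, reverse=True)` stands in for the `cmp`-based sorter, because `cmp` no longer exists in Python 3.

```python
import calendar
import re

def names_to_regex(names, group):
    """Hypothetical helper: a named-group alternation, longest names first."""
    unique = {name.lower() for name in names if name}
    ordered = sorted(unique, key=len, reverse=True)
    return f"(?P<{group}>{'|'.join(re.escape(n) for n in ordered)})"

# Abbreviations listed first on purpose, to show why ordering matters.
weekday_names = list(calendar.day_abbr) + list(calendar.day_name)

naive_re = re.compile(
    "(?P<a>" + "|".join(re.escape(n.lower()) for n in weekday_names) + ")",
    re.IGNORECASE,
)
weekday_re = re.compile(names_to_regex(weekday_names, "a"), re.IGNORECASE)

text = "Monday, 1 July 2002"
print(naive_re.match(text).group("a"))    # Mon (C locale)
print(weekday_re.match(text).group("a"))  # Monday (C locale)
```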

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric day-of-the-month
regex.  I discovered that ANSI C's ctime displays the day of the
month with a leading space if it is a single digit.  Since at
least Skip's C library likes to use that format for %c, I went
ahead and added it.
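Skip's `%c` output further down ('Tue Jun  4 00:53:39 2002') shows this padding. The sketch below uses a hypothetical day-of-month pattern that also accepts the space-padded form. `time.asctime()` serves only to produce a ctime-style string.

```python
import re
import time

# Hypothetical %d pattern: two digits, space + one digit, or one digit.
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]| [1-9]|[1-9])"

stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))
print(repr(stamp))  # 'Tue Jun  4 00:53:39 2002'

match = re.match(r"(?P<a>\w{3}) (?P<b>\w{3}) " + DAY, stamp)
# int() ignores the leading space in " 4".
print(int(match.group("d")))  # 4
```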

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
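In current Python, assigning to a property defined without a setter already raises AttributeError. The sketch below shows the explicit setter Brett describes, which raises TypeError instead, next to a setter-less property. `LocaleTimeSketch` and its attributes are hypothetical stand-ins for the real LocaleTime class.

```python
class LocaleTimeSketch:
    """Hypothetical stand-in for LocaleTime with read-only attributes."""

    def __init__(self):
        self._f_month = ["", "January", "February"]  # truncated for brevity

    def _read_only(self, value):
        raise TypeError("LocaleTime attributes are read-only")

    # Getter plus a setter that always refuses the assignment.
    f_month = property(lambda self: self._f_month, _read_only)

    # No setter at all: assignment raises AttributeError.
    lang = property(lambda self: "en")


lt = LocaleTimeSketch()
try:
    lt.f_month = []
except TypeError as exc:
    print("TypeError:", exc)
try:
    lt.lang = "sv"
except AttributeError:
    print("AttributeError")
```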

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify time or
make this a separate module, etc.).  I think I will do the
email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
on a \d on all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
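A helper of the kind described might look like the following. The source does not show Brett's method, so the name and signature are hypothetical. The leading '' presumably makes list index 1 correspond to January, as in `calendar.month_name`.

```python
def add_blank(seq, front=True):
    """Hypothetical helper: return seq as a tuple with '' at one end."""
    items = tuple(seq)
    return ("",) + items if front else items + ("",)


print(add_blank(["January", "February"]))    # ('', 'January', 'February')
print(add_blank(["am", "pm"], front=False))  # ('am', 'pm', '')
```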

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1 ; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)
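This sketch assumes that "Julian" in gregToJulian() means the day of the year, i.e. the tm_yday field of a time tuple, which is how the name reads in context. Under that assumption, a hypothetical replacement can lean on the datetime module, which also takes care of leap years. The result matches the 193 in the Linux strptime output Guido quotes for 12 July 2002.

```python
import datetime

def gregorian_to_day_of_year(year, month, day):
    """Hypothetical helper: 1-based day of the year for a Gregorian date."""
    # datetime handles leap years, e.g. 2000-03-01 is day 61, not 60.
    return datetime.date(year, month, day).timetuple().tm_yday


print(gregorian_to_day_of_year(2002, 7, 12))  # 193
```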

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded version 2.0.1, which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need the success flag in CheckIntegrity if you
   raise an exception?
    (You don't need to return anything; raise an exception,
    else it's ok)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?
    This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
    comparisons, etc)
 * Long lines should be broken up

The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up); the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 18:08:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 10:08:02 -0700
Subject: [Patches] [ python-Patches-575515 ] Merge xrange() into slice()
Message-ID: <E17T3ti-0005wH-00@usw-sf-web3.sourceforge.net>

Patches item #575515, was opened at 2002-06-29 18:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575515&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Merge xrange() into slice()

Initial Comment:
Xrange() and Slice() have evolved to be very similar.  
Merging the code for xrange() into slice() will 
complete the transformation, put all the capability 
into one object, eliminate an object type, eliminate 
two source files, and shrink the Python concept 
space by a modest amount.

Discussion on py-dev (see thread Xrange and Slices 
starting on 6/26/2002) was generally favorable. All 
of the design suggestions received have been 
incorporated in this patch.

Slice is left intact as a mutable container of arbitrary 
Python objects.  Its sq_item, sq_len, and tp_iter slots
are filled in to give it the same sequence behavior as 
xrange().  The tp_iter slot creates an immutable 
iterator based on the state of the slice at the time the 
iterator is created.  The iterator uses C longs instead
of PyObjects to protect its immutability and to keep 
the super fast speed that it had in xrange().

To keep the old xrange interface intact, 'xrange' is
made synonymous with 'slice'.  Also, slice.h is given 
macros and a PyRange_New() wrapper so that the 
xrange C API is left intact.
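Guido ultimately rejected the merge. Python 3 still keeps range (the successor of xrange) and slice as separate types. You can recover the sequence behaviour the patch wanted with slice.indices(), which also answers open issue 2: it raises ValueError for a zero step. The helper name slice_as_range below is made up for illustration.

```python
def slice_as_range(s, length):
    """Return the range of indices that slice s selects from a sequence of
    the given length: the sequence behaviour the patch gave to slice."""
    # slice.indices() fills in defaults, clips start/stop to the sequence
    # length, and raises ValueError if the step is zero.
    return range(*s.indices(length))


print(list(slice_as_range(slice(2, 10, 3), 20)))     # [2, 5, 8]
print(len(slice_as_range(slice(None, None, -1), 5)))  # 5
```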

Two minor open issues:
1. Should repr() say 'slice' or 'xrange'?
2. What should the return value be for slice_length() 
when step is zero or None?

Patch passes all regression tests.  A news item
should be added even though the APIs are unchanged.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 13:08

Message:
Logged In: YES 
user_id=6380

Rejecting. It's better to let these two be different, so
that it's clear what the intended use is.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-06-30 15:35

Message:
Logged In: YES 
user_id=80475

New patch attached. Incorporates three ideas from Oren 
Tirosh's code review (int-->long, xrange as public interface, 
return -1 on len error).

I'm away from the computer for the next five weeks. Oren 
has agreed to champion my patches (not necessarily 
advocate, just make sure they get a fair trial and that 
requested changes get made).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575515&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 18:21:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 10:21:17 -0700
Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments
Message-ID: <E17T46X-0006Eh-00@usw-sf-web3.sourceforge.net>

Patches item #580670, was opened at 2002-07-12 13:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Bill Bell (wbell539)
Assigned to: Nobody/Anonymous (nobody)
Summary: less restrictive HTML comments

Initial Comment:

The current code enforces the requirement that HTML comments open
with '<!--'. I suggest a patch that provides for a less restrictive syntax,
since the current syntax requirements reject a significant fraction of
pages.

Affects sgmllib.py and markupbase.py.
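The patch itself is not shown here, so the following only illustrates the idea with a hypothetical regex. It accepts the strict <!-- ... --> form first, then falls back to treating <! ... > as a comment. Note that sgmllib no longer exists in Python 3, where html.parser is its successor.

```python
import re

# Strict form first ("<!--" ... "-->"), then a lenient fallback that treats
# "<!" ... ">" as a comment when it does not start with "--".  A real parser
# would check for declarations such as <!DOCTYPE ...> before this fallback.
COMMENT_RE = re.compile(r'<!--(.*?)-->|<!(?!--)([^>]*)>', re.DOTALL)


def find_comments(html):
    """Return the text of every strict or lenient comment, in document order."""
    comments = []
    for m in COMMENT_RE.finditer(html):
        strict, loose = m.group(1), m.group(2)
        comments.append(strict if strict is not None else loose)
    return comments


print(find_comments('<p>a<!-- one -->b<! two >c</p>'))  # [' one ', ' two ']
```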

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 18:36:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 10:36:33 -0700
Subject: [Patches] [ python-Patches-565378 ] Expose _Py_ReleaseInternedStrings
Message-ID: <E17T4LJ-0000xy-00@usw-sf-web2.sourceforge.net>

Patches item #565378, was opened at 2002-06-06 11:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565378&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Barry A. Warsaw (bwarsaw)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Expose _Py_ReleaseInternedStrings

Initial Comment:
An implementation of the idea expressed here:

http://mail.python.org/pipermail/python-dev/2002-June/025067.html

This exposes the clearing of the intern dictionary to
Python via gc.release_interns().  Patch includes doc
updates and a test case.
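For readers unfamiliar with the intern dictionary: interning maps equal strings to one shared object. Python 2 exposed this as the builtin intern(), and Python 3 moved it to sys.intern(). The variable names below are arbitrary.

```python
import sys

# Build two equal strings at run time so they are typically separate
# objects (the compiler cannot fold them into one constant).
a = ''.join(['strptime', '-', 'cache'])
b = ''.join(['strptime', '-', 'cache'])

# sys.intern() returns the single shared copy kept in the interpreter's
# intern table, so equal strings come back as the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia == ib, ia is ib)  # True True
```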


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-12 13:36

Message:
Logged In: YES 
user_id=12800

I'm closing this as rejected since others have more time and
energy to do this (i.e. murder intern strings :) right.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-08 06:02

Message:
Logged In: YES 
user_id=21627

Couldn't this be exposed as str.some_static_method?

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-07 13:51

Message:
Logged In: YES 
user_id=12800

I'm planning to, yes.  I may not get to it until early next
week though, so if someone is motivated to do it before
then, let me know (or steal the assignment).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 12:39

Message:
Logged In: YES 
user_id=6380

Barry, are you going to do an implementation of what we
decided on? If not, maybe unassign this and let someone else
submit one.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-06 12:03

Message:
Logged In: YES 
user_id=12800

Guido also suggests moving this to sys instead of gc (although
gc isn't totally out of the question).  Maybe Neil has an
opinion one way or the other?

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-06 12:02

Message:
Logged In: YES 
user_id=12800

Yes, I know why this isn't right.  Notes:

- we probably want to incref the string pointed to by
ob_intern so that the reference is counted in the interned
string's refcount

- when the referent is freed, we'll need to decref the ob_intern

this addresses

http://mail.python.org/pipermail/python-dev/2002-June/025071.html


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-06 11:59

Message:
Logged In: YES 
user_id=6380

Um, that's totally unsafe. See
http://mail.python.org/pipermail/python-dev/2002-June/025071.html

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=565378&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 22:02:01 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 14:02:01 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17T7Y9-0002EE-00@usw-sf-web3.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
  works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) This test failed because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant to allow no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)
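The lenient alternative Brett describes can be sketched as a hypothetical, cut-down translator that handles only %m, %d and %y; it is not the patch's code. Its %m and %d patterns accept an optional leading zero, so Guido's "7/12/02" example parses. Two-digit years follow the POSIX rule: 69-99 map to 1969-1999 and 00-68 map to 2000-2068.

```python
import re

# Hypothetical directive table covering only %m, %d and %y.  %m and %d accept
# an optional leading zero, so "07" (strftime output) and "7" both parse.
DIRECTIVES = {
    'm': r'(?P<m>1[0-2]|0[1-9]|[1-9])',
    'd': r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])',
    'y': r'(?P<y>\d\d)',
}


def parse(data, fmt):
    """Return (year, month, day) parsed from data according to fmt."""
    pattern = ''
    i = 0
    while i < len(fmt):
        if fmt[i] == '%' and i + 1 < len(fmt):
            pattern += DIRECTIVES[fmt[i + 1]]   # directive -> named group
            i += 2
        else:
            pattern += re.escape(fmt[i])        # literal text, e.g. '/'
            i += 1
    found = re.fullmatch(pattern, data)
    if found is None:
        raise ValueError('time data %r does not match format %r' % (data, fmt))
    # POSIX rule for two-digit years: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    year = int(found['y'])
    year += 1900 if year >= 69 else 2000
    return year, int(found['m']), int(found['d'])


print(parse('7/12/02', '%m/%d/%y'))   # (2002, 7, 12)
print(parse('07/12/02', '%m/%d/%y'))  # (2002, 7, 12)
```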

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
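One way to start on Guido's suggestion is a round-trip test: format a known date with strftime, parse it back, and compare the fields. The sketch below treats today's time.strptime as the reference implementation. It reuses the parsetime alias from Skip's trimmed test_strptime.py in this thread, so another implementation can be dropped in. The chosen formats are illustrative, not a complete suite.

```python
import time
import unittest

# Point this alias at another implementation to run the same tests against
# it, as Skip's trimmed test_strptime.py does with its parsetime name.
parsetime = time.strptime


class RoundTripTest(unittest.TestCase):
    """strptime should invert strftime for unambiguous formats."""

    # 2002-07-12 09:05:30, a Friday (tm_wday 4), day 193 of the year
    sample = (2002, 7, 12, 9, 5, 30, 4, 193, 0)

    def test_date_formats(self):
        for fmt in ('%Y-%m-%d', '%d/%m/%Y', '%Y %j', '%m/%d/%y'):
            with self.subTest(fmt=fmt):
                parsed = parsetime(time.strftime(fmt, self.sample), fmt)
                # The date determines year, month, day, weekday and yday.
                self.assertEqual(tuple(parsed)[:3], self.sample[:3])
                self.assertEqual(parsed.tm_wday, self.sample[6])
                self.assertEqual(parsed.tm_yday, self.sample[7])

    def test_time_of_day(self):
        parsed = parsetime(time.strftime('%H:%M:%S', self.sample), '%H:%M:%S')
        self.assertEqual(tuple(parsed)[3:6], self.sample[3:6])
```

Running it under python -m unittest reports one subtest per date format.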

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
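The difference shows up on the compiled pattern. A non-capturing group (?:...) still groups the alternation but adds no numbered group, so the engine has one less span to record. Whether that measurably speeds up strptime is not established here. The patterns below are made up for illustration.

```python
import re

# Same month alternation, wrapped once in a capturing and once in a
# non-capturing group inside the named group.
capturing = re.compile(r'(?P<b>(Jan|Feb|Mar))')
non_capturing = re.compile(r'(?P<b>(?:Jan|Feb|Mar))')

print(capturing.groups, non_capturing.groups)  # 2 1

# Both produce the same named result, which is all strptime needs.
assert capturing.match('Feb').groupdict() == non_capturing.match('Feb').groupdict()
```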

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  It is almost a purely syntactic change.
From discussions on python-dev, I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
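The pitfall Brett describes can be reproduced in a few lines. LocaleTime here is a hypothetical stand-in that just records a module-level "current locale" string, and strptime_bad/strptime_good are illustrative names. If such a parameter were kept, the usual fix is a None default that is resolved inside the function.

```python
current_locale = 'en_US'   # stands in for the process-wide LC_TIME setting


class LocaleTime:
    """Hypothetical stand-in: snapshots the locale active at creation time."""

    def __init__(self):
        self.lang = current_locale


def strptime_bad(data, locale_time=LocaleTime()):
    # The default LocaleTime() was created once, when this def statement ran.
    return locale_time.lang


def strptime_good(data, locale_time=None):
    # A fresh LocaleTime is built on every call unless one is passed in.
    if locale_time is None:
        locale_time = LocaleTime()
    return locale_time.lang


current_locale = 'sv_SE'        # the program switches locale later on
print(strptime_bad('x'))        # en_US  (stale default)
print(strptime_good('x'))       # sv_SE
```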

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false value
(i.e., anything that evaluates as false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
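Here is a sketch of keeping the compiled pattern around between calls. It assumes a hypothetical, heavily cut-down format translator (only %Y, %m and %d, no %% handling). functools.lru_cache memoizes the translate-and-compile step for each format string. The re module already keeps its own small cache of compiled patterns, so the main saving is skipping the format-to-regex translation, which is the part Brett's profiling flagged.

```python
import re
from functools import lru_cache

# Hypothetical, heavily cut-down directive table (no %% handling).
DIRECTIVES = {
    'Y': r'(?P<Y>\d{4})',
    'm': r'(?P<m>1[0-2]|0[1-9]|[1-9])',
    'd': r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])',
}


@lru_cache(maxsize=32)
def format_regex(fmt):
    """Translate fmt into a compiled regex, once per distinct format string."""
    # Splitting on one capturing group alternates literal text (even indices)
    # with directive letters (odd indices).
    parts = re.split(r'%(\w)', fmt)
    pattern = ''.join(DIRECTIVES[p] if i % 2 else re.escape(p)
                      for i, p in enumerate(parts))
    return re.compile(pattern)


def parse_date(data, fmt='%Y-%m-%d'):
    found = format_regex(fmt).fullmatch(data)
    if found is None:
        raise ValueError('time data %r does not match format %r' % (data, fmt))
    return int(found['Y']), int(found['m']), int(found['d'])


for line in ('2002-06-16', '2002-06-17', '2002-06-18'):
    parse_date(line)
print(format_regex.cache_info())  # one miss, then two hits
```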

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
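In current Python these names are documented attributes of the calendar module. The short illustration below shows them; the exact names depend on the active LC_TIME locale.

```python
import calendar

# Locale-aware name sequences.  month_name and month_abbr have an empty
# string at index 0 so that month numbers 1..12 can index them directly;
# day_name starts with Monday, matching struct_time's tm_wday == 0.
months = list(calendar.month_name)
abbr_months = list(calendar.month_abbr)
weekdays = list(calendar.day_name)

print(months[7], weekdays[4])  # e.g. "July Friday" under the C/English locale
```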

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.
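Judging from Skip's output in this thread ('Tue Jun  4 00:53:39 2002'), ctime/asctime pad the day of the month with a space, not the month. The sketch below uses Python's time.asctime and a hypothetical day-of-month pattern that accepts the padded form.

```python
import re
import time

# asctime() pads a single-digit day of the month with a space, not a zero.
stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))
print(repr(stamp))  # 'Tue Jun  4 00:53:39 2002'

# Hypothetical day pattern: the last alternative accepts " 4"-style padding.
DAY = r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])'
CTIME = re.compile(r'\w{3} \w{3} ' + DAY + r' \d\d:\d\d:\d\d \d{4}')

found = CTIME.fullmatch(stamp)
print(repr(found['d']))  # ' 4'
```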

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
I didn't realize it just falls back to normal attribute assignment.  So I
added that method as the set value for all of the property()s.
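Here is a small sketch of both behaviours on a hypothetical LocaleTime. With a new-style class (one that subclasses object, as every Python 3 class does), a property without a setter already refuses assignment with AttributeError. In Python 2.2 a classic class would instead quietly store the value in the instance dictionary, which may be what Brett ran into. Supplying an explicit setter lets the class choose its own exception.

```python
class LocaleTime:
    """Hypothetical cut-down LocaleTime exposing read-only attributes."""

    def __init__(self):
        self._months = ['January', 'February']   # truncated for the example

    def _get_months(self):
        return self._months

    def _refuse(self, value):
        raise TypeError('LocaleTime attributes are read-only')

    # No setter: assignment raises AttributeError.
    months = property(_get_months)
    # Explicit setter: assignment raises the class's chosen TypeError.
    f_month = property(_get_months, _refuse)
```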

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.
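Two failure modes appear in this thread: the AttributeError on None.groupdict() and time.strptime's "unconverted data remains". Both come down to how re.match reports results. It returns None when nothing matches, and it anchors only at the start, so trailing text has to be checked separately. The hypothetical single-directive parser below shows both checks.

```python
import re

MONTH = re.compile(r'(?P<m>1[0-2]|0[1-9]|[1-9])')


def parse_month(data):
    """Parse a %m value, rejecting non-matches and leftover text."""
    found = MONTH.match(data)
    if found is None:
        # match() returns None on failure; calling .groupdict() on that None
        # is what produced the AttributeError in Skip's test run.
        raise ValueError('time data %r does not match format %%m' % data)
    if found.end() != len(data):
        # match() only anchors at the start, so trailing text must be checked.
        raise ValueError('unconverted data remains: %s' % data[found.end():])
    return int(found['m'])
```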

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.
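The string form has one subtlety worth keeping in mind: 'in' on a str is a substring test, so it also accepts the empty string and the whole string. It is equivalent to the tuple form only when found is guaranteed to be a single character, as a directive letter is. The helpers below are illustrative.

```python
def in_string(found):
    # Substring test: fast, but '' and 'yY' are also "in" 'yY'.
    return found in 'yY'


def in_tuple(found):
    # Element test: only exactly 'y' or 'Y' match.
    return found in ('y', 'Y')


for candidate in ('y', 'Y', 'm', '', 'yY'):
    print(repr(candidate), in_string(candidate), in_tuple(candidate))
```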

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
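A helper along those lines might look like this; the name add_blank and its front flag are hypothetical. Concatenating tuples avoids the list()/append()/tuple() round trip Neal mentions.

```python
def add_blank(seq, front=False):
    """Return seq as a tuple with an empty string added at the front or back."""
    return ('',) + tuple(seq) if front else tuple(seq) + ('',)


print(add_blank(['AM', 'PM']))               # ('AM', 'PM', '')
print(add_blank(('Monday',), front=True))    # ('', 'Monday')
```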

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.
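Brett's reading matches how the two hooks are normally used. A mutable base such as dict can be filled in from __init__. An immutable base such as int has to be customised in __new__, because the value is fixed once the object exists. Both classes below are hypothetical illustrations, not code from the patch.

```python
class TimeRE(dict):
    """Hypothetical cut-down TimeRE: directive letter -> regex fragment."""

    def __init__(self, extra=None):
        # dict is mutable, so the finished instance can be filled in __init__.
        super().__init__()
        self.update({
            'd': r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])',
            'm': r'(?P<m>1[0-2]|0[1-9]|[1-9])',
        })
        if extra:
            self.update(extra)


class Month(int):
    """An int subclass must validate in __new__: an int's value is fixed
    as soon as int.__new__ returns, before __init__ could run."""

    def __new__(cls, value):
        value = int(value)
        if not 1 <= value <= 12:
            raise ValueError('month out of range: %d' % value)
        return super().__new__(cls, value)
```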

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
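Python 3 removed both cmp() and the cmp argument to sort(), so today you write the same ordering with a key function. Sorting longest-first matters when the words are joined into a regex alternation, because re takes the first alternative that matches. The alternation() helper below is hypothetical.

```python
import re


def alternation(words):
    """Build a non-capturing regex alternation that prefers longer words."""
    # Equivalent to the cmp(b_length, a_length) sorter: longest first.
    ordered = sorted(words, key=len, reverse=True)
    return '(?:' + '|'.join(re.escape(w) for w in ordered) + ')'


names = ['Jun', 'June', 'Jul', 'July']
print(re.match(alternation(names), 'June 4').group())  # June
print(re.match('(?:Jun|June)', 'June 4').group())      # Jun
```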

Note:  you can do x = y = z = -1, instead of
x = -1 ; y = -1 ; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.
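Both points can be shown together. The hypothetical is_unset() helper tests the -1 "not parsed yet" marker by value, and chained assignment initialises several names to that marker on one line.

```python
def is_unset(value):
    """Return True if value is the -1 'not parsed yet' marker."""
    # Compare by value.  `value is -1` only happens to work when CPython hands
    # back its cached small-int object; since Python 3.8 the compiler also
    # warns (SyntaxWarning) about `is` with a literal.
    return value == -1


# Chained assignment binds all three names to the same marker in one line.
year = month = day = -1
assert all(is_unset(v) for v in (year, month, day))
```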

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.
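This assumes gregToJulian computes the day of the year, the %j / tm_yday value often loosely called a "Julian date". If so, the standard datetime module can do the calculation with leap years included, which would also cover the leap-year question raised above. The snake_case name below is a hypothetical rename.

```python
import datetime


def greg_to_julian(year, month, day):
    """Calculate the day of the year (1-366) from a Gregorian date."""
    # timetuple().tm_yday is the %j value; datetime handles leap years.
    return datetime.date(year, month, day).timetuple().tm_yday


print(greg_to_julian(2002, 7, 12))  # 193
print(greg_to_julian(2000, 3, 1))   # 61 (2000 is a leap year)
```

greg_to_julian(2002, 7, 12) gives 193, the same day-of-year value shown in the time.strptime output earlier in this thread.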

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
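You can check both the failure and Brett's proposed fix directly. Appending \d as the last alternative matters: re tries alternatives left to right and stops at the first match, so a catch-all placed first would take just the '1' of '14'. The patterns below are simplified illustrations.

```python
import re

strict = re.compile(r'(3[0-1])|([0-2]\d)')    # the simplified %d regex quoted above
lenient = re.compile(r'3[01]|[0-2]\d|\d')      # '\d' tacked on as the last option
wrong_order = re.compile(r'\d|3[01]|[0-2]\d')  # same options, catch-all first

print(strict.match('4'))                 # None: a single digit is rejected
print(lenient.match('4').group())        # 4
print(lenient.match('14').group())       # 14
print(wrong_order.match('14').group())   # 1 (stops at the first alternative)
```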

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y', 'Y')
you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 22:13:45 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 14:13:45 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17T7jV-0003z5-00@usw-sf-web4.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.
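Accepting missing leading zeros, as suggested here, amounts to letting each numeric regex fall back to a single digit. The sketch below is illustrative only: the names MONTH, DAY, YEAR2 and parse_mdy are hypothetical, not the patch's regexes, and the %y pivot (69-99 as 19xx, 00-68 as 20xx) is assumed from the POSIX convention. Two-digit alternatives are listed first so that "12" is never read as "1".

```python
import re

# Hypothetical lenient patterns (not the patch's actual regexes): two-digit
# alternatives come first, then a bare single digit, so "07" and "7" both work.
MONTH = r'(?P<m>1[0-2]|0[1-9]|[1-9])'
DAY = r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])'
YEAR2 = r'(?P<y>\d\d)'

MDY = re.compile(MONTH + '/' + DAY + '/' + YEAR2 + '$')


def parse_mdy(text):
    """Parse a '%m/%d/%y' string with or without leading zeros.

    Returns (year, month, day); raises ValueError when the text does not match.
    """
    found = MDY.match(text)
    if found is None:
        raise ValueError("time data did not match format")
    year = int(found.group('y'))
    # Assumed POSIX-style pivot for %y: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    year += 1900 if year >= 69 else 2000
    return year, int(found.group('m')), int(found.group('d'))


print(parse_mdy("7/12/02"))   # (2002, 7, 12)
print(parse_mdy("07/12/02"))  # (2002, 7, 12)
```

With these patterns the "7/12/02" input from the glibc comparison below parses, while an out-of-range month such as "13" is still rejected.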


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
 I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant to allow no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
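For readers unfamiliar with the distinction, a small sketch of capturing versus non-capturing groups using Python's re module. The patterns are made up for illustration; any speed difference is implementation-dependent and is not measured here. The visible effect is that (?:...) groups drop out of groups(), leaving only the fields that are actually wanted.

```python
import re

# Capturing: every (...) group is stored and appears in groups().
capturing = re.compile(r'(Jan|Feb|Mar) (\d\d)')
# Non-capturing: (?:...) groups only for alternation; nothing is stored.
non_capturing = re.compile(r'(?:Jan|Feb|Mar) (?P<day>\d\d)')

m1 = capturing.match('Feb 14')
m2 = non_capturing.match('Feb 14')
print(m1.groups())     # ('Feb', '14')
print(m2.groups())     # ('14',)
print(m2.groupdict())  # {'day': '14'}
```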

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  It is almost purely a syntactical change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they do, they should change the locale for the
interpreter and not just for strptime().
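The default-argument pitfall described above can be shown in a few lines. In this sketch, build_locale_info, parse_shared and parse_fresh are hypothetical names standing in for LocaleTime construction and strptime(). A None sentinel, shown second, is a common idiom that keeps a parameter optional while still building fresh locale information on each call; it is offered only as an alternative, not as what the patch does.

```python
build_count = [0]


def build_locale_info():
    """Hypothetical stand-in for constructing a LocaleTime object."""
    build_count[0] += 1
    return {'build': build_count[0]}


# The default expression runs once, when 'def' executes; every call that
# omits the argument shares that same object, even after a locale change.
def parse_shared(data_string, locale_time=build_locale_info()):
    return locale_time


# Sentinel idiom: None means "build fresh locale information for this call".
def parse_fresh(data_string, locale_time=None):
    if locale_time is None:
        locale_time = build_locale_info()
    return locale_time
```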

The second change was to what triggers strptime() to return an
re object that it can use.  Initially it was any false value,
but I realized that an empty string could trigger that and it
would be better to raise a TypeError than let some error come up
from trying to use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
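An alternative way to avoid rebuilding the regex on every call, without a special argument, is to cache compiled patterns keyed by format string. This is a minimal sketch under stated assumptions: compile_format, toy_build and the module-level dictionary are hypothetical, and toy_build stands in for the TimeRE-style expansion that the profiling above identified as the bottleneck.

```python
import re

_compiled_formats = {}


def compile_format(fmt, build_pattern):
    """Return a compiled regex for fmt, building the pattern only once.

    build_pattern is a hypothetical callable standing in for the expensive
    TimeRE-style step that expands a format string into a regex string.
    """
    compiled = _compiled_formats.get(fmt)
    if compiled is None:
        compiled = re.compile(build_pattern(fmt))
        _compiled_formats[fmt] = compiled
    return compiled


built = []


def toy_build(fmt):
    # Toy expansion of one directive; records each call so reuse is visible.
    built.append(fmt)
    return fmt.replace('%d', r'(?P<d>3[01]|[12]\d|0[1-9])')


first = compile_format('%d', toy_build)
second = compile_format('%d', toy_build)
print(first is second, built)  # True ['%d']
```

Because locale-dependent names end up inside the pattern, a real cache of this kind would also need to be keyed on (or cleared by) the current locale, otherwise it would go stale in exactly the locale-switching case discussed earlier.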

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
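A minimal sketch of a read-only attribute built with property(), assuming a new-style class. LocaleInfo and NoSetter are hypothetical stand-ins, not LocaleTime itself. In Python 3, where every class is new-style, a property defined with only a getter already refuses assignment with AttributeError; an explicit setter like the one below is only needed if TypeError specifically is wanted.

```python
class LocaleInfo(object):
    """Hypothetical stand-in for LocaleTime with one read-only attribute."""

    def __init__(self):
        self._f_month = ['', 'January', 'February']

    def _get_f_month(self):
        return self._f_month

    def _set_read_only(self, value):
        # Used as the property's setter so that any assignment is refused.
        raise TypeError("LocaleInfo attributes are read-only")

    f_month = property(_get_f_month, _set_read_only)


class NoSetter(object):
    # Getter only: in Python 3, assignment raises AttributeError by default.
    f_month = property(lambda self: ['', 'January'])
```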

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not, since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.
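One caveat on the 'yY' spelling: 'in' on a string is a substring test, while 'in' on a tuple compares whole items. The two forms agree only when the value tested is always exactly one character, which holds if found comes from stepping through the format string one character at a time (an assumption about the code under discussion). The snippet below shows where they diverge.

```python
# 'in' on a string is a substring test; 'in' on a tuple compares whole items.
print('y' in 'yY', 'y' in ('y', 'Y'))    # True True
print('' in 'yY', '' in ('y', 'Y'))      # True False
print('yY' in 'yY', 'yY' in ('y', 'Y'))  # True False
```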

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that can
concatenate at either the front or the back.  It is not much
different from before, but it lets me choose front or back
concatenation easily.

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when the base type is immutable.
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig[:])
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError, which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetitively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
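In Python 3, sorted() no longer accepts a cmp function, but key=len with reverse=True gives the same longest-first order as cmp(b_length, a_length). A likely reason for ordering by length is that re tries alternatives left to right and stops at the first match, so a short name such as "Sep" listed before "September" would match only the prefix. The alternation helper below is a hypothetical sketch of that idea.

```python
import re


def alternation(words):
    """Build a non-capturing regex alternation that tries longer words first."""
    ordered = sorted(words, key=len, reverse=True)
    return '(?:%s)' % '|'.join(re.escape(w) for w in ordered)


bad = re.compile('(?:Sep|September)')
good = re.compile(alternation(['Sep', 'September']))
print(bad.match('September 2002').group())   # Sep
print(good.match('September 2002').group())  # September
```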

Note:  you can do x = y = z = -1 instead of
x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 12 23:27:13 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 15:27:13 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17T8sb-0005Li-00@usw-sf-web4.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
  works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 15:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
numbers that lack a leading zero.

I also made the regex case-insensitive.
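A minimal sketch of those two changes, assuming a hypothetical DIRECTIVE_REGEXES table and format_to_regex() helper rather than the module's actual TimeRE code.  Each numeric pattern ends with a single-digit alternative (here [1-9], a slightly stricter choice than a bare \d) so "7" matches as well as "07", and re.IGNORECASE lets %b match month names in any case:

```python
import re

# Hypothetical directive table, not the patch's actual TimeRE class.
# The last alternative of each numeric pattern accepts a lone digit,
# so values without a leading zero ("7") match as well as "07".
DIRECTIVE_REGEXES = {
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
    "b": r"(?P<b>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)",
    "%": "%",
}


def format_to_regex(fmt):
    """Translate a strptime-style format into a compiled regex (sketch)."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVE_REGEXES[fmt[i + 1]])
            i += 2
        else:
            parts.append(re.escape(fmt[i]))  # literal text in the format
            i += 1
    # IGNORECASE lets "Jun", "jun" and "JUN" all match %b.
    return re.compile("".join(parts) + r"\Z", re.IGNORECASE)


print(format_to_regex("%m/%d/%y").match("7/12/02").groupdict())
```

The order of the alternatives matters: two-digit forms are tried first, so "12" is not cut short at "1".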

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires whitespace or a non-alphanumeric
character, while my implementation does not require that.
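For comparison, the NetBSD-style whitespace rule described above could be approximated by rewriting each run of whitespace in a pattern as \s*.  This is only a sketch; loosen_whitespace() is a hypothetical helper and assumes whitespace appears in the pattern only as a literal separator:

```python
import re


def loosen_whitespace(regex_source):
    """Let each run of whitespace in regex_source match zero or more spaces.

    Assumes whitespace in the source appears only as literal separators
    (not inside character classes or quantifiers).
    """
    # A function replacement avoids re.sub's backslash-escape processing.
    return re.sub(r"\s+", lambda match: r"\s*", regex_source)


pattern = loosen_whitespace(r"(?P<b>[A-Za-z]{3}) (?P<d>\d{1,2})")
for sample in ("Jun 4", "Jun  4", "Jun4"):
    print(sample, bool(re.fullmatch(pattern, sample)))
```

Note that \s* also accepts no whitespace at all, so "Jun4" matches too.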

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some fields, and I figured that would eliminate
ambiguity.  I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 14:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff", but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c) Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so, I will stop uploading
tests here and start a patch with my strptime tests for the
stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
If you want, I can obviously add \d as an option to all the
regexes and eliminate this issue.  But that means it will no
longer be following the docs.  This tripped Skip up too,
since no one writes numbers that way; strftime does, though.
Now if the docs meant to allow no leading 0, I think they
should be rewritten, since as written they are misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
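A regression test along the lines Guido suggests might look like the sketch below.  The class name and the fixed sample date are illustrative assumptions, not the tests that were eventually committed; the idea is that strptime() should recover the fields that strftime() wrote, plus the "7/12/02" case from this thread:

```python
import time
import unittest


class StrptimeRoundTripTest(unittest.TestCase):
    """Sketch: strptime() should parse what strftime() emits."""

    # A fixed struct_time (Friday 2002-07-12, day 193) avoids
    # depending on the current clock.
    SAMPLE = time.struct_time((2002, 7, 12, 9, 58, 3, 4, 193, 0))

    def check(self, directive, fields):
        text = time.strftime(directive, self.SAMPLE)
        parsed = time.strptime(text, directive)
        for index in fields:
            self.assertEqual(parsed[index], self.SAMPLE[index],
                             "%s mismatch at field %d" % (directive, index))

    def test_numeric_date(self):
        self.check("%Y-%m-%d", (0, 1, 2))

    def test_clock(self):
        self.check("%H:%M:%S", (3, 4, 5))

    def test_missing_leading_zero(self):
        # The example from this thread: no leading zero on the month.
        parsed = time.strptime("7/12/02", "%m/%d/%y")
        self.assertEqual(tuple(parsed)[:3], (2002, 7, 12))


if __name__ == "__main__":
    unittest.main()
```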

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
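To illustrate the difference (using a %m-style pattern chosen for this example): wrapping the inner alternatives in (?:...) leaves only the named group that the caller reads, so the regex engine records fewer groups.  Whether that yields a measurable speedup depends on the pattern and the Python version:

```python
import re

# Capturing version: every parenthesised alternative becomes a numbered group.
capturing = re.compile(r"(?P<m>(1[0-2])|(0[1-9])|([1-9]))")
# Non-capturing version: the inner alternatives use (?:...), so only the
# named group is recorded.
non_capturing = re.compile(r"(?P<m>(?:1[0-2])|(?:0[1-9])|(?:[1-9]))")

print(capturing.groups)      # 4
print(non_capturing.groups)  # 1
```

Both patterns return the same value for group("m"); only the bookkeeping differs.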

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  I also renamed them so that
they state what each fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to import re.compile
directly, since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
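The default-argument pitfall described above can be reproduced with a hypothetical stand-in class.  The sketch also shows the common None-sentinel alternative; note that this is not what the patch did (it removed the parameter entirely):

```python
class LocaleTime:
    """Hypothetical stand-in that snapshots the 'current locale' when built."""
    current_locale = "en_US"

    def __init__(self):
        self.lang = LocaleTime.current_locale


# The default is evaluated once, when 'def' runs, not on each call.
def parse_with_default(data, locale_time=LocaleTime()):
    return locale_time.lang


# A common alternative: use None as a sentinel and build the object per call.
def parse_with_sentinel(data, locale_time=None):
    if locale_time is None:
        locale_time = LocaleTime()
    return locale_time.lang


LocaleTime.current_locale = "sv_SE"   # simulate a locale switch
print(parse_with_default("..."))      # en_US  (stale snapshot)
print(parse_with_sentinel("..."))     # sv_SE
```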

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false
value, but I realized that an empty string could trigger
that, and it would be better to raise a TypeError than let
some error come up from trying to use the re object in an
incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
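A different way to avoid rebuilding the regex on every call, sketched here with hypothetical names (build_regex_source, compiled_for, parse_date), is a module-level cache keyed by the format string.  Python's re module already caches compiled patterns, but that does not help when the expensive step is building the pattern source, as the profile above suggests:

```python
import re

_regex_cache = {}


def build_regex_source(fmt):
    # Hypothetical, deliberately tiny translator: only %Y, %m and %d.
    # Literal text in fmt is assumed to contain no regex metacharacters.
    table = {"%Y": r"(?P<Y>\d{4})",
             "%m": r"(?P<m>1[0-2]|0?[1-9])",
             "%d": r"(?P<d>3[01]|[12]\d|0?[1-9])"}
    for directive, pattern in table.items():
        fmt = fmt.replace(directive, pattern)
    return fmt


def compiled_for(fmt):
    """Return a compiled regex for fmt, building it only on first use."""
    regex = _regex_cache.get(fmt)
    if regex is None:
        regex = re.compile(build_regex_source(fmt) + r"\Z")
        _regex_cache[fmt] = regex
    return regex


def parse_date(data, fmt="%Y-%m-%d"):
    found = compiled_for(fmt).match(data)
    if found is None:
        raise ValueError("time data did not match format")
    return tuple(int(found.group(name)) for name in ("Y", "m", "d"))


print(parse_date("2002-7-4"))
```

Callers get the speedup without having to pass a special value to request the compiled object.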

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
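A short sketch of pulling the names from the calendar module.  The list names below mirror the f_month attribute seen in Skip's traceback; the other names are hypothetical.  calendar.day_name, day_abbr, month_name and month_abbr follow the LC_TIME locale in effect when they are read, which in the default "C" locale means English:

```python
import calendar

# Weekday lists start with Monday (index 0).
f_weekday = [calendar.day_name[i].lower() for i in range(7)]
a_weekday = [calendar.day_abbr[i].lower() for i in range(7)]
# month_name[0] and month_abbr[0] are '', so January is index 1.
f_month = [calendar.month_name[i].lower() for i in range(13)]
a_month = [calendar.month_abbr[i].lower() for i in range(13)]

print(f_weekday[0], a_month[12])
```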

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric day-of-month
regex.  I discovered that ANSI C's ctime() displays the day of
the month with a leading space if it is a single digit.  Since
at least Skip's C library uses that format for %c, I went
ahead and added it.
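The padding shows up in Skip's %c output in this thread ('Tue Jun  4 00:53:39 2002'), where the day of the month is space-padded.  A minimal sketch of a day pattern that also accepts a space-padded digit, using a hypothetical ctime-like regex rather than the module's actual %c handling:

```python
import re

# Day-of-month alternatives: two digits, a bare digit, or a digit padded
# with one leading space, as in ctime()-style output ("Tue Jun  4 ...").
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])"
CTIME_LIKE = re.compile(
    r"(?P<a>[A-Za-z]{3}) (?P<b>[A-Za-z]{3}) " + DAY
    + r" (?P<T>\d\d:\d\d:\d\d) (?P<Y>\d{4})\Z"
)

found = CTIME_LIKE.match("Tue Jun  4 00:53:39 2002")
print(int(found.group("d")))  # 4 (int() ignores the leading space)
```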

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
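A sketch of the read-only property approach, with a hypothetical stand-in LocaleTime class (month list truncated for brevity).  In current Python, a property with no setter already raises AttributeError on assignment; passing an explicit setter, as described above, turns that into a TypeError.  Property setters are only honoured on new-style classes (those deriving from object):

```python
class LocaleTime(object):
    """Hypothetical sketch: expose locale data through read-only properties."""

    def __init__(self):
        self._f_month = ["", "january", "february"]  # truncated for brevity

    def _read_only(self, value):
        # Used as the setter for every property, so assignment fails loudly.
        raise TypeError("attribute is read-only")

    f_month = property(lambda self: self._f_month, _read_only)


lt = LocaleTime()
print(lt.f_month[1])
try:
    lt.f_month = []
except TypeError as exc:
    print("refused:", exc)
```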

It does require 2.2.1 now, since I used True and False
without defining them.  Obviously, just set those values to 1
and 0, respectively, if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify time
or make this a separate module, etc.).  I think I will
send the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
a \d onto all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that can
specify front or back concatenation.  Not much different
from before, but it lets me choose front or back
concatenation easily.
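A helper of that kind might look like the sketch below.  The name pad_sequence() and its front flag are hypothetical, and it returns a list, matching the later switch of LocaleTime attributes to lists:

```python
def pad_sequence(items, front=True):
    """Hypothetical helper: return a new list with '' added at the front or back.

    Prepending '' lets a month list be indexed 1-12, mirroring
    calendar.month_name, without mutating the caller's sequence.
    """
    if front:
        return [""] + list(items)
    return list(items) + [""]


print(pad_sequence(("jan", "feb")))              # ['', 'jan', 'feb']
print(pad_sequence(("am", "pm"), front=False))   # ['am', 'pm', '']
```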

I explained why the various magic dates were used.

I don't have to worry about leap years at all.  Since the
fxn is not checking the data string for validity, it just
takes the data and uses it.  I have no reason to calculate
leap years.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.
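To illustrate the rule of thumb, here is a sketch with a hypothetical TimeRE-like dict subclass (only two directives filled in) next to a hypothetical tuple subclass.  Overriding __init__ is enough for dict, while an immutable built-in such as tuple fixes its contents in __new__:

```python
class TimeRE(dict):
    """Hypothetical sketch: a dict mapping directive -> regex source."""

    def __init__(self, extra=None):
        # dict is mutable, so filling it in __init__ is enough.
        super().__init__({
            "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
            "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
        })
        if extra:
            self.update(extra)


class Pair(tuple):
    # tuple is immutable: its contents are fixed in __new__, so a subclass
    # that wants to choose them must override __new__, not __init__.
    def __new__(cls, first, second):
        return super().__new__(cls, (first, second))


print(sorted(TimeRE({"y": r"(?P<y>\d\d)"})))  # ['d', 'm', 'y']
print(Pair(1, 2))                             # (1, 2)
```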

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
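The longest-first ordering matters because Python's regex alternation takes the first alternative that matches, not the longest one.  In Python 3, cmp() no longer exists, and sorted(..., key=len, reverse=True) does the same job.  A minimal sketch with a hypothetical alternation() helper:

```python
import re


def alternation(names):
    # Longest first, so 'september' is tried before 'sep'.
    ordered = sorted(names, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(n) for n in ordered) + ")"


names = ["sep", "september"]
print(re.match(alternation(names), "september").group())   # september
print(re.match("(?:sep|september)", "september").group())  # sep
```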

Note:  you can do x = y = z = -1, instead of
x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
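The failure and the proposed fix can be seen in isolation.  The patterns below use the simplified %d regex quoted above, wrapped in (?:...) so the end anchor applies to the whole alternation.  A bare \d also accepts "0", and [0-2]\d accepts "00", so [1-9] would be a stricter final alternative:

```python
import re

# The simplified %d pattern quoted above: it requires two digits.
strict_day = re.compile(r"(?:(3[0-1])|([0-2]\d))\Z")
# The same pattern with '\d' tacked on as the final alternative.
lenient_day = re.compile(r"(?:(3[0-1])|([0-2]\d)|\d)\Z")

print(strict_day.match("4"))                # None
print(lenient_day.match("4") is not None)   # True
```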

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded version 2.0.1, which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that, since the time module is a C
extension, getting this accepted might require BDFL approval
to add it as a separate module to the standard library.  Or
would the time module need a Python interface module where
this is put, with all other functions in the module passing
directly through to the extension?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity if you
   raise an exception?
   (You don't need to return anything: raise an exception,
   else it's ok.)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
   you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
   (specifically, spaces after commas, =, binary ops,
   comparisons, etc.)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Sat Jul 13 04:15:12 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 12 Jul 2002 20:15:12 -0700
Subject: [Patches] [ python-Patches-580869 ] Fix for seg fault on test_re on mac osx
Message-ID: <E17TDNI-0007sd-00@usw-sf-web1.sourceforge.net>

Patches item #580869, was opened at 2002-07-12 23:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580869&group_id=5470

Category: Tests
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven D. Majewski (sdm7g)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix for seg fault on test_re on mac osx

Initial Comment:


    import resource
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (1024 * 2048, hard))


is the Python equivalent of the tcsh 'limit stack 2048' and
will keep Python from segfaulting on test_re.

(Maybe wrapped in an "if sys.platform == 'darwin':" -- are
there any other systems that have this problem?)

-- Steve Majewski


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580869&group_id=5470


From noreply@sourceforge.net  Sat Jul 13 16:53:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jul 2002 08:53:44 -0700
Subject: [Patches] [ python-Patches-580995 ] new version of Set class
Message-ID: <E17TPDM-0003Ga-00@usw-sf-web4.sourceforge.net>

Patches item #580995, was opened at 2002-07-13 17:53
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Alex Martelli (aleax)
Assigned to: Nobody/Anonymous (nobody)
Summary: new version of Set class

Initial Comment:
As per the python-dev discussion of Sat 13 July 2002, subject
"Dict constructor": this is a version of Greg Wilson's sandbox
Set class that avoids the trickiness of implicitly freezing a set
when __hash__ is called on it.  Instead, it uses several classes:
Set itself has no __hash__ and represents a general, mutable set;
BaseSet, its superclass, has all functionality common to mutable
and immutable sets; ImmutableSet also subclasses BaseSet and adds
__hash__; a wrapper, _TemporarilyImmutableSet, wraps a Set and
exposes only __hash__ (identical to the one an ImmutableSet built
from that Set would have), plus __eq__ and __ne__ (delegated to
the Set instance).

Set.add(self, x) attempts to call x = x._asImmutable() (on
AttributeError, it leaves x alone); Set._asImmutable(self) returns
ImmutableSet(self).  The membership test BaseSet.__contains__(self, x)
attempts to call x = x._asTemporarilyImmutable() (on AttributeError,
it leaves x alone); Set._asTemporarilyImmutable(self) returns
_TemporarilyImmutableSet(self).

Otherwise I've left Greg's code mostly alone, except for
fixing bugs and obsolescent usage (e.g. dictionary rather than
dict) and turning what were ValueErrors into TypeErrors
(ValueError was doubtful earlier, and is untenable now that
mutable and immutable sets are different types).  The
change in exceptions forced me to change the unit tests
in test_set.py, too, but I made no other changes or
additions.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470


From noreply@sourceforge.net  Sun Jul 14 01:29:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 13 Jul 2002 17:29:07 -0700
Subject: [Patches] [ python-Patches-580411 ] move frame macros into ceval
Message-ID: <E17TXG7-0000tM-00@usw-sf-web2.sourceforge.net>

Patches item #580411, was opened at 2002-07-12 00:04
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Neal Norwitz (nnorwitz)
Summary: move frame macros into ceval

Initial Comment:
There are some old macros in frameobject.h which are
only used in ceval.c.  These macros are not prefixed
with Py and some aren't used at all.

This patch:
 * removes all of the macros from frameobject.h
 * moves the necessary macros into ceval.c
 * gets rid of an extra level of macros
 * uses the co alias instead of f->f_code

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-13 20:29

Message:
Logged In: YES 
user_id=33168

Checked in as:
 frameobject.h 2.35
 ceval.c 2.316

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-12 00:16

Message:
Logged In: YES 
user_id=31435

Nice!  Accepted and back to Neal.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580411&group_id=5470


From noreply@sourceforge.net  Sun Jul 14 20:23:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 14 Jul 2002 12:23:44 -0700
Subject: [Patches] [ python-Patches-581396 ] Canvas "select_item" always returns None
Message-ID: <E17Toy8-0000jJ-00@usw-sf-web2.sourceforge.net>

Patches item #581396, was opened at 2002-07-14 19:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470

Category: Tkinter
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Nobody/Anonymous (nobody)
Summary: Canvas "select_item" always returns None

Initial Comment:
Bug in 2.1.3, 2.2.1 and CVS HEAD. One-line patch:

*** /usr/lib/python2.1/lib-tk/Tkinter.py.orig   Wed Jul  3 17:04:28 2002
--- /usr/lib/python2.1/lib-tk/Tkinter.py        Wed Jul  3 17:04:31 2002
***************
*** 2096,2100 ****
      def select_item(self):
          """Return the item which has the selection."""
!         self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
--- 2096,2100 ----
      def select_item(self):
          """Return the item which has the selection."""
!         return self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
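The effect of the missing return can be shown without opening a display. In this sketch, FakeTk, BrokenCanvas and FixedCanvas are hypothetical stand-ins for the Tcl interpreter and Tkinter.Canvas, and the canned item id '7' is made up.

```python
class FakeTk:
    """Hypothetical stand-in for the Tcl interpreter behind a widget."""
    def call(self, *args):
        # Pretend item "7" holds the selection when asked "select item".
        return '7' if args[1:] == ('select', 'item') else ''


class BrokenCanvas:
    def __init__(self):
        self.tk = FakeTk()
        self._w = '.canvas'

    def select_item(self):
        """Return the item which has the selection."""
        self.tk.call(self._w, 'select', 'item')  # result discarded: returns None


class FixedCanvas(BrokenCanvas):
    def select_item(self):
        """Return the item which has the selection."""
        return self.tk.call(self._w, 'select', 'item')
```

A Python function that falls off its end returns None, so the unpatched method discards whatever Tcl reported.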
 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470


From noreply@sourceforge.net  Sun Jul 14 21:13:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 14 Jul 2002 13:13:19 -0700
Subject: [Patches] [ python-Patches-581414 ] info reader bug
Message-ID: <E17Tpk7-0005Mh-00@usw-sf-web1.sourceforge.net>

Patches item #581414, was opened at 2002-07-14 20:13
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581414&group_id=5470

Category: Documentation
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: info reader bug

Initial Comment:
Not really a bug in the python documentation, but
somewhat annoying:

The "Matching vs. Searching" Info node is unreachable
from the Info program (but is fine in Emacs's Info
mode).  This patch seems to fix it. 
This is the only occurrence where the info reader
fails, so it could probably be addressed in the Python
docs as a workaround. I have forwarded the report to the
Info maintainer.

Affects 2.1.3, 2.2.1 and HEAD.

--- Doc/lib/libre.tex.orig      Wed Jul  3 16:09:07 2002
+++ Doc/lib/libre.tex   Wed Jul  3 16:09:15 2002
@@ -389,7 +389,7 @@
 refer to numbered groups.
 
 
-\subsection{Matching vs. Searching \label{matching-searching}}
+\subsection{Matching vs Searching \label{matching-searching}}
 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
 
 Python offers two different primitive operations based on regular
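Assuming, as the patch suggests, that the period in the section title is what trips up the standalone Info reader, a minimal sketch like the following could check the claim that this is the only such occurrence. The regular expression is an assumed heuristic, not part of the Python documentation tools, and it only looks at the title text before any nested macro such as \label.

```python
import re

# Matches \section{, \subsection{, \subsubsection{ (optionally starred) and
# captures the title text up to the first brace or backslash (e.g. \label).
_SECTION_RE = re.compile(r'\\(?:sub)*section\*?\{([^{}\\]*)')


def titles_with_period(tex_source):
    """Return sectioning titles in a LaTeX source string that contain '.'."""
    return [m.group(1).strip()
            for m in _SECTION_RE.finditer(tex_source)
            if '.' in m.group(1)]
```

Run over the unpatched Doc/lib/libre.tex, this should report the "Matching vs. Searching" title; running it over the other Doc/lib/*.tex files would test whether it really is the only one.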
 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581414&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 11:25:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 03:25:29 -0700
Subject: [Patches] [ python-Patches-401229 ] Optional memory profiler
Message-ID: <E17U32n-0002JV-00@usw-sf-web1.sourceforge.net>

Patches item #401229, was opened at 2000-08-19 08:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=401229&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 3
Submitted By: Vladimir Marangozov (marangoz)
Assigned to: Jeremy Hylton (jhylton)
Summary: Optional memory profiler

Initial Comment:
 

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-15 12:25

Message:
Logged In: YES 
user_id=21627

I'm closing this patch now. I believe the statistics
functions of Tim's recent pymalloc changes overlap in
functionality with this patch, and apparently nobody has
a real need for the feature.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-09 11:42

Message:
Logged In: YES 
user_id=21627

I still recommend rejecting this patch.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-08-09 16:51

Message:
Logged In: YES 
user_id=31392

I had the impression that the feature was useful, but
haven't had any time to spend on it.  I'm not sure if
spending time on it before 2.2 is a good use of time or not.
 I'd rather keep this patch around as a reminder than close
it, but I'll mark it out of date and give it a low priority.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-06-04 09:02

Message:
Logged In: YES 
user_id=21627

The patch, in its current form, fails to apply (4 hunks 
fail). Also, the URL of the discussion of the patch 
changed to

http://mail.python.org/pipermail/python-dev/2000-August/008527.html

I recommend rejecting this patch, since I cannot see how the
information it produces would be useful to a Python developer.
If there is a desire to have the feature in Python, I'd 
volunteer to provide an updated patch.


----------------------------------------------------------------------

Comment By: Vladimir Marangozov (marangoz)
Date: 2000-08-19 09:18

Message:
An optional memory profiler, which goes in tandem with the optional
object memory allocator (SourceForge patch #101104). The profiler was
introduced briefly on python-dev:
http://www.python.org/pipermail/python-dev/2000-August/015239.html

Applying both patches gives for me (screen dump):

~> patch -p1 < ../obmalloc-patch
patching file `Include/objimpl.h'
patching file `Objects/object.c'
patching file `Objects/obmalloc.c'
patching file `acconfig.h'
patching file `configure.in'
~> patch -p1 < ../memprof-patch
patching file `Include/pydebug.h'
patching file `Modules/Setup.config.in'
patching file `Modules/main.c'
patching file `Modules/memprof.c'
patching file `Python/pythonrun.c'
patching file `acconfig.h'
patching file `configure.in'

- Don't forget that you need to run autoheader and then autoconf.

This patch:

1) introduces a new --with-memprof configure option. Off by default.
2) introduces a Py_ProfileFlag and a "-p" Python option which starts
   the profiler in Py_Initialize() before any initializations, and stops it
   in Py_Finalize() after all finalizations.
3) contains a new Modules/memprof.c module. The inclusion of this file
   in the core is similar to the thread and GC modules (Setup.config.in).

The patch *can* be applied without the object allocator and it *does*
compile on request. However, it issues a warning that it won't profile
anything, because it can't be called (the profiler can't install its hooks).
Besides, it will refuse to start(). The point is that both the profiler and
the allocator are really optional.

Needs docs & tests :( The interface can be improved (just like everything
else) but the core functionality is there. It *is* useful for getting snapshots
of the minimum allocated (object) memory, at least. Some worthy points to
consider, IMO, are listed in the TODO of memprof.c.
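For comparison, modern CPython ships a standard-library module, tracemalloc, that covers part of the same ground: start tracing, read the traced memory, stop. This is a swapped-in technique, not the interface of this patch or of memprof.c, and the helper name is hypothetical.

```python
import tracemalloc


def traced_memory_during(workload):
    """Run workload() with tracemalloc active; return (current, peak) bytes.

    Only allocations made while tracing is active are counted.
    """
    tracemalloc.start()
    try:
        workload()
        return tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
```

Unlike the proposed "-p" option, tracing here starts only after the interpreter is initialized, so startup allocations are not seen.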

I am submitting this for testing, reviewing, comments and more ideas.
Overall, I think it is a BIG plus regarding Python's typical introspection.

Comments welcome. As usual, flames to /dev/null <wink>.

Status set straight to Postponed. Assigned to marangoz who's in charge of
opening it in due time, together with #101104.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=401229&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 11:26:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 03:26:36 -0700
Subject: [Patches] [ python-Patches-441528 ] MSVC Preprocessor
Message-ID: <E17U33s-0007sF-00@usw-sf-web5.sourceforge.net>

Patches item #441528, was opened at 2001-07-16 00:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=441528&group_id=5470

Category: Distutils and setup.py
Group: None
Status: Open
>Resolution: Out of Date
Priority: 5
Submitted By: Tarn Weisner Burton (twburton)
Assigned to: Thomas Heller (theller)
Summary: MSVC Preprocessor

Initial Comment:
The attached script has a preprocessor method for the 
MSVC compiler.

Tarn
twburton@users.sf.net

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-15 12:26

Message:
Logged In: YES 
user_id=21627

If there is no update for this patch by September 1, it will
be rejected.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-11 08:32

Message:
Logged In: YES 
user_id=21627

Tarn, are you still interested in this patch? If so, please
submit a unified or context diff against the current CVS.

----------------------------------------------------------------------

Comment By: Tarn Weisner Burton (twburton)
Date: 2001-08-16 22:49

Message:
Logged In: YES 
user_id=21784

Oops, sorry.  Added test.py, i.e. run with

test.py config

Also, the msvccompiler.py file has a print line which should
probably be removed.


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2001-08-16 22:21

Message:
Logged In: YES 
user_id=11375

config.py doesn't actually do anything when run; it's a
copy of distutils/command/config.py.  Was it intended to
be a setup.py script?


----------------------------------------------------------------------

Comment By: Tarn Weisner Burton (twburton)
Date: 2001-07-18 20:59

Message:
Logged In: YES 
user_id=21784

Added test script config.py

To run: "python config.py config"

Also added msvccompiler.py which fixes a bug from my first 
post of pp.py, and is a modification of the current CVS 
version.

Tarn

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2001-07-18 09:58

Message:
Logged In: YES 
user_id=11105

Tarn, can you supply an example where this code would be 
executed?

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2001-07-16 15:40

Message:
Logged In: YES 
user_id=11375

Reassigning to Thomas, since I can't test anything with MSVC.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-07-16 15:23

Message:
Logged In: YES 
user_id=6380

Or should this go to Thomas Heller?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=441528&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 12:47:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 04:47:39 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17U4KJ-0003wf-00@usw-sf-web1.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Andrew I MacIntyre (aimacintyre)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 21:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3s   6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin:   5.5s     4.9s   0.5s

These runs were made with the library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin:  1230s     971s   65s

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is what I'd expect, as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 20:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. You'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 16:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 07:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD, test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 15:34:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 07:34:48 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID: <E17U6w4-0000bF-00@usw-sf-web3.sourceforge.net>

Patches item #581705, was opened at 2002-07-16 00:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
Assigned to: Nobody/Anonymous (nobody)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, ID 581698, called 'pty.spawn -
wrong error caught'.

System: RedHat Linux 7.3, using Python2.

About a year ago, the final 'except' statement was
changed to catch IOError rather than just error. 
However, at least on my system, the os.read call raises
an OSError, not an IOError.  Therefore, the wrong error
type is now caught.
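A minimal sketch of the point being made: os.read() signals failure with OSError, so a copy loop that wants to treat an error on the pty master (on Linux, typically EIO once the child has exited) as end of input has to catch OSError. The function below is illustrative and is not the code in pty.py; in Python 3, IOError is merely an alias of OSError, so this particular mismatch no longer arises there.

```python
import os


def read_until_eof(fd, chunk_size=1024):
    """Read from fd until EOF, treating an OSError from os.read() as EOF."""
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except OSError:
            # Catching IOError here would miss this on Python 2, where
            # os.read() raises OSError (os.error).
            break
        if not data:          # ordinary EOF
            break
        chunks.append(data)
    return b''.join(chunks)
```

Catching OSError broadly also swallows unrelated errors such as EBADF; a stricter version could check the errno before treating the error as end of input.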

Patch attached.

Rasjid.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 15:38:47 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 07:38:47 -0700
Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator
Message-ID: <E17U6zv-0005sj-00@usw-sf-web5.sourceforge.net>

Patches item #580331, was opened at 2002-07-11 17:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.
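The proposed behaviour can be modelled in Python. CachedLinesFile is a hypothetical wrapper, not the C implementation in the patch; it uses Python 3's __next__ where Python 2.2 spells the method next, and iter(readline, '') stands in for the xreadlines object.

```python
import io


class CachedLinesFile:
    """Hypothetical wrapper: one cached line iterator per file object."""

    def __init__(self, raw):
        self._raw = raw
        self._xreadlines = None

    def xreadlines(self):
        # Repeated calls return the same iterator instead of a fresh one.
        if self._xreadlines is None:
            self._xreadlines = iter(self._raw.readline, '')
        return self._xreadlines

    def __iter__(self):
        return self          # the file object is its own iterator

    def __next__(self):
        return next(self.xreadlines())
```

Because the iterator is cached, a loop that breaks early and a later loop over the same file continue from the same position instead of each getting an independent reader.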


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 10:38

Message:
Logged In: YES 
user_id=6380

I posted some comments to python-dev.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 16:23:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 08:23:57 -0700
Subject: [Patches] [ python-Patches-575073 ] PyTRASHCAN slots deallocation
Message-ID: <E17U7hd-00075t-00@usw-sf-web5.sourceforge.net>

Patches item #575073, was opened at 2002-06-28 15:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575073&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Jonathan Hogg (jhogg)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyTRASHCAN slots deallocation

Initial Comment:
This is an addition to the PyTRASHCAN macros to support
delayed deallocation of arbitrary objects (i.e., not
just builtin containers), and a modification to the
'clear_slots' routine to use these macros.

This patch fixes bug ID 574207, "Chained __slots__
dealloc segfault".

The solution is not ideal, but it appears to have
minimal impact.
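The idea behind the PyTRASHCAN macros, bounding the recursion depth of nested deallocations by postponing objects that are too deep and finishing them from the outermost level, can be modelled in Python. This is a conceptual sketch only: Node, destroy and MAX_DEPTH are hypothetical, and the real macros work on C-level dealloc functions with their own limit.

```python
MAX_DEPTH = 50   # hypothetical nesting limit


class Node:
    """A link in a chain, like objects referencing each other via slots."""
    def __init__(self, child=None):
        self.child = child


def destroy(root):
    """Tear down a chain of Nodes; return the deepest recursion level reached."""
    trashcan = []          # objects whose teardown was postponed
    deepest = 0

    def dealloc(node, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if depth > MAX_DEPTH:
            trashcan.append(node)    # too deep: defer instead of recursing
            return
        child, node.child = node.child, None   # "clear the slot"
        if child is not None:
            dealloc(child, depth + 1)

    dealloc(root, 0)
    while trashcan:                  # drain from the outermost level
        dealloc(trashcan.pop(), 0)
    return deepest
```

However long the chain, the recursion depth stays at MAX_DEPTH + 1, which is what keeps a long chain of __slots__ objects from overflowing the C stack.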


----------------------------------------------------------------------

>Comment By: Jonathan Hogg (jhogg)
Date: 2002-07-15 15:23

Message:
Logged In: YES 
user_id=10036

Attaching a new version of this patch against the 2.3 HEAD
code (as of today).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575073&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 16:47:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 08:47:56 -0700
Subject: [Patches] [ python-Patches-581742 ] Alternative PyTRASHCAN subtype_dealloc
Message-ID: <E17U84q-0005uX-00@usw-sf-web4.sourceforge.net>

Patches item #581742, was opened at 2002-07-15 15:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581742&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Jonathan Hogg (jhogg)
Assigned to: Nobody/Anonymous (nobody)
Summary: Alternative PyTRASHCAN subtype_dealloc

Initial Comment:
This is an alternative to patch #575073 (PyTRASHCAN
slots deallocation) that wraps 'subtype_dealloc' in the
(very slightly altered) normal PyTRASHCAN macros.

This patch isn't meant to be pretty, it's just to
demonstrate another possible solution. I would expect
it to be worked on before being accepted. I'm sure
there must be a way to safely untrack the object at the
beginning.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581742&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 19:14:08 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 11:14:08 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17UAMK-0003eS-00@usw-sf-web1.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 02:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Andrew I MacIntyre (aimacintyre)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-15 14:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If 
the platform malloc is saved from either one, its job
becomes much easier.
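The effect of growing the child array less often can be counted directly. This sketch compares a policy that grows capacity in steps of three children (matching the description of the old code as reallocating on every third new child) with an overallocation policy that rounds up to a multiple of 4 up to 128 children and to a power of two beyond that. The second policy is an assumed approximation of the idea behind XXXROUNDUP, not a copy of node.c.

```python
def round_up_old(n):
    """Capacity grows in steps of 3 children (per the description above)."""
    return ((n + 2) // 3) * 3


def round_up_new(n):
    """Assumed overallocation: multiples of 4 up to 128, then powers of two."""
    if n <= 1:
        return n
    if n <= 128:
        return (n + 3) & ~3
    cap = 1
    while cap < n:
        cap <<= 1
    return cap


def count_reallocs(n_children, roundup):
    """Count how often the child array must be resized while appending."""
    capacity = 0
    resizes = 0
    for n in range(1, n_children + 1):
        if n > capacity:
            capacity = roundup(n)
            resizes += 1
    return resizes
```

For 1000 children the step-of-three policy resizes 334 times while the overallocating policy resizes 36 times; with a platform malloc that copies on most large reallocs, that difference dominates.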

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.
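The property relied on here, that pymalloc hands out small blocks from per-size-class free lists and never tries to coalesce freed blocks, can be illustrated with a toy model. Everything in it is hypothetical (class name, block tuples), and the 8-byte granularity and 256-byte threshold are assumed values; larger requests are simply counted as falling through to the system malloc, as described for SMALL_REQUEST_THRESHOLD.

```python
class ToySizeClassPool:
    """Toy model of a small-object allocator with segregated free lists."""

    ALIGNMENT = 8      # assumed size-class granularity, in bytes
    THRESHOLD = 256    # assumed small-request limit, in bytes

    def __init__(self):
        self.free_lists = {}     # size class -> list of free block ids
        self.next_id = 0
        self.system_calls = 0    # requests passed to the platform malloc

    def size_class(self, nbytes):
        return (nbytes + self.ALIGNMENT - 1) // self.ALIGNMENT

    def malloc(self, nbytes):
        if nbytes > self.THRESHOLD:
            self.system_calls += 1
            return ('system', nbytes)
        cls = self.size_class(nbytes)
        free = self.free_lists.setdefault(cls, [])
        if free:
            return ('pool', cls, free.pop())    # O(1) reuse of a freed block
        self.next_id += 1
        return ('pool', cls, self.next_id)

    def free(self, block):
        # 'system' blocks would go back to the platform free(); not modelled.
        if block[0] == 'pool':
            _, cls, block_id = block
            # Push back onto its size-class list; no neighbours are merged.
            self.free_lists[cls].append(block_id)
```

Freeing is a single list append regardless of how many blocks are outstanding, so tearing down a huge parse tree cannot degrade into the coalescing thrash described above.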

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 07:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3s   6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin:   5.5s     4.9s   0.5s

These runs were made with the library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin:  1230s     971s   65s
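
As a quick check of the "most bang for the buck" conclusion, the totals from the two tables can be turned into speedups and percentage reductions relative to the baseline. Only the numbers reported above are used; the helper name is made up for this sketch.

```python
# Total wall-clock seconds copied from the two tables above.
test_longexp = {
    "baseline": 39.1,
    "my patch": 37.1,
    "Tim's checkin": 8.4,
    "my patch+Tim's checkin": 5.5,
}
regrtest = {
    "baseline": 1386,
    "my patch": 1350,
    "Tim's checkin": 1265,
    "my patch+Tim's checkin": 1230,
}


def relative_to_baseline(totals):
    """Return {variant: (speedup, percent reduction)} against 'baseline'."""
    base = totals["baseline"]
    return {
        name: (base / t, 100.0 * (base - t) / base)
        for name, t in totals.items()
    }


if __name__ == "__main__":
    for label, totals in (("test_longexp", test_longexp), ("regrtest", regrtest)):
        for name, (speedup, pct) in relative_to_baseline(totals).items():
            print(f"{label:12} {name:24} {speedup:5.2f}x  {pct:5.1f}% less")
```

For test_longexp, Tim's checkin alone is roughly a 4.7x speedup and both changes together about 7.1x; for the whole regression test the reductions are roughly 9% and 11%, keeping in mind the caveat about random run-to-run variation in the suite.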

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser is worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 06:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning back to you, there's little more I can do with this patch. And you'll have to check if you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 02:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-07 17:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD pymalloc will simply call the underlying system malloc/realloc.
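
The point here is that pymalloc only manages small requests; anything above its threshold goes straight to the platform allocator, so a steadily growing children array soon leaves pymalloc's hands. A minimal model of that dispatch follows; the 256-byte threshold and the 24-byte growth step are assumptions chosen for illustration.

```python
SMALL_REQUEST_THRESHOLD = 256  # assumed; check obmalloc.c for the real value


def allocator_for(nbytes):
    """Which allocator a pymalloc-style front end would use for nbytes."""
    if 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return "pymalloc"
    return "system"


def system_reallocs(final_size, step):
    """Grow a block by `step` bytes per realloc up to final_size and count
    how many of those reallocs are handed to the system allocator."""
    return sum(
        1
        for size in range(step, final_size + 1, step)
        if allocator_for(size) == "system"
    )


if __name__ == "__main__":
    # Roughly the ~800KB mentioned above, growing 24 bytes at a time.
    print(system_reallocs(800 * 1024, 24))
```

Only the first few reallocs stay inside pymalloc; every later one lands on the platform realloc, which is why switching the parser to pymalloc alone would not cure this case.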

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 02:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD, test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 21:52:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 13:52:26 -0700
Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection
Message-ID: <E17UCpW-0003jH-00@usw-sf-web4.sourceforge.net>

Patches item #515003, was opened at 2002-02-08 21:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Mihai Ibanescu (misa)
Assigned to: Jeremy Hylton (jhylton)
Summary: Added HTTP{,S}ProxyConnection

Initial Comment:
This patch adds HTTP*Connection classes for proxy
connections. Authenticated proxies are also supported.

One can argue urllib2 already implements this. It does
not do HTTPS tunneling through proxies, and this is
intended to be lower-level than urllib2.
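
For readers unfamiliar with HTTPS tunneling: the client opens a TCP connection to the proxy, sends a CONNECT request naming the target host and port, and after a 2xx reply starts the SSL handshake over that same socket. The sketch below only builds the request bytes, with optional Basic proxy authentication; the function name is hypothetical and this is not the API of the proposed patch. (Present-day Python's http.client offers HTTPConnection.set_tunnel() for the same job.)

```python
import base64


def build_connect_request(host, port, username=None, password=None):
    """Build an HTTP CONNECT request asking a proxy to tunnel to host:port.

    Hypothetical helper for illustration; it does not open any sockets.
    """
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.0",
        f"Host: {target}",
    ]
    if username is not None:
        # Basic credentials: base64("user:password").
        token = base64.b64encode(f"{username}:{password or ''}".encode("latin-1"))
        lines.append("Proxy-Authorization: Basic " + token.decode("ascii"))
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

The proxy's status line must be checked before wrapping the socket in SSL; a 407 reply means the proxy rejected (or wants) credentials.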

----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-15 20:52

Message:
Logged In: YES 
user_id=31392

I'll take a look.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 15:46

Message:
Logged In: YES 
user_id=6380

Assigning to Jeremy in the hope that he can provide a review.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-24 03:03

Message:
Logged In: YES 
user_id=205865

The newer patch is generated against the latest CVS tree,
and it provides additional documentation.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-11 18:47

Message:
Logged In: YES 
user_id=205865

Sorry, I've been caught up with a zillion other things to do.
I'll try to reorganize it somehow and ask for opinions.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-11 18:42

Message:
Logged In: YES 
user_id=31392

misa-- any progress on this patch?


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 23:12

Message:
Logged In: YES 
user_id=6380

OK, thanks; I'll wait!

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-03-01 22:58

Message:
Logged In: YES 
user_id=205865

I will add documentation and show the intended usage.
urllib* doesn't deal with proxying over SSL (using CONNECT
instead of GET/POST). urllib* also use the compatibility
classes, HTTP/HTTPS, instead of HTTPConnection (this is not
an argument by itself).
Thanks for the suggestion.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 22:40

Message:
Logged In: YES 
user_id=6380

This patch fails to seduce me. There's no explanation why
this would be useful, or how it should be used, and no
documentation, and a hint that urllib2 already does this.

Maybe you can get someone who's known on python-dev to
champion it, if you think it's useful?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 22:21:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 14:21:56 -0700
Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection
Message-ID: <E17UDI4-0000uW-00@usw-sf-web3.sourceforge.net>

Patches item #515003, was opened at 2002-02-08 21:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Mihai Ibanescu (misa)
Assigned to: Jeremy Hylton (jhylton)
Summary: Added HTTP{,S}ProxyConnection

Initial Comment:
This patch adds HTTP*Connection classes for proxy
connections. Authenticated proxies are also supported.

One can argue urllib2 already implements this. It does
not do HTTPS tunneling through proxies, and this is
intended to be lower-level than urllib2.

----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-15 21:21

Message:
Logged In: YES 
user_id=31392

The proposed classes seem useful enough, but I would like to
make several suggestions for the implementation.

- There are too many comments.  Comments should only be
added when the intent of the code needs to be explained.
We definitely don't need one comment for each line of
code.  The comment in the HTTPS proxy putrequest() is an
example of a helpful comment.

- Just use a single underscore for private variables.

- Please use string methods instead of the string module.

- I don't understand the logic of switching the host/port
back and forth.
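
The single-underscore request relates to Python's name mangling: an attribute spelled with two leading underscores inside a class body is renamed to _ClassName__attr, which makes it awkward for subclasses such as a proxy connection class to reach. A small demonstration, with made-up class names:

```python
class Base:
    def __init__(self):
        self._host = "proxy.example.com"  # single underscore: private by convention
        self.__port = 3128                 # double underscore: mangled to _Base__port


class Proxy(Base):
    def port(self):
        # self.__port here would be mangled to self._Proxy__port and fail;
        # the subclass has to spell out the mangled name instead.
        return self._Base__port


if __name__ == "__main__":
    p = Proxy()
    print(p._host, p.port())
```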


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-15 20:52

Message:
Logged In: YES 
user_id=31392

I'll take a look.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 15:46

Message:
Logged In: YES 
user_id=6380

Assigning to Jeremy in the hope that he can provide a review.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-24 03:03

Message:
Logged In: YES 
user_id=205865

The newer patch is generated against the latest CVS tree,
and it provides additional documentation.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-11 18:47

Message:
Logged In: YES 
user_id=205865

Sorry, I've been caught up with a zillion other things to do.
I'll try to reorganize it somehow and ask for opinions.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-11 18:42

Message:
Logged In: YES 
user_id=31392

misa-- any progress on this patch?


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 23:12

Message:
Logged In: YES 
user_id=6380

OK, thanks; I'll wait!

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-03-01 22:58

Message:
Logged In: YES 
user_id=205865

I will add documentation and show the intended usage.
urllib* doesn't deal with proxying over SSL (using CONNECT
instead of GET/POST). urllib* also use the compatibility
classes, HTTP/HTTPS, instead of HTTPConnection (this is not
an argument by itself).
Thanks for the suggestion.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 22:40

Message:
Logged In: YES 
user_id=6380

This patch fails to seduce me. There's no explanation why
this would be useful, or how it should be used, and no
documentation, and a hint that urllib2 already does this.

Maybe you can get someone who's known on python-dev to
champion it, if you think it's useful?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 22:37:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 14:37:37 -0700
Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection
Message-ID: <E17UDXF-0004VI-00@usw-sf-web4.sourceforge.net>

Patches item #515003, was opened at 2002-02-08 16:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Mihai Ibanescu (misa)
Assigned to: Jeremy Hylton (jhylton)
Summary: Added HTTP{,S}ProxyConnection

Initial Comment:
This patch adds HTTP*Connection classes for proxy
connections. Authenticated proxies are also supported.

One can argue urllib2 already implements this. It does
not do HTTPS tunneling through proxies, and this is
intended to be lower-level than urllib2.

----------------------------------------------------------------------

>Comment By: Mihai Ibanescu (misa)
Date: 2002-07-15 17:37

Message:
Logged In: YES 
user_id=205865

- I agree about the comments. I'll make them reasonable.
- one underscore is fine
- I intended to have a patch that works with python 1.5, but
then again the module itself doesn't run with 1.5 anyway, so
good point.
- When you make a connection to a server through a proxy,
you have to connect to the proxy, but everything else should
be the same, i.e. the Host: field has to refer to the server
and so on. I wanted to reuse the code from _set_hostport,
which saves the host and port in self.host, self.port. Had
to do it twice, once for the proxy hostname, once for the
server's. _set_hostport takes care of the default port and
of the "hostname:port" syntax, which is convenient.
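
The explanation above boils down to: the socket goes to the proxy, but the request still names the origin server. A rough sketch follows. split_hostport is a hypothetical stand-in for the default-port and "hostname:port" handling that _set_hostport provides, not its actual code, and the 8080 default for the proxy is an assumption.

```python
def split_hostport(host, port=None, default_port=80):
    """Return (host, port), honouring an explicit 'hostname:port' in host."""
    if port is None:
        i = host.rfind(":")
        if i > host.rfind("]"):  # ignore colons inside a bracketed IPv6 literal
            port = int(host[i + 1:])
            host = host[:i]
        else:
            port = default_port
    return host, port


def proxied_request(method, server, path, proxy):
    """Return (address to connect to, request head) for plain HTTP via a proxy."""
    proxy_addr = split_hostport(proxy, default_port=8080)  # assumed proxy default
    host, port = split_hostport(server)
    netloc = host if port == 80 else f"{host}:{port}"
    head = (
        f"{method} http://{netloc}{path} HTTP/1.0\r\n"
        f"Host: {netloc}\r\n"  # Host: names the server, not the proxy
        "\r\n"
    )
    return proxy_addr, head
```

Parsing both addresses with the same helper is what the patch achieves by calling _set_hostport twice.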

I'll put together a patched patch and upload it.


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-15 17:21

Message:
Logged In: YES 
user_id=31392

The proposed classes seem useful enough, but I would like to
make several suggestions for the implementation.

- There are too many comments.  Comments should only be
added when the intent of the code needs to be explained.
We definitely don't need one comment for each line of
code.  The comment in the HTTPS proxy putrequest() is an
example of a helpful comment.

- Just use a single underscore for private variables.

- Please use string methods instead of the string module.

- I don't understand the logic of switching the host/port
back and forth.


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-15 16:52

Message:
Logged In: YES 
user_id=31392

I'll take a look.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 11:46

Message:
Logged In: YES 
user_id=6380

Assigning to Jeremy in the hope that he can provide a review.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-23 23:03

Message:
Logged In: YES 
user_id=205865

The newer patch is generated against the latest CVS tree,
and it provides additional documentation.

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-06-11 14:47

Message:
Logged In: YES 
user_id=205865

Sorry, I've been caught up with a zillion other things to do.
I'll try to reorganize it somehow and ask for opinions.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-11 14:42

Message:
Logged In: YES 
user_id=31392

misa-- any progress on this patch?


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 18:12

Message:
Logged In: YES 
user_id=6380

OK, thanks; I'll wait!

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2002-03-01 17:58

Message:
Logged In: YES 
user_id=205865

I will add documentation and show the intended usage.
urllib* doesn't deal with proxying over SSL (using CONNECT
instead of GET/POST). urllib* also use the compatibility
classes, HTTP/HTTPS, instead of HTTPConnection (this is not
an argument by itself).
Thanks for the suggestion.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-01 17:40

Message:
Logged In: YES 
user_id=6380

This patch fails to seduce me. There's no explanation why
this would be useful, or how it should be used, and no
documentation, and a hint that urllib2 already does this.

Maybe you can get someone who's known on python-dev to
champion it, if you think it's useful?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 22:43:58 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 14:43:58 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UDdO-0007HE-00@usw-sf-web5.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Tim Peters (tim_one)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.
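
In Python terms, "StopIteration should be a sink state" means that once next() has raised StopIteration, every later call must raise it too. The two toy iterators below are made up for illustration and written for modern Python, where the method is spelled __next__ rather than next(). The first resumes if its source list grows after exhaustion; the second latches its exhausted state, which is the behaviour the patch asks of the core iterators.

```python
class LeakyIterator:
    """Iterates over a list by index; resumes if the list grows later."""

    def __init__(self, items):
        self.items = items
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item


class SinkIterator(LeakyIterator):
    """Same, but once exhausted it stays exhausted."""

    def __init__(self, items):
        super().__init__(items)
        self.exhausted = False

    def __next__(self):
        if self.exhausted:
            raise StopIteration
        try:
            return super().__next__()
        except StopIteration:
            self.exhausted = True  # latch: StopIteration is now permanent
            raise
```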

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 22:44:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 14:44:48 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UDeC-0007KR-00@usw-sf-web1.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Tim Peters (tim_one)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 17:44

Message:
Logged In: YES 
user_id=6380

Oops, here's the file...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 23:21:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 15:21:59 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UEEB-0008DP-00@usw-sf-web5.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Tim Peters (tim_one)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:21

Message:
Logged In: YES 
user_id=33168

I thought enum might need patching.
But it worked fine with the patch.  Shucks. :-)

Do xreadlines, cStringIO, or hotshot need patching?
Those three seem to use iterators and could be a problem.

Everything worked fine for me.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 17:44

Message:
Logged In: YES 
user_id=6380

Oops, here's the file...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Mon Jul 15 23:33:25 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 15:33:25 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17UEPF-0008MM-00@usw-sf-web1.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 01:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:33

Message:
Logged In: YES 
user_id=33168

Sorry, I forgot about this patch.
I just tested on Linux (RedHat 7.2).
No problems, all expected tests successful.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 20:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf-2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-05 14:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 02:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 08:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 21:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 06:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-01 21:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 14:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-23 23:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 01:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 00:41:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 16:41:46 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UFTO-0001Cw-00@usw-sf-web1.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Tim Peters (tim_one)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 19:41

Message:
Logged In: YES 
user_id=6380

Everything that has a tp_iternext slot also defines a next()
method, but those next() methods are never used; they are
overridden by wrap_next(). This is not new, this has been
the case since 2.2! So you're right, enum works fine, but
its next() method is pointless, and I'll remove it -- just
like the others you mentioned. (except cStringIO, which
simply calls PyCallIter_New()).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:21

Message:
Logged In: YES 
user_id=33168

I thought enum might need patching.
But it worked fine with the patch.  Shucks. :-)

Do xreadlines, cStringIO, or hotshot need patching?
Those three seem to use iterators and could be a problem.

Everything worked fine for me.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 17:44

Message:
Logged In: YES 
user_id=6380

Oops, here's the file...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 06:26:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 15 Jul 2002 22:26:26 -0700
Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator
Message-ID: <E17UKqw-0006IM-00@usw-sf-web5.sourceforge.net>

Patches item #580331, was opened at 2002-07-11 21:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.


----------------------------------------------------------------------

>Comment By: Oren Tirosh (orenti)
Date: 2002-07-16 05:26

Message:
Logged In: YES 
user_id=562624

Now invalidates cache on a seek.
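A minimal Python 3 sketch of the behaviour described in this patch: the file object is its own iterator, next() delegates to a cached line iterator, and seek() drops the cache. This is not the patch's C code; `CachingLineFile` and `_buffered_lines` are hypothetical names, and Python 3 spells next() as `__next__`. The cached iterator reads ahead in blocks, which is why a seek must invalidate it.

```python
import io


def _buffered_lines(raw, hint=8192):
    # Reads ahead several lines at a time, like the old xreadlines buffer.
    while True:
        block = raw.readlines(hint)
        if not block:
            return
        yield from block


class CachingLineFile:
    """Hypothetical wrapper around any object with readlines() and seek()."""

    def __init__(self, raw):
        self._raw = raw
        self._lines = None            # cached line iterator, created lazily

    def xreadlines(self):
        # Repeated calls return the same cached iterator object.
        if self._lines is None:
            self._lines = _buffered_lines(self._raw)
        return self._lines

    def __iter__(self):
        return self                   # the file is its own iterator

    def __next__(self):
        return next(self.xreadlines())

    def seek(self, offset, whence=0):
        self._lines = None            # buffered lines are stale after a seek
        return self._raw.seek(offset, whence)


f = CachingLineFile(io.StringIO("a\nb\nc\n"))
print(next(f))                        # reads "a\n" (block read buffers the rest)
f.seek(0)
print(list(f))                        # ['a\n', 'b\n', 'c\n']
```

Without the invalidation in seek(), the final list() would resume from the stale buffer and return only the remaining lines.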


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 14:38

Message:
Logged In: YES 
user_id=6380

I posted some comments to python-dev.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 17:43:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 09:43:44 -0700
Subject: [Patches] [ python-Patches-509975 ] make python-mode play nice with gdb
Message-ID: <E17UVQO-0005Lb-00@usw-sf-web1.sourceforge.net>

Patches item #509975, was opened at 2002-01-28 20:32
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=509975&group_id=5470

Category: Demos and tools
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Alex Coventry (alex_coventry)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: make python-mode play nice with gdb

Initial Comment:
if you run gdb (and presumably other debuggers) while
python-mode is loaded, the little arrow it uses to 
indicate the current position in the source code fails
to appear.  this is because the comint hook
py-pdbtrack-track-stack-file wipes it out regardless of
whether the current buffer process comes from python.

hth.
alex

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-16 12:43

Message:
Logged In: YES 
user_id=12800

This patch doesn't work for a use case that I've found to be
very common, namely adding "import pdb ; pdb.set_trace()" in
the code where you want to start tracing, and then just
running the program from the shell buffer.

Ken turned me on to this idiom and it's really powerful. 
The test for process-command matching exactly
py-python-command or py-jpython-command breaks this.

I'm not sure patch 567468 is much better, but for different
reasons.

----------------------------------------------------------------------

Comment By: Alex Coventry (alex_coventry)
Date: 2002-02-03 16:18

Message:
Logged In: YES 
user_id=49686

sorry, somehow failed to include the diff :)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=509975&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 17:53:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 09:53:15 -0700
Subject: [Patches] [ python-Patches-567468 ] A different patch for python-mode vs gdb
Message-ID: <E17UVZb-0005ZC-00@usw-sf-web1.sourceforge.net>

Patches item #567468, was opened at 2002-06-11 12:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470

Category: Demos and tools
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 6
Submitted By: Jason Merrill (jason_merrill)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: A different patch for python-mode vs gdb

Initial Comment:
Patch 509975 fixes the conflict between gdb-mode and
python-mode by checking whether the current process is
a python process.  My patch fixes it more simply, by
only clearing the overlay arrow if we were the ones who
set it.

I'd be happy with either patch.


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-16 12:53

Message:
Logged In: YES 
user_id=12800

I've rejected 509975 because it doesn't play nice when
you're pdb tracking from the shell (see comments in that patch).

I'm not sure this patch works correctly either, but for a
different reason: it doesn't actually work for me!

If I add "import pdb; pdb.set_trace()" to a file and then
execute the file from the shell buffer, I see the overlay
arrow.  If I then switch to a gdb debugging a C program and
hit "next", the overlay arrow in the .py buffer disappears.

This seems like a tricky problem and I don't have a good
solution, but I think I have to reject this patch too.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 18:02:03 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 10:02:03 -0700
Subject: [Patches] [ python-Patches-567468 ] A different patch for python-mode vs gdb
Message-ID: <E17UVi7-0002Uw-00@usw-sf-web3.sourceforge.net>

Patches item #567468, was opened at 2002-06-11 09:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470

Category: Demos and tools
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 6
Submitted By: Jason Merrill (jason_merrill)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: A different patch for python-mode vs gdb

Initial Comment:
Patch 509975 fixes the conflict between gdb-mode and
python-mode by checking whether the current process is
a python process.  My patch fixes it more simply, by
only clearing the overlay arrow if we were the ones who
set it.

I'd be happy with either patch.


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-07-16 10:02

Message:
Logged In: NO 

The problem my patch was intended to fix is that currently
just loading python-mode.el (as happens by default under Red
Hat 7.3) breaks gdb-mode.  With my patch, it works fine.

emacs only supports one overlay arrow at a time; if you hit
'next' in gdb, gdb-mode will set the overlay arrow, which
means that it will no longer be set in the python buffer. 
This may not be ideal behavior, but it's a limitation of
emacs, not a bug in my patch.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-16 09:53

Message:
Logged In: YES 
user_id=12800

I've rejected 509975 because it doesn't play nice when
you're pdb tracking from the shell (see comments in that patch).

I'm not sure this patch works correctly either, but for a
different reason: it doesn't actually work for me!

If I add "import pdb; pdb.set_trace()" to a file and then
execute the file from the shell buffer, I see the overlay
arrow.  If I then switch to a gdb debugging a C program and
hit "next", the overlay arrow in the .py buffer disappears.

This seems like a tricky problem and I don't have a good
solution, but I think I have to reject this patch too.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 18:30:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 10:30:33 -0700
Subject: [Patches] [ python-Patches-567468 ] A different patch for python-mode vs gdb
Message-ID: <E17UW9h-0006TT-00@usw-sf-web1.sourceforge.net>

Patches item #567468, was opened at 2002-06-11 12:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470

Category: Demos and tools
Group: Python 2.3
>Status: Open
Resolution: Rejected
Priority: 6
Submitted By: Jason Merrill (jason_merrill)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: A different patch for python-mode vs gdb

Initial Comment:
Patch 509975 fixes the conflict between gdb-mode and
python-mode by checking whether the current process is
a python process.  My patch fixes it more simply, by
only clearing the overlay arrow if we were the ones who
set it.

I'd be happy with either patch.


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-16 13:30

Message:
Logged In: YES 
user_id=12800

Hmm, it must work differently in emacs than in XEmacs (which
is what I use).  In a vanilla Emacs 21.2.1 I can't get the
overlay arrow to work even without python-mode.el loaded, so
I'll have to take your word for it.

In XEmacs, I definitely do get two overlay arrows, one in
the C buffer and one in the python-mode buffer.  As I step
through the python program, the C arrow stays nicely visible
and highlighted.  As I step through gdb though, the python
overlay arrow disappears.

Your patch makes no difference to me and I can't get overlay
arrow working at all in Emacs, so I suppose the patch is
benign.  I'll reopen it but I'd like confirmation from some
other Emacs user that this fixes the problem in that editor.
Alternatively, maybe I should just apply it and worry about
it if people complain.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-07-16 13:02

Message:
Logged In: NO 

The problem my patch was intended to fix is that currently
just loading python-mode.el (as happens by default under Red
Hat 7.3) breaks gdb-mode.  With my patch, it works fine.

emacs only supports one overlay arrow at a time; if you hit
'next' in gdb, gdb-mode will set the overlay arrow, which
means that it will no longer be set in the python buffer. 
This may not be ideal behavior, but it's a limitation of
emacs, not a bug in my patch.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-16 12:53

Message:
Logged In: YES 
user_id=12800

I've rejected 509975 because it doesn't play nice when
you're pdb tracking from the shell (see comments in that patch).

I'm not sure this patch works correctly either, but for a
different reason: it doesn't actually work for me!

If I add "import pdb; pdb.set_trace()" to a file and then
execute the file from the shell buffer, I see the overlay
arrow.  If I then switch to a gdb debugging a C program and
hit "next", the overlay arrow in the .py buffer disappears.

This seems like a tricky problem and I don't have a good
solution, but I think I have to reject this patch too.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=567468&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 19:50:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 11:50:15 -0700
Subject: [Patches] [ python-Patches-534304 ] PEP 263 Implementation
Message-ID: <E17UXOp-0006tj-00@usw-sf-web5.sourceforge.net>

Patches item #534304, was opened at 2002-03-24 08:52
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: SUZUKI Hisao (suzuki_hisao)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 263 Implementation

Initial Comment:
This is a sample implementation of PEP 263 phase 2.

This implementation behaves just as normal Python does
if no other coding hints are given.  Thus it does not
hurt anyone who uses Python now.  Note that it is
strictly compatible with the PEP in that every program
valid in the PEP is also valid in this implementation.

This implementation also accepts files in UTF-16 with
BOM.  They are read as UTF-8 internally.  Please try
"utf16sample.py" included.


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-16 14:50

Message:
Logged In: YES 
user_id=33168

I reviewed the patch.  I don't like the usage of enc (and
str to a lesser extent).  In particular, there is an
encoding field which is generally used.  enc is used as a
temporary from the callback.  I don't have a solution, so
perhaps it would be best to doc the purpose, usage and
interaction of enc & str.

There are some differences between the standard formatting
and that used in the patch, e.g. return on the same line as the
if, among others.  But these aren't too bad, although I don't
love the line do t++; while (...);.

I didn't see any problems with the patch.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-09 09:42

Message:
Logged In: YES 
user_id=21627

I have now updated this patch to the current CVS, and to be
a complete PEP 263 implementation; it will issue warnings
when it finds non-ASCII characters but no encoding declaration.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-26 15:41

Message:
Logged In: YES 
user_id=21627

I've updated the PEP to describe how this approach should be
used: Python 2.3 still should generate warnings only for
using non-ASCII without declared encoding. I, too, hope that
Mr Suzuki will update the patch to match the PEP, and for
the CVS tree.

As for supporting UTF-16: The stream reader currently has
the .readline method disabled, since it won't work reliably
for little-endian. So I think this should be an undocumented
feature at the moment; I see no other technical problems
with the approach taken in the patch.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-23 17:26

Message:
Logged In: YES 
user_id=6380

I haven't looked at this very carefully, but it looks like
it's well thought-out.

Suzuki, can you prepare a patch relative to current CVS?  I
get several patch failures now. (Fortunately I have a
checkout of 2.2 so I can still review and test the patch.)
I don't know what the patch failures are about (I haven't
investigated), but I imagine they might have to do with the PEP
278 (universal newlines) changes checked in by Jack Jansen,
which replaces the tokenizer's fgets() calls with calls to
Py_UniversalNewlineFgets().

Also, I can't read the README file (it's in Japanese :-).
What is the expected output from the samples? For me,
sjis_sample.py gives SyntaxError: 'unknown encoding'

Martin, I'm unclear of how you intend to use this code. Do
you intend to go straight to phase 2 of the PEP using this
patch? Or do you intend to implement phase 1 of the PEP by
modifying this code?

Also, does the PEP describe the UTF-16 support as
implemented by Suzuki's patch?


----------------------------------------------------------------------

Comment By: SUZUKI Hisao (suzuki_hisao)
Date: 2002-03-31 11:16

Message:
Logged In: YES 
user_id=495142

Thank you for your review.
Now 1. and 3. are fixed, and 2. is improved.
(4. is not true.)


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-30 06:27

Message:
Logged In: YES 
user_id=6656

Not going into 2.2.x.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-25 08:23

Message:
Logged In: YES 
user_id=21627

The patch looks good, but needs a number of improvements.

1. I have problems building this code. When trying to build
pgen, I get an error message of

Parser/parsetok.c: In function `parsetok':
Parser/parsetok.c:175: `encoding_decl' undeclared

The problem here is that graminit.h hasn't been built yet,
but parsetok refers to the symbol.

2. For some reason, error printing for incorrect encodings
does not work - it appears that it prints the wrong line in
the traceback.

3. The escape processing in Unicode literals is incorrect.
For example, u"\<non-ascii character>" should denote only
the non-ascii character. However, your implementation
replaces the non-ASCII character with \u<hex>, resulting in
\u<hex>, so the first backslash unescapes the second one.

4. I believe the escape processing in byte strings is also
incorrect for encodings that allow \ in the second byte.
Before processing escape characters, you convert back into
the source encoding. If this produces a backslash character,
escape processing will misinterpret that byte as an escape
character.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 22:34:30 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 14:34:30 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17UZxm-0005Wq-00@usw-sf-web4.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 14:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
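The whitespace rule described above can be sketched with a single `re.sub`. `whitespace_to_pattern` is a hypothetical helper, not the code in _strptime.py, and the sample pattern uses hand-written groups in place of real directive expansion.

```python
import re


def whitespace_to_pattern(fmt):
    """Replace each run of whitespace in fmt with the regex fragment \\s*,
    so the data string may hold any amount of whitespace there, even none."""
    return re.sub(r"\s+", r"\\s*", fmt)


# Hand-written groups stand in for expanded %d, %b and %Y directives.
pattern = whitespace_to_pattern(r"(\d\d) ([A-Za-z]{3}) (\d{4})")
print(pattern)                                       # (\d\d)\s*([A-Za-z]{3})\s*(\d{4})
print(bool(re.fullmatch(pattern, "12Jul2002")))      # True
print(bool(re.fullmatch(pattern, "12   Jul 2002")))  # True
```

Because whitespace may shrink to nothing, a separate rule requiring a separator between directives would add no constraint, which is the point made above.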

I decided not to write a patch for the docs to make them
more lenient about what the format directives accept.  I figured
I would just let people who think like me read them the
"proper" way, with leading zeros, and let those who don't read
them like that still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 15:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.
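The two changes above (optional leading zeros and case-insensitive matching) can be illustrated with hypothetical regex fragments. These are not the patterns in _strptime.py 2.1.4, which this thread does not show. The date string is the "7/12/02" example from Guido's earlier comment.

```python
import re

# Hypothetical fragments; not the regexes in _strptime.py.
MONTH_NUM = r"(?P<m>1[0-2]|0[1-9]|[1-9])"       # "07" and "7" both match
DAY_NUM = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"   # "05" and "5" both match
YEAR2 = r"(?P<y>\d\d)"
MONTH_ABBR = r"(?P<b>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"

# Rough equivalent of the format "%m/%d/%y".
NUMERIC_DATE = re.compile(MONTH_NUM + "/" + DAY_NUM + "/" + YEAR2)

# re.IGNORECASE lets "JUL", "Jul" and "jul" all match the month name.
ABBR = re.compile(MONTH_ABBR, re.IGNORECASE)

m = NUMERIC_DATE.fullmatch("7/12/02")
print(m.group("m"), m.group("d"), m.group("y"))  # 7 12 02
```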

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 14:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
If you want, I can obviously add an optional \d to all the
regexes and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
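The default-argument pitfall described above is general Python behaviour: a default value is evaluated once, when the def statement runs. The sketch uses a hypothetical dict in place of real locale state. The None-sentinel version is the common workaround, shown only for comparison; Brett chose to remove the parameter instead.

```python
# Hypothetical stand-in for process-wide locale state (not the locale module).
current_locale = {"name": "en_US"}


def snapshot_locale():
    return dict(current_locale)          # copy of the state at call time


def parse_stale(data, locale_info=snapshot_locale()):
    # The default was built once, when this "def" ran, and is reused forever.
    return locale_info["name"]


def parse_fresh(data, locale_info=None):
    if locale_info is None:
        locale_info = snapshot_locale()  # rebuilt on every call
    return locale_info["name"]


current_locale["name"] = "sv_SE"         # "switch locales" after definition
print(parse_stale("12 juli 2002"))       # en_US (stale default)
print(parse_fresh("12 juli 2002"))       # sv_SE
```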

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any "empty"
value (i.e., anything considered false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
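Brett's approach reuses a compiled regex by having strptime() hand it back when passed False. A different technique avoids the special argument: memoize compilation per format string with `functools.lru_cache`. This is a swapped-in alternative, not his design. The tiny directive table and the function names are hypothetical.

```python
import re
from functools import lru_cache

# Hypothetical, deliberately tiny directive table for illustration.
DIRECTIVES = {
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "Y": r"(?P<Y>\d{4})",
}


@lru_cache(maxsize=64)
def compile_format(fmt):
    """Build and compile the regex for fmt; repeated formats hit the cache.
    Unknown directives raise KeyError in this sketch."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])
            i += 2
        else:
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def parse(data, fmt):
    match = compile_format(fmt).fullmatch(data)
    if match is None:
        raise ValueError("time data did not match format")
    return {name: int(value) for name, value in match.groupdict().items()}


print(parse("16/7/2002", "%d/%m/%Y"))   # {'d': 16, 'm': 7, 'Y': 2002}
```

The cache keeps the common case (one format parsed many times, as with log files) cheap without changing the function's signature.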

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
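The following is a minimal sketch of read-only attributes in the style described, using a hypothetical `LocaleTimeSketch` class whose shared setter raises TypeError. In modern Python, a property on a new-style class that has no setter already refuses assignment with AttributeError. An explicit setter is therefore only needed if you want a different exception:

```python
class LocaleTimeSketch(object):
    """Hypothetical stand-in for LocaleTime with read-only attributes."""

    def __init__(self, f_weekday):
        self._f_weekday = list(f_weekday)

    def _readonly(self, value):
        # Shared setter for every property: assignment always fails.
        raise TypeError("attribute is read-only")

    f_weekday = property(lambda self: self._f_weekday, _readonly)
```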

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify time
or make this a separate module, etc.).  I think I will do
the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
on a \d on all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not, since it doesn't make the code
harder to read?  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.
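Here is a sketch of that validation using a hypothetical helper `_check_length`. Each locale-dependent argument must be either None or a sequence of the expected length. Reading None as "work it out from the locale" is an assumption. The error text follows Neal's suggested "must contain X items" wording:

```python
def _check_length(name, value, length):
    """Raise TypeError unless value is None or has exactly length items."""
    if value is None:
        return
    try:
        actual = len(value)
    except TypeError:
        raise TypeError("%s must be a sequence" % name)
    if actual != length:
        raise TypeError("%s must contain %d items" % (name, length))


class LocaleTimeArgs(object):
    """Hypothetical constructor showing the checks in __init__."""

    def __init__(self, f_weekday=None, f_month=None):
        _check_length('f_weekday', f_weekday, 7)
        # Assumed: month lists carry a leading '' so January is index 1.
        _check_length('f_month', f_month, 13)
        self.f_weekday = f_weekday
        self.f_month = f_month
```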

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that can add it
to either the front or the back.  Not much different from
before, but it lets me specify front or back concatenation
easily.

I explained why the various magic dates were used.

I don't have to worry about leap years at all.  Since the
function doesn't validate the data string, it just takes the
data and uses it; I have no reason to calculate leap years.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.
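To illustrate the __new__/__init__ point: a subclass of an immutable built-in such as tuple has to set its value in __new__, while a mutable dict subclass can simply fill itself in __init__. Both `TimeRESketch` and `Pair` are hypothetical. In particular, `TimeRESketch` is not the real contents of TimeRE:

```python
class TimeRESketch(dict):
    """Hypothetical dict subclass mapping directives to regex fragments."""

    def __init__(self, extra=None):
        # dict is mutable, so populating it in __init__ is enough.
        super(TimeRESketch, self).__init__({
            'd': r'(?P<d>3[01]|[12]\d|0[1-9]|[1-9])',
            'Y': r'(?P<Y>\d{4})',
        })
        if extra:
            self.update(extra)


class Pair(tuple):
    """An immutable subclass must build its value in __new__."""

    def __new__(cls, first, second):
        return super(Pair, cls).__new__(cls, (first, second))
```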

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me,
although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:

    def gregToJulian(year, month, day):
        """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
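The following is a quick check of the failure described above. It uses the simplified %d regex quoted there, plus the suggested fix of tacking '\d' onto the end as a last alternative. The helper `matches_day` is hypothetical, and it anchors with `$` so that the whole field has to match:

```python
import re

ORIGINAL = r'(3[0-1])|([0-2]\d)'
# Same alternatives, with a bare \d added as the last option.
PATCHED = r'(3[0-1])|([0-2]\d)|(\d)'


def matches_day(pattern, text):
    """True if text is entirely a day-of-month field under pattern."""
    return re.match('(?:%s)$' % pattern, text) is not None
```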

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity if you raise
   an exception?
   (You don't need to return anything: raise an exception,
   else it's ok.)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
   you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
   (specifically, spaces after commas, =, binary ops,
   comparisons, etc.)
 * Long lines should be broken up

The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
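The idea of gleaning locale information from strftime() can be sketched by formatting a run of dates whose weekdays are known. 1999-03-15 (the magic date Neal later asks about) was a Monday. Formatting it and the six following days with %A therefore yields the weekday names in Monday-first order. The sketch uses datetime rather than time.strftime so that the weekday is computed for us. `weekday_names` is a hypothetical name:

```python
import datetime


def weekday_names(fmt='%A'):
    """Return locale weekday names, Monday first, via strftime()."""
    monday = datetime.date(1999, 3, 15)  # 1999-03-15 was a Monday
    return [(monday + datetime.timedelta(days=i)).strftime(fmt)
            for i in range(7)]
```

Passing '%a' instead gives the abbreviated names. The same trick, applied to the first day of each month, recovers month names.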

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 22:43:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 14:43:18 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17Ua6I-0005j0-00@usw-sf-web4.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.
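The "sink state" property can be illustrated in pure Python. The hypothetical `SinkIterator` below walks a list by index. Once it has raised StopIteration, it drops its reference to the list, so items appended later are never produced. A naive index-based iterator without that step would resume, which is the behavior the patch removes from the core iterators:

```python
class SinkIterator(object):
    """Iterate over a list; once exhausted, stay exhausted."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None:          # already exhausted: sink state
            raise StopIteration
        if self._index >= len(self._seq):
            self._seq = None           # forget the list for good
            raise StopIteration
        item = self._seq[self._index]
        self._index += 1
        return item

    next = __next__                    # Python 2 spelling of the method
```

CPython's built-in list iterator behaves the same way: appending to a list after its iterator is exhausted does not revive the iterator.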

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-16 17:43

Message:
Logged In: YES 
user_id=31435

Since you're checking this in, I may as well Accept it and 
assign it back to you <wink>.  +1 from me anyway.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 19:41

Message:
Logged In: YES 
user_id=6380

Everything that has a tp_iternext slot also defines a next()
method, but those next() methods are never used; they are
overridden by wrap_next(). This is not new, this has been
the case since 2.2! So you're right, enum works fine, but
its next() method is pointless, and I'll remove it -- just
like the others you mentioned. (except cStringIO, which
simply calls PyCallIter_New()).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:21

Message:
Logged In: YES 
user_id=33168

I thought enum might need patching.
But it worked fine with the patch.  Shucks. :-)

Do xreadlines, cStringIO, or hotshot need patching?
Those three seem to use iterators and could be a problem.

Everything worked fine for me.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 17:44

Message:
Logged In: YES 
user_id=6380

Oops, here's the file...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 23:04:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 15:04:48 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UaR6-0002Qr-00@usw-sf-web5.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Guido van Rossum (gvanrossum)
Summary: StopIteration should be a sink state

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 18:04

Message:
Logged In: YES 
user_id=6380

Yeah, it's all checked in. :-)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Tue Jul 16 23:25:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 15:25:56 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UalY-0001Zj-00@usw-sf-web3.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Guido van Rossum (gvanrossum)
Summary: StopIteration should be a sink state

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-16 18:25

Message:
Logged In: YES 
user_id=33168

I'm not sure if these are problems or not, but just to be
safe... There are two files in Modules/ that still have
iterators that aren't tested:  xreadlinesmodule.c and _hotshot.c.

I missed xreadlines the first time, so I'm not sure if it's
been reviewed.
And I thought Guido said hotshot was broken here:
http://mail.python.org/pipermail/python-dev/2002-July/026613.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Wed Jul 17 02:17:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 18:17:26 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17UdRW-0005rz-00@usw-sf-web5.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Guido van Rossum (gvanrossum)
Summary: StopIteration should be a sink state

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 21:17

Message:
Logged In: YES 
user_id=6380

There's some reworking of xreadlines in the pipeline in
another patch (580331); I'll make sure to check this part too.

Hotshot is a little harder, but Fred told me what I can do.

Reopening as reminder.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470


From noreply@sourceforge.net  Wed Jul 17 02:33:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 18:33:21 -0700
Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator
Message-ID: <E17Udgv-000800-00@usw-sf-web1.sourceforge.net>

Patches item #580331, was opened at 2002-07-11 17:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.
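Here is a pure-Python sketch of that design, with hypothetical names. `LineFile` wraps a text stream and is its own iterator: `__iter__` returns self, and `__next__` delegates to a cached line reader. Following Oren's later comment about seeks, `seek()` discards the cache so that no stale read-ahead is returned. This illustrates the idea only; it is not the C patch:

```python
import io
from collections import deque


class _LineReader(object):
    """Reads ahead in chunks and hands out one line at a time."""

    def __init__(self, raw, hint=8192):
        self._raw = raw
        self._hint = hint          # approximate size of each read-ahead
        self._lines = deque()

    def __iter__(self):
        return self

    def __next__(self):
        if not self._lines:
            self._lines.extend(self._raw.readlines(self._hint))
            if not self._lines:
                raise StopIteration
        return self._lines.popleft()


class LineFile(object):
    """Hypothetical file wrapper that is its own iterator."""

    def __init__(self, raw):
        self._raw = raw
        self._reader = None        # created lazily, then reused

    def xreadlines(self):
        # Repeated calls hand back the same cached reader object.
        if self._reader is None:
            self._reader = _LineReader(self._raw)
        return self._reader

    def __iter__(self):
        return self                # the file is its own iterator

    def __next__(self):
        return next(self.xreadlines())

    def seek(self, offset, whence=io.SEEK_SET):
        self._reader = None        # buffered lines are stale after a seek
        return self._raw.seek(offset, whence)
```

Without the reset in `seek()`, lines read ahead before the seek would be handed out again after it.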


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 21:33

Message:
Logged In: YES 
user_id=6380

I'm reviewing this and will check it in, or something like
it (probably).

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-07-16 01:26

Message:
Logged In: YES 
user_id=562624

Now invalidates cache on a seek.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 10:38

Message:
Logged In: YES 
user_id=6380

I posted some comments to python-dev.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470


From noreply@sourceforge.net  Wed Jul 17 03:43:50 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 16 Jul 2002 19:43:50 -0700
Subject: [Patches] [ python-Patches-580386 ] uncaught TypeError exception in sre
Message-ID: <E17Uen8-0000hn-00@usw-sf-web1.sourceforge.net>

Patches item #580386, was opened at 2002-07-11 22:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580386&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
Resolution: Duplicate
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Fredrik Lundh (effbot)
Summary: uncaught TypeError exception in sre

Initial Comment:
From c.l.p on 9 July, Kevin Altis reported that:

    re.compile('([a-')

Produces an uncaught TypeError from compilation.

This patch catches the TypeError in _compile().
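For comparison, current Python releases report this pattern as `re.error` (the documented exception for invalid regular expressions) rather than as a stray `TypeError`. A small check, assuming Python 3; `try_compile` is a hypothetical helper:

```python
import re


def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        # Invalid patterns surface as re.error, not as an internal TypeError.
        return None, str(exc)


compiled, message = try_compile('([a-')
```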

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-12 07:07

Message:
Logged In: YES 
user_id=38376

this is the same as bug #545855, and should be fixed inside
the SRE parser (afaik, it has been, in the SLAB master
repository).

as for the extra try/except: this is to shield ordinary users
from 20-level tracebacks exposing irrelevant implementation
details.  if you make a mistake in an RE, you want to know
that, but you probably don't care about exactly where in the
parser or compiler internals the interpreter happens to be
when that mistake was discovered...  (this pattern, along
with the "add a comment on the raise line, to provide extra
hints for a human reader" idiom, are pretty common in
Python libraries). /F
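The idiom can be sketched in Python 3 with hypothetical names (`PatternError`, `_parse`, `compile_pattern`; none of them are SRE internals). The public entry point catches the internal failure and raises a fresh exception with `from None`, so the printed traceback stops at the API boundary instead of walking through every parser frame, and the raise line carries a comment for human readers:

```python
class PatternError(Exception):
    """Hypothetical public exception for invalid patterns."""


def _parse(text, depth=0):
    # Stand-in for a deeply recursive parser: descend 20 levels, then
    # fail on an unclosed character set.
    if depth < 20:
        return _parse(text, depth + 1)
    if text.count("[") > text.count("]"):
        raise ValueError("unterminated character set")
    return text


def compile_pattern(text):
    try:
        return _parse(text)
    except ValueError as exc:
        # Hide the internal frames: raise a new public error and suppress
        # the chained context so only frames from here up are reported.
        raise PatternError(str(exc)) from None  # invalid RE, see message
```

In Python 3 a bare `raise` inside the `except` block would keep the full internal traceback; it is the new exception combined with `from None` that hides it.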

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-11 23:05

Message:
Logged In: YES 
user_id=33168

I wonder if the same change must be made in _compile_repl().

I don't see the benefit of the try/except clause as it is:

  try:
    p = parse...
  except error, v:
    raise error, v

Isn't that just:  p = parse...

This probably also should be backported to 2.2

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 14:39:20 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 06:39:20 -0700
Subject: [Patches] [ python-Patches-572031 ] AUTH method LOGIN for smtplib
Message-ID: <E17Up1U-0005Eq-00@usw-sf-web5.sourceforge.net>

Patches item #572031, was opened at 2002-06-21 12:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572031&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: AUTH method LOGIN for smtplib

Initial Comment:
Unfortunately, my original SMTP auth patch doesn't work
so well in real life. There are two methods to
advertise the available auth methods for SMTP servers:

old-style: AUTH=method1 method2 ...
RFC style: AUTH method1 method2

Microsoft's MUAs are b0rken in that they only
understand the old-style method. That's why most SMTP
servers are configured to advertise their
authentication methods in old-style _and_ new style.
There are also some especially broken SMTP servers like
old M$ Exchange servers that only show their auth
methods via the old style.

Also the (sadly but true) very widely used M$ Exchange
server only supports the LOGIN auth method (I have to
use that thing at work, that's why I came up with this
patch). Exchange also supports some other proprietary
auth methods (NTLM, ...), but we needn't care about these.

My argument is that the Python SMTP AUTH support will
get a lot more useful to people if we also support

1) the old-style AUTH= advertisement
2) the LOGIN auth method, which, although not
standardized via RFCs and originally invented by
Netscape, is still in wide use, and for some servers
the only method available, so we should support it
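The LOGIN exchange itself is short: the server typically sends base64-encoded prompts (`VXNlcm5hbWU6` decodes to `Username:`, `UGFzc3dvcmQ6` to `Password:`) and the client answers each on a single line with its base64-encoded credential. A minimal sketch, assuming a hypothetical `send_command(line)` callable that sends one command line and returns the reply as `(code, text)`; this is not smtplib's implementation:

```python
import base64


def b64(text):
    # One line of base64 with no embedded newlines.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def auth_login(send_command, user, password):
    """Run AUTH LOGIN over a hypothetical send_command(line) -> (code, text)."""
    code, text = send_command("AUTH LOGIN")
    if code != 334:
        raise RuntimeError("server refused AUTH LOGIN: %d %s" % (code, text))
    code, text = send_command(b64(user))      # answer to the username prompt
    if code != 334:
        raise RuntimeError("username rejected: %d %s" % (code, text))
    code, text = send_command(b64(password))  # answer to the password prompt
    if code != 235:
        raise RuntimeError("authentication failed: %d %s" % (code, text))
    return code, text
```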

Please note that in the current implementation, if a
server uses the old-style AUTH= method, our SMTP auth
support simply breaks because of the esmtp_features
parsing.
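Merging the two advertisement styles can be sketched as follows, assuming the EHLO reply has already been split into per-line keyword strings with the `250-`/`250 ` prefixes removed. `auth_mechanisms` is a hypothetical helper, not part of smtplib's API:

```python
import re

# "AUTH=LOGIN PLAIN" (old style) and "AUTH LOGIN PLAIN" (RFC style).
_OLD_STYLE = re.compile(r"auth=(.*)", re.IGNORECASE)
_RFC_STYLE = re.compile(r"auth\s+(.*)", re.IGNORECASE)


def auth_mechanisms(ehlo_lines):
    """Collect AUTH mechanisms from both advertisement styles, merged."""
    mechanisms = []
    for line in ehlo_lines:
        match = _OLD_STYLE.match(line) or _RFC_STYLE.match(line)
        if match:
            for name in match.group(1).upper().split():
                if name not in mechanisms:
                    mechanisms.append(name)  # keep first-seen order
    return mechanisms
```

With a server that lists LOGIN only in the old-style line, the merged result still contains it.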

I'm randomly assigning this patch to Barry, because
AFAIK he knows a lot about email handling. Assign
around as you please :-)


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-17 15:39

Message:
Logged In: YES 
user_id=21627

That existing SMTP servers announce LOGIN only in the
old-style header is a good reason to support those as well;
I hence recommend that this patch is applied.

Microsoft is, strictly speaking, conforming to the RFC by
*not* reporting LOGIN in the AUTH header: only registered
SASL mechanisms can be announced there, and LOGIN is not
registered; see

http://www.iana.org/assignments/sasl-mechanisms


----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-07-01 00:34

Message:
Logged In: YES 
user_id=163326

Updated patch. Changes to the previous patch:

- Use email.base64MIME.encode
  to get rid of the added
  newlines.
- Merge old and RFC-style auth methods
  in self.smtp_features instead of
  parsing old-style auth lines
  separately.
- Removed example line for changing auth
  method priorities (we won't list all
  permutations of auth methods ;-)
- Removed superfluous logging call of
  chosen auth method.
- Moved comment about SMTP features
  syntax into the right place again.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-30 23:14

Message:
Logged In: YES 
user_id=163326

Martin,
the reason why we need to take into account both old and
RFC-style auth advertisement is that some SMTP servers
advertise different auth mechanisms in the old vs. the
RFC-style line. In particular, the MS Exchange server that I
have to use at work does this, and I think that this is even
the default configuration of Exchange 2000. In my case, it
advertises its LOGIN method only in the AUTH= line.

I'll shortly upload a patch that takes this into account.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-30 18:20

Message:
Logged In: YES 
user_id=21627

I still cannot see why support for the old-style AUTH lines
is necessary. If all SMTPds announce their supported
mechanisms with both syntaxes, why is it then necessary to
even look at the old syntax?

I'm all for adding support for the LOGIN method.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-30 17:59

Message:
Logged In: YES 
user_id=12800

Martin, (some? most?) MUAs post messages by talking directly
to their outgoing SMTPd, so that's probably why Gerhard
mentions it.

On the base64 issue, see the comment in bug
#552605, which I just took assignment of.  I'll deal with
both these bug reports soon.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-30 17:41

Message:
Logged In: YES 
user_id=21627

I cannot understand why the behaviour of MS MUAs is relevant
here at all; smtplib only talks to MTAs (or MSAs).

If MTAs advertise the AUTH extension in the new syntax in
addition to the old syntax, why is it not good to just
ignore the old advertisement? Can you point to a specific
software package (ideally even a specific host) which fails
to interact with the current smtplib correctly?

----------------------------------------------------------------------

Comment By: Jason R. Mastaler (jasonrm)
Date: 2002-06-22 05:53

Message:
Logged In: YES 
user_id=85984

A comment on the old-style advertisement.

You say that Microsoft's MUAs only understand the
old-style method.  I haven't found this to be the case.

tmda-ofmipd is an outgoing SMTP proxy that supports
SMTP authentication, and I only use the RFC style
advertisement.  This works perfectly well with MS
clients like Outlook 2000, and Outlook Express 5.
Below is an example of what the advertisement looks
like.

BTW, no disagreement about supporting the old-style
advertisement in smtplib, as I think it's prudent, just 
making a point.

# telnet aguirre 8025
Trying 172.18.3.5...
Connected to aguirre.la.mastaler.com.
Escape character is '^]'.
220 aguirre.la.mastaler.com ESMTP tmda-ofmipd
EHLO aguirre.la.mastaler.com
250-aguirre.la.mastaler.com
250 AUTH LOGIN CRAM-MD5 PLAIN
QUIT
221 Bye
Connection closed by foreign host.


----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-21 12:43

Message:
Logged In: YES 
user_id=163326

This also includes a slightly modified version of patch #552605.

Even better would IMO be to add an additional parameter to
base64.encode* and the corresponding binascii functions that
avoids the insertion of newline characters.
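Later Python versions grew something close to this suggestion: since Python 3.6, `binascii.b2a_base64()` accepts a keyword-only `newline` argument, while `base64.encodebytes()` still wraps its output MIME-style. A short comparison, assuming Python 3.6 or newer:

```python
import base64
import binascii

data = b"x" * 100  # longer than the 57 input bytes that fit on one MIME line

with_breaks = base64.encodebytes(data)                  # wrapped every 76 output chars
single_line = binascii.b2a_base64(data, newline=False)  # one line, no trailing newline
```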

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 17:51:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 09:51:39 -0700
Subject: [Patches] [ python-Patches-581944 ] StopIteration should be a sink state
Message-ID: <E17Us1b-0004so-00@usw-sf-web3.sourceforge.net>

Patches item #581944, was opened at 2002-07-15 17:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581944&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Guido van Rossum (gvanrossum)
Summary: StopIteration should be a sink state

Initial Comment:
Here's a patch that fixes all known (to me :-)
occurrences of iterators in the core that may continue
to return values from next() after having once raised
StopIteration.

Note that the patch also removes various unused next()
method implementations; the type system provides a
next() method as a wrapper when tp_iternext is defined
in a type object.
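The sink-state rule can be illustrated in pure Python with a hypothetical `ListIterator` over a list that may grow after the iterator has been exhausted: once `__next__` has raised `StopIteration`, a flag makes every later call raise it again.

```python
class ListIterator:
    """Hypothetical iterator over a list that may grow after exhaustion."""

    def __init__(self, items):
        self._items = items
        self._index = 0
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        # Sink state: once StopIteration has been raised, keep raising it,
        # even if items were appended to the list in the meantime.
        if self._exhausted or self._index >= len(self._items):
            self._exhausted = True
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item
```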

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-17 12:51

Message:
Logged In: YES 
user_id=6380

Hotshot changes checked in.

The xreadlines stuff will be discussed in patch 580331.

So closing this again.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 21:17

Message:
Logged In: YES 
user_id=6380

There's some reworking of xreadlines in the pipeline in
another patch (580331); I'll make sure to check this part too.

Hotshot is a little harder, but Fred told me what I can do.

Reopening as reminder.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-16 18:25

Message:
Logged In: YES 
user_id=33168

I'm not sure if these are problems or not, but just to be
safe... There are 2 files in Modules/ that still have
iterators that aren't tested: xreadlinesmodule.c and _hotshot.c.

I missed xreadlines the first time, so I'm not sure if it's
been reviewed.
And I thought Guido said hotshot was broken here:
http://mail.python.org/pipermail/python-dev/2002-July/026613.html.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 18:04

Message:
Logged In: YES 
user_id=6380

Yeah, it's all checked in. :-)

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-16 17:43

Message:
Logged In: YES 
user_id=31435

Since you're checking this in, I may as well Accept it and 
assign it back to you <wink>.  +1 from me anyway.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 19:41

Message:
Logged In: YES 
user_id=6380

Everything that has a tp_iternext slot also defines a next()
method, but those next() methods are never used; they are
overridden by wrap_next(). This is not new, this has been
the case since 2.2! So you're right, enum works fine, but
its next() method is pointless, and I'll remove it -- just
like the others you mentioned (except cStringIO, which
simply calls PyCallIter_New()).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:21

Message:
Logged In: YES 
user_id=33168

I thought enum might need patching.
But it worked fine with the patch.  Shucks. :-)

Do xreadlines, cStringIO, or hotshot need patching?
Those three seem to use iterators and could be a problem.

Everything worked fine for me.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 17:44

Message:
Logged In: YES 
user_id=6380

Oops, here's the file...

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 17:57:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 09:57:10 -0700
Subject: [Patches] [ python-Patches-552161 ] Py_AddPendingCall doesn't unlock on fail
Message-ID: <E17Us6w-0004zD-00@usw-sf-web3.sourceforge.net>

Patches item #552161, was opened at 2002-05-03 23:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552161&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Daniel Dunbar (danieldunbar)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Py_AddPendingCall doesn't unlock on fail

Initial Comment:
ceval.c:Py_AddPendingCall doesn't unlock if it
fails because the queue is full.
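The bug class is easy to reproduce: an early return on the "queue full" path skips the unlock. A Python analogue (a hypothetical `PendingCalls` class, not CPython's C code; the 0 / -1 return convention is borrowed from `Py_AddPendingCall()`) avoids it by holding the lock in a `with` block, which releases it on every exit path:

```python
import threading


class PendingCalls:
    """Hypothetical bounded queue of (func, arg) pairs guarded by a lock."""

    def __init__(self, capacity=32):
        self._lock = threading.Lock()
        self._calls = []
        self._capacity = capacity

    def add(self, func, arg):
        """Return 0 on success, -1 if the queue is full."""
        with self._lock:
            if len(self._calls) >= self._capacity:
                return -1  # the with block still releases the lock here
            self._calls.append((func, arg))
            return 0

    def run_all(self):
        # Take the queued calls under the lock, run them outside it.
        with self._lock:
            calls, self._calls = self._calls, []
        return [func(arg) for func, arg in calls]
```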

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-17 12:57

Message:
Logged In: YES 
user_id=6380

Sure.  Checked in as ceval.c 2.320.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 09:57

Message:
Logged In: YES 
user_id=45365

I came across this one when browsing through the patches; it seems to have caught no one's attention yet. Assigning it to Guido as he wrote the addpending stuff (the patch looks benign to me).

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 18:50:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 10:50:24 -0700
Subject: [Patches] [ python-Patches-580331 ] xreadlines caching, file iterator
Message-ID: <E17UswS-0005wH-00@usw-sf-web3.sourceforge.net>

Patches item #580331, was opened at 2002-07-11 17:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
>Priority: 3
Submitted By: Oren Tirosh (orenti)
Assigned to: Guido van Rossum (gvanrossum)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-17 13:50

Message:
Logged In: YES 
user_id=6380

Alas, there's a fatal flaw. The file object and the
xreadlines object now both have pointers to each other,
creating an unbreakable cycle (since neither participates in
GC). Weak refs can't be used to resolve this dilemma. I
personally think that's enough to just stick with the status
quo (I was never more than +0 on the idea of making the file
an iterator anyway). But I'll leave it to Oren to come up
with another hack (please use this same SF patch).

Oren, if you'd like to give up, please say so and I'll close
the item in a jiffy. In fact, I positively encourage you to
give up. But I don't expect you to take this offer. :-)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 21:33

Message:
Logged In: YES 
user_id=6380

I'm reviewing this and will check it in, or something like
it (probably).

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-07-16 01:26

Message:
Logged In: YES 
user_id=562624

Now invalidates cache on a seek.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 10:38

Message:
Logged In: YES 
user_id=6380

I posted some comments to python-dev.

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 18:51:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 10:51:24 -0700
Subject: [Patches] [ python-Patches-575224 ] dict(seqn, value)
Message-ID: <E17UsxQ-0005y8-00@usw-sf-web3.sourceforge.net>

Patches item #575224, was opened at 2002-06-28 19:00
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575224&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Guido van Rossum (gvanrossum)
Summary: dict(seqn, value)

Initial Comment:
Have the dict() constructor accept a pair of 
arguments, a sequence of keys and a constant value. 
Addresses the common task of initializing dictionary 
elements to a constant value. Useful for building fast 
membership tests and for quickly (C-speed) 
eliminating duplicates in a sequence.  Is faster, more 
flexible, and clearer than:
   d = {}
   map(d.__setitem__, seqn, [])

Examples:
  uniq = dict(seqn,True).keys()  # eliminate duplicates
  termwords = dict('End Quit Stop Abort'.split(), True)
  if lexeme in termwords:  sys.exit(0)
  absences = dict('Tom Dick Harry'.split(), 0)
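For comparison, the same use cases with tools available in Python 3: `dict.fromkeys(iterable, value)` builds the constant-valued dictionary, and the built-in `set` type covers membership testing. A short sketch:

```python
words = 'End Quit Stop Abort'.split()

termwords = dict.fromkeys(words, True)               # every key maps to True
absences = dict.fromkeys('Tom Dick Harry'.split(), 0)
uniq = list(dict.fromkeys(['b', 'a', 'b', 'c']))     # duplicates dropped, order kept
termset = set(words)                                 # membership via a set

lexeme = 'Quit'
is_terminal = lexeme in termset
```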

Patch includes source, docs, and unittest.  Also 
includes a minor change to shlex.py showing how the 
builtin can cleanly update existing code to achieve an 
order of magnitude performance boost (classifying 
characters is the most common operation in shlex).

Summary of discussion on py-dev:
At Walter and Barry's suggestion, the value was 
allowed to take any value (I initially used None). At 
Tim's suggestion, I went to an explicit two argument 
form to avoid ambiguity. If we ever get sets, Timbot 
thinks that they ought to be the tool of choice for two 
of the above use cases. Jack Jansen likes the tool 
and wants to go further and warn of inefficient 
searching when 'in' is used with sequences giving O(n) 
search speed when they could have O(1). The F/bot
and Steve Holden poked at me for proposing 
something (speed and clarity aside) that can already 
be handled using existing constructs and Dave 
Abrahams disagreed with them.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-17 13:51

Message:
Logged In: YES 
user_id=6380

Rejecting. The dict() constructor is already overloaded to
the brim.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 14:08

Message:
Logged In: YES 
user_id=38376

The "obvious" other way to use a 2-argument dict() would
be dict(d.keys(), d.values()).  Not sure what's more common, 
though...

(and for the record, I'd prefer a separate "set" 
type/constructor, even if it's basically just a dict without some 
of the methods)

</F>

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-06-30 15:47

Message:
Logged In: YES 
user_id=80475

I'm away from the computer for the next five weeks.  Oren 
Tirosh will champion this patch from here forward.  He 
can lead the discussion and make any requested
modifications.

----------------------------------------------------------------------



From noreply@sourceforge.net  Wed Jul 17 19:40:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 11:40:41 -0700
Subject: [Patches] [ python-Patches-552605 ] Fix broken smtplib.login()
Message-ID: <E17Utj7-0003ps-00@usw-sf-web5.sourceforge.net>

Patches item #552605, was opened at 2002-05-05 20:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552605&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew Rucker Jones (arjones)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Fix broken smtplib.login()

Initial Comment:
RFC 2554 explicitly states that all base64 data in SMTP
AUTH challenges and responses can be of arbitrary
length, but the base64 module adds a newline after 57
bytes of binary data that it has converted to ascii.
This is not accounted for in smtplib.login(), leading
to extraneous newline characters in the middle of long
responses that do weird things to the SMTP session. The
patch is for smtplib.py already patched with the patch
from SourceForge patch ID 552060 and fixes this problem.
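The failure mode is easy to reproduce in Python 3, where `base64.encodebytes()` is the line-wrapping encoder and `base64.b64encode()` produces one unbroken line. The credentials below are made up for illustration:

```python
import base64

# A PLAIN-style response "\0user\0password", padded past 57 bytes.
response = b"\0" + b"u" * 30 + b"\0" + b"p" * 40

wrapped = base64.encodebytes(response)  # newline inserted every 76 output chars
single = base64.b64encode(response)     # one unbroken line

command = b"AUTH PLAIN " + single + b"\r\n"  # safe to send as one SMTP line
```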

----------------------------------------------------------------------

>Comment By: Andrew Rucker Jones (arjones)
Date: 2002-07-17 20:40

Message:
Logged In: YES 
user_id=236100

Not that my opinion matters much, but I'm all for ghaering's
solution.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-30 23:20

Message:
Logged In: YES 
user_id=163326

I'd suggest applying this simple patch for the 2.2
maintenance line and use email.base64MIME.encode in the CVS
version. This way we could move on without having to keep
this patch in the queue.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-30 17:58

Message:
Logged In: YES 
user_id=12800

We should probably be using the email.base64MIME package
instead of base64.  The former is more RFC compliant for
email and related functions.  

email.base64MIME.encode(s, eol='') will eliminate the
newlines.  However base64MIME is only present in Python 2.3
cvs.  It may be backported to Python 2.2.2, but I'm not sure
about that yet.
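In Python 3 the module is spelled `email.base64mime`, and `body_encode()` still accepts an `eol` argument; an empty string drops the per-line terminators, so the result is a single line. A sketch, assuming Python 3 (for SMTP AUTH alone, `base64.b64encode()` is the simpler choice):

```python
import base64
from email import base64mime

data = b"y" * 100

joined = base64mime.body_encode(data, eol="")            # str, no line breaks
reference = base64.b64encode(data).decode("ascii")       # same text
```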

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-21 12:42

Message:
Logged In: YES 
user_id=163326

Good spot!

I've incorporated this patch into my AUTH=LOGIN support with
patch #572031.

----------------------------------------------------------------------



From noreply@sourceforge.net  Thu Jul 18 18:11:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 10:11:46 -0700
Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation
Message-ID: <E17VEoc-000071-00@usw-sf-web2.sourceforge.net>

Patches item #555085, was opened at 2002-05-12 08:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
>Priority: 4
Submitted By: Michael Gilfix (mgilfix)
Assigned to: Guido van Rossum (gvanrossum)
Summary: timeout socket implementation

Initial Comment:
This implements bug #457114: timed socket
operations. If a timeout is set and the timeout period
elapses before the socket operation has finished, a
socket.error exception is raised.

This patch integrates the functionality at two levels:
the timeout capability is integrated at the C level in
socketmodule.c. Socket.py was also modified to update
file object creation on Windows to handle the
case of the underlying socket raising an exception.
The TeX documentation was also updated, and a new
regression test was provided as test_timeout.py.
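The resulting behaviour can be exercised with a connected socket pair. This sketch assumes Python 3, where the timeout is reported as `socket.timeout`, a subclass of `OSError` (which `socket.error` is now an alias of):

```python
import socket

left, right = socket.socketpair()
try:
    left.settimeout(0.2)        # seconds; None means block forever
    try:
        left.recv(1024)         # nothing was sent, so this times out
        timed_out = False
    except socket.timeout:
        timed_out = True
finally:
    left.close()
    right.close()
```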

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 13:11

Message:
Logged In: YES 
user_id=6380

The default timeout is now implemented in CVS.

There's a bug report from Andrew Macintyre (unfortunately on
python-dev) about test_socket.py failures on FreeBSD. I'll
try to keep an eye on that, so this patch *still* stays
open. Also, Bernie has promised some changes that I haven't
received yet and the details of which I don't recall (sorry
:-( ).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 21:47

Message:
Logged In: YES 
user_id=6380

Keeping this open as a reminder of things still to finish.

Most is in the python-dev discussion; Michael Gilfix and
Bernard Yue have offered to produce more patches.

One feature we definitely want is a way to specify a timeout
to be applied to all new sockets.
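In today's `socket` module that feature is exposed as `socket.setdefaulttimeout()` and `socket.getdefaulttimeout()`. A small sketch of its effect on a newly created socket, restoring the previous global setting afterwards:

```python
import socket

previous = socket.getdefaulttimeout()   # None unless something set it
socket.setdefaulttimeout(5.0)
try:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    new_timeout = sock.gettimeout()     # inherits the 5.0 default
    sock.close()
finally:
    socket.setdefaulttimeout(previous)  # restore the global setting
```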

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-06 17:11

Message:
Logged In: YES 
user_id=6380

Thanks for the new version! I've checked this in.  I made
considerable changes; the following is feedback but you
don't need to respond because I've addressed all these in
the checked-in code!

- Thanks for the cleanup of some non-standard formatting.
However, it's better not to do this so the diffs don't show
changes that are unrelated to the timeout patch.

- You are still importing the select module instead of
calling select() directly. I really think you should do the
latter -- the select module has an enormous overhead (it
allocates several large lists on the heap).

- Instead of explicitly testing the argument to settimeout
for being a float, int or long, you should simply call
PyFloat_AsDouble and handle the error; if someone passes
another object that implements __float__ that should be
acceptable.

- gettimeout() returns sock_timeout without checking if it
is NULL. It can be NULL when a socket object is never
initialized. E.g. I can do this:

>>> from socket import *
>>> s = socket.__new__(socket)
>>> s.gettimeout()

which gives me a segfault. There are probably other places
where this is assumed.

- I addressed the latter two issues by making sock_timeout a
double, whose value is < 0.0 when no timeout is set.
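Both review points can be modelled in pure Python with a hypothetical `TimeoutSetting` class (not the socketmodule.c code). The value is converted through its type's `__float__`, roughly as `PyFloat_AsDouble()` does, and a negative double stands for "no timeout", so there is never an uninitialized value to return:

```python
class TimeoutSetting:
    """Hypothetical model of a socket's timeout field (not socketmodule.c)."""

    def __init__(self):
        self._timeout = -1.0  # a negative value means "no timeout set"

    def settimeout(self, value):
        if value is None:
            self._timeout = -1.0
            return
        # Accept anything whose type implements __float__ (int, float,
        # Decimal, user classes), roughly like PyFloat_AsDouble().
        to_float = getattr(type(value), "__float__", None)
        if to_float is None:
            raise TypeError("timeout must be a number or None")
        seconds = to_float(value)
        if seconds < 0.0:
            raise ValueError("timeout value out of range")
        self._timeout = seconds

    def gettimeout(self):
        # Always returns None or a float, never an uninitialized value.
        return None if self._timeout < 0.0 else self._timeout
```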

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-06-05 18:23

Message:
Logged In: YES 
user_id=116038

I've addressed all the issues brought up by Guido. The 2nd
version of the patch is attached here. In this version, I've
modified test_socket.py to include tests for the _fileobject
class in socket.py that was modified by this patch.
_fileobject needed to be modified so that data would not be
lost when the underlying socket threw an exception (data was
no longer accumulated in local variables). The tests for the
_fileobject class succeed on older versions of python
(tested 2.1.3) and pass on the newer version of python.
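The data-loss issue is worth spelling out: if `readline()` accumulates partial data in a local variable and the socket then raises, that data is dropped along with the stack frame. A minimal sketch of the fix, using a hypothetical `LineReader` that keeps unconsumed bytes on the object and reads through any `recv`-like callable:

```python
class LineReader:
    """Hypothetical buffered reader whose partial data survives exceptions."""

    def __init__(self, recv, bufsize=1024):
        self._recv = recv        # callable like socket.recv(bufsize) -> bytes
        self._bufsize = bufsize
        self._rbuf = b""         # unconsumed data lives on the object

    def readline(self):
        while b"\n" not in self._rbuf:
            chunk = self._recv(self._bufsize)  # may raise, e.g. on timeout
            if not chunk:
                break                          # EOF: return what we have
            self._rbuf += chunk
        line, sep, rest = self._rbuf.partition(b"\n")
        self._rbuf = rest
        return line + sep
```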

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-23 16:18

Message:
Logged In: YES 
user_id=6380

For a detailed review, see

http://mail.python.org/pipermail/python-dev/2002-May/024340.html

----------------------------------------------------------------------



From noreply@sourceforge.net  Thu Jul 18 16:36:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 08:36:57 -0700
Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes
Message-ID: <E17VDKr-0003qC-00@usw-sf-web5.sourceforge.net>

Patches item #525109, was opened at 2002-03-03 06:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470

Category: IDLE
Group: Python 2.3
Status: Open
Resolution: None
>Priority: 3
Submitted By: Martin Liebmann (mliebmann)
>Assigned to: Nobody/Anonymous (nobody)
Summary: Extension to Calltips / Show attributes

Initial Comment:
The attached files (unified diff files) implement a 
(quick and dirty but useful) extension to IDLE 0.8
(Python 2.2)

- Tested on WINDOWS 95/98/NT/2000 -

Similar to "CallTips" this extension shows (context 
sensitive) all available member functions and 
attributes of the current object after hitting 
the 'dot'-key.
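The core of such an attribute tip is ordinary introspection. A minimal sketch with a hypothetical helper (not IDLE's CallTips code) that lists an object's public attributes, optionally filtered by the prefix typed after the dot:

```python
def attribute_completions(obj, prefix=""):
    """Return sorted public attribute names of obj that start with prefix."""
    names = dir(obj)  # methods and data attributes alike
    return sorted(
        name for name in names
        if name.startswith(prefix) and not name.startswith("_")
    )
```

For a string object, typing `st` after the dot would narrow the list to `startswith` and `strip`.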

The toplevel help widget now supports scrolling. (Key-
Up and Key-Down events)

...that is why I changed, among other things, the first argument
of 'showtip' from 'text string' to a 'list of text 
strings' ...

The 'space'-key is used to insert the topmost item of 
the help widget into an IDLE text window.

...the event handling seems to be a critical part of
the current IDLE implementation. That is why I added
the new functionality as a patch to CallTips.py and
CallTipWindow.py. Maybe you have a better
implementation ...

Greetings
Martin Liebmann

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:36

Message:
Logged In: YES 
user_id=6380

I'm really sorry, I just don't have the time to hack on
IDLE. Perhaps you can resubmit this patch to the IDLEFORK
project (also at SF)?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-16 11:40

Message:
Logged In: YES 
user_id=6656

feature --> not in 2.2.x

----------------------------------------------------------------------

Comment By: Martin Liebmann (mliebmann)
Date: 2002-03-07 16:41

Message:
Logged In: YES 
user_id=475133

Patched and more robust version of the extended files
CallTips.py and CallTipWindow.py (now more compatible with
earlier versions of Python).


----------------------------------------------------------------------

Comment By: Martin Liebmann (mliebmann)
Date: 2002-03-03 17:02

Message:
Logged In: YES 
user_id=475133

'<Key-.>' must be replaced by '.' within CallTips.py!
(Linux does not support an event named <Key-.>.)

Running IDLE on Linux, I got a warning that 'import *'
is not allowed within function '_dir_main' of CallTips.py,
yet CallTips works fine on Linux.

----------------------------------------------------------------------



From noreply@sourceforge.net  Thu Jul 18 16:29:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 08:29:38 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VDDm-0003k1-00@usw-sf-web5.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still some confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.
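Judging only from the output above, the assertion compares pattern text rather than pattern behaviour: it searches for a 'd' pattern with every alternative wrapped in its own group, while the module now emits the alternatives ungrouped. A minimal Python 3 sketch, assuming the line break in the archived output stands for a single space (a ' \d' alternative), shows the substring search failing even though both regexes accept the same day strings:

```python
import re

# Pattern text the test searches for: each alternative in a capturing group.
expected = r"(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))"
# Pattern text the module produced, with the inner groups removed.
produced = r"(?P<d>3[0-1]|[0-2]\d|\d| \d)"

# A substring search on the pattern text fails...
print(produced.find(expected))  # -1

# ...although both regexes accept the same day-of-month strings.
for day in ["31", "07", "7", " 7"]:
    old = re.fullmatch(expected, day).group("d")
    new = re.fullmatch(produced, day).group("d")
    print(repr(day), old == new)
```

If that reading is right, checking which strings the compiled pattern accepts would make the test independent of how the pattern happens to be spelled.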

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 17:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format directives. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
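The whitespace rule described above can be sketched as a format-to-regex translator in which every run of whitespace in the format becomes \s* in the pattern. The directive table and the function name are hypothetical and cover only %d, %m and %Y; this is not the code of the uploaded module:

```python
import re

# Toy directive table (hypothetical: a real strptime supports many more
# directives and builds these patterns from locale data).
DIRECTIVES = {
    "d": r"(?P<d>3[0-1]|[0-2]\d|\d)",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "Y": r"(?P<Y>\d{4})",
}

def format_to_regex(fmt):
    r"""Translate fmt into a compiled regex; whitespace runs become \s*."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])
            i += 2
        elif ch.isspace():
            # Skip the whole whitespace run and emit a single \s*.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            parts.append(r"\s*")
        else:
            parts.append(re.escape(ch))  # literal characters match themselves
            i += 1
    return re.compile("".join(parts))

regex = format_to_regex("%d %m %Y")
print(regex.fullmatch("7   12 2002").groupdict())
print(regex.fullmatch("7122002").groupdict())
```

The second call shows the consequence mentioned above: once whitespace may shrink to nothing, a separator between directives is no longer needed, so "7122002" parses as well.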

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.
I figured I would let people who think like me write dates the
"proper" way with leading zeros, while those who don't read the
docs that way are still okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other then
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 18:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
numbers that lack a leading zero.

I also made the regex case-insensitive.
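The effect of appending \d to a numeric pattern, and of compiling with re.IGNORECASE, can be seen in a small Python 3 sketch. The patterns mirror the ones quoted in this thread but are illustrative only, not the module's full table:

```python
import re

# Day-of-month patterns in the spirit of the discussion (illustrative only).
STRICT_DAY = r"(?P<d>3[0-1]|[0-2]\d)"        # requires two digits: "07"
LENIENT_DAY = r"(?P<d>3[0-1]|[0-2]\d|\d)"    # trailing \d also accepts "7"

# Abbreviated month names; IGNORECASE lets "jul" and "JUL" match "Jul".
MONTH_ABBR = r"(?P<b>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

def parse(day_pattern, text):
    """Match 'month day' text and return the named groups, or None."""
    regex = re.compile(MONTH_ABBR + r"\s+" + day_pattern, re.IGNORECASE)
    found = regex.fullmatch(text)
    return found.groupdict() if found else None

print(parse(STRICT_DAY, "Jul 7"))    # None
print(parse(LENIENT_DAY, "jul 7"))   # {'b': 'jul', 'd': '7'}
```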

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires whitespace or a non-alphanumeric
character between directives, while my implementation does not.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one 'time_diff', but in my head I meant
to overwrite 'diff_time'.  I have uploaded the new one.

(b) See (a)

(c) Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so, I will stop uploading
tests here and start a new patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs did not mean to require a leading 0, I think they
should be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
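The suggested regression test can be sketched as a small unittest module. The class name and the fixed timestamp are illustrative assumptions; the idea is to format a known time with time.strftime(), parse the text back with time.strptime(), and compare only the fields each format actually carries:

```python
import time
import unittest

class StrptimeRoundTripTest(unittest.TestCase):
    """Sketch of a regression test for time.strptime (names are illustrative)."""

    # 2002-07-12 13:45:30, a Friday (tm_wday 4), day 193 of the year.
    WHEN = (2002, 7, 12, 13, 45, 30, 4, 193, -1)

    def round_trip(self, fmt, fields):
        # Format with strftime, parse back with strptime, and compare only
        # the struct_time fields that the format string actually carries.
        text = time.strftime(fmt, self.WHEN)
        parsed = time.strptime(text, fmt)
        for index in fields:
            self.assertEqual(parsed[index], self.WHEN[index], (fmt, text, index))

    def test_date(self):
        self.round_trip("%Y-%m-%d", (0, 1, 2))

    def test_time_of_day(self):
        self.round_trip("%H:%M:%S", (3, 4, 5))

    def test_day_of_year(self):
        self.round_trip("%Y %j", (0, 7))

    def test_missing_leading_zero(self):
        # The example above: a single-digit month should still parse.
        parsed = time.strptime("7/12/02", "%m/%d/%y")
        self.assertEqual(tuple(parsed[:3]), (2002, 7, 12))
```

Saved as a module, this runs under "python -m unittest"; the same tests could then be pointed at a pure-Python strptime to check that both behave alike.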

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
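The difference between capturing and non-capturing groups is easy to see by counting groups on two versions of the same alternation. This Python 3 sketch is illustrative; whether dropping the inner groups is measurably faster depends on the regex engine and is not measured here:

```python
import re

# Capturing version: every parenthesised alternative becomes a numbered group.
capturing = re.compile(r"(?P<b>(Jan)|(Feb)|(Mar))")
# Non-capturing version: (?:...) only groups for alternation, nothing is saved.
non_capturing = re.compile(r"(?P<b>(?:Jan)|(?:Feb)|(?:Mar))")

print(capturing.groups)       # 4
print(non_capturing.groups)   # 1
print(non_capturing.match("Feb").groupdict())  # {'b': 'Feb'}
```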

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactical change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
how strptime() is used, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
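The default-argument pitfall described above can be reproduced with a stand-in class. LocaleSnapshot and both parse functions are hypothetical names; the point is only that a default value is evaluated once, when the def statement runs, not on each call:

```python
class LocaleSnapshot(object):
    """Hypothetical stand-in for LocaleTime: records the order it was built in."""
    built = 0

    def __init__(self):
        LocaleSnapshot.built += 1
        self.serial = LocaleSnapshot.built

def parse_with_default(data, locale_time=LocaleSnapshot()):
    # The default object above was created once, when "def" ran.
    return locale_time.serial

def parse_fresh(data, locale_time=None):
    # Common alternative: use None as the default and build one per call.
    if locale_time is None:
        locale_time = LocaleSnapshot()
    return locale_time.serial

print(parse_with_default("a"), parse_with_default("b"))  # 1 1
print(parse_fresh("a"), parse_fresh("b"))                # 2 3
```

The None-default pattern is one common fix; the change described here takes the other route and drops the parameter entirely.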

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale's.
And if they did, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any value that
evaluates as false, but I realized that an empty string
could trigger that, and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
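Caching the compiled regex per format string, the idea behind keeping a precompiled regex around, can be sketched with functools.lru_cache from Python 3's standard library. The directive table is a hypothetical toy; note that re's own internal cache of compiled patterns would not help here, because the cost being avoided is building the pattern string in the first place:

```python
import functools
import re

# Toy directive table (hypothetical); literal characters in the format are
# assumed not to be regex metacharacters, to keep the sketch short.
TOY_DIRECTIVES = {
    "%d": r"(?P<d>3[0-1]|[0-2]\d|\d)",
    "%m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "%Y": r"(?P<Y>\d{4})",
}

@functools.lru_cache(maxsize=64)
def format_regex(fmt):
    """Build and compile the regex for fmt; repeated formats hit the cache."""
    pattern = fmt
    for directive, sub in TOY_DIRECTIVES.items():
        pattern = pattern.replace(directive, sub)
    return re.compile(pattern)

log_lines = ["12/07/2002", "13/07/2002", "14/07/2002"]
parsed = [format_regex("%d/%m/%Y").match(line).groupdict() for line in log_lines]
print(format_regex.cache_info().misses)  # 1: the pattern was built only once
```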

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
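The calendar module exposes these names as sequence-like objects. A short Python 3 sketch (the variable names are illustrative) of pulling them out:

```python
import calendar

# Names follow the current LC_TIME locale ("C" unless setlocale() was called).
weekdays = list(calendar.day_name)        # Monday first, matching tm_wday 0..6
weekday_abbrs = list(calendar.day_abbr)
months = list(calendar.month_name)        # months[0] is '' so months[1] is January
month_abbrs = list(calendar.month_abbr)

print(weekdays[0], months[1], len(months))
```

month_name and month_abbr put an empty string at index 0, so month numbers (1 to 12) can be used as indexes directly.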

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the day-of-month regex.
I discovered that ANSI C's ctime() displays the day of the month
with a leading space if it is a single digit.  So to deal with
that, since at least Skip's C library uses that
format for %c, I went ahead and added it.

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).
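A sketch of the read-only-attribute trick, using hypothetical names (LocaleInfo, _refuse_assignment). In current Python, a property created without a setter on a class that derives from object already rejects assignment with AttributeError; passing an explicit setter, as below, is what makes the error a TypeError as described:

```python
class LocaleInfo(object):
    """Hypothetical sketch of LocaleTime-style read-only attributes."""

    def __init__(self, weekdays):
        self._weekdays = list(weekdays)

    def _get_weekdays(self):
        return self._weekdays

    def _refuse_assignment(self, value):
        # Used as the "set" function for property(): assignment is an error.
        raise TypeError("LocaleInfo attributes cannot be assigned to")

    weekdays = property(_get_weekdays, _refuse_assignment)

info = LocaleInfo(["Monday", "Tuesday"])
try:
    info.weekdays = []
except TypeError as exc:
    print("refused:", exc)
```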

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
on a \d on all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.
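A minimal Python 3 sketch of the first change, assuming the pattern has already been compiled; the helper name is hypothetical, and the message text is the one that appears in Guido's traceback elsewhere in this thread:

```python
import re

def groups_or_value_error(regex, data_string):
    """Return the named groups, or raise ValueError when nothing matches."""
    found = regex.match(data_string)
    if found is None:
        # Without this check, calling groupdict() on None would raise
        # AttributeError instead of the ValueError time.strptime() uses.
        raise ValueError("time data did not match format")
    return found.groupdict()

day_regex = re.compile(r"(?P<d>3[0-1]|[0-2]\d|\d)$")
print(groups_or_value_error(day_regex, "12"))   # {'d': '12'}
try:
    groups_or_value_error(day_regex, "xx")
except ValueError as exc:
    print("ValueError:", exc)
```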

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
can specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
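A front-or-back padding helper of the kind described might look like the following Python sketch; the name add_blank and its signature are assumptions, not the actual method in the module. Padding at the front is handy when a sequence should be indexed from 1, as month numbers are:

```python
def add_blank(values, front=False):
    """Hypothetical helper: return values as a tuple with '' added at one end."""
    if front:
        return ('',) + tuple(values)
    return tuple(values) + ('',)

print(add_blank(["AM", "PM"]))               # ('AM', 'PM', '')
print(add_blank(["January"], front=True))    # ('', 'January')
```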

I explained why the various magic dates were used.

I do not have to worry about leap years at all.  Since the fxn
does not check the data string for validity, it just takes
the data and uses it.  I have no reason to calculate leap years.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You only need __new__ when subclassing an immutable type.
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about to get
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError, which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
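The cmp()-based sorter orders the names longest first. Python 3 has no cmp(), so a sketch of the same idea uses sorted(..., key=len, reverse=True). The ordering matters because regex alternation is tried left to right and the first alternative that matches wins; the helper name and the Jun/June example are illustrative:

```python
import re

def names_to_alternation(names, group):
    """Build a named regex group matching any of names, longest first."""
    # sorted() is stable, so names of equal length keep their input order.
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join("(?:%s)" % re.escape(name) for name in ordered)
    return "(?P<%s>%s)" % (group, body)

# Shortest-first ordering: "Jun" matches and stops, leaving the "e" behind.
naive = re.compile(r"(?P<b>(?:Jun)|(?:June))")
print(naive.match("June").group("b"))          # Jun

# Longest-first ordering tries "June" before "Jun".
longest_first = re.compile(names_to_alternation(["Jun", "June"], "b"))
print(longest_first.match("June").group("b"))  # June
```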

Note:  you can do x = y = z = -1, instead of x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.
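A small Python sketch of both points above; the helper name is hypothetical:

```python
def all_unset(*values):
    """True when every value equals the -1 'unknown' sentinel."""
    # Compare with ==: whether two equal ints are also the same object
    # (so that "is" happens to work) is a CPython small-int caching detail.
    return all(value == -1 for value in values)

year = month = day = -1      # chained assignment binds all three names to -1
print(all_unset(year, month, day))   # True
month = 7
print(all_unset(year, month, day))   # False
```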

This docstring seems backwards:

    def gregToJulian(year, month, day):
        """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded version 2.0.1, which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless of which version of parsetime I use, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity if you raise
   an exception?  (You don't need to return anything: raise an
   exception, else it's ok.)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
   you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
   (specifically, spaces after commas, =, binary ops,
   comparisons, etc)
 * Long lines should be broken up

The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
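Gleaning names from time.strftime() needs only dates whose weekday and hour are known, which is one reason for the "magic dates" Neal asks about elsewhere in this thread. A Python 3 sketch under that assumption (the helper names are hypothetical); strftime() takes the weekday from the tm_wday field of the tuple rather than recomputing it from the date:

```python
import time

def weekday_names():
    """Full weekday names for the current locale, Monday first."""
    names = []
    for offset in range(7):
        # 2002-07-01 was a Monday (tm_wday 0, day 182 of the year).
        when = (2002, 7, 1 + offset, 12, 0, 0, offset, 182 + offset, -1)
        names.append(time.strftime("%A", when))
    return names

def am_pm_strings():
    """The locale's morning and evening markers, as %p prints them."""
    # 1999-03-17 was a Wednesday (tm_wday 2, day 76); only the hour matters.
    morning = (1999, 3, 17, 1, 44, 55, 2, 76, -1)
    evening = (1999, 3, 17, 22, 44, 55, 2, 76, -1)
    return [time.strftime("%p", morning), time.strftime("%p", evening)]

print(weekday_names())
print(am_pm_strings())
```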

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 04:29:45 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 20:29:45 -0700
Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions
Message-ID: <E17V1z7-0005PC-00@usw-sf-web5.sourceforge.net>

Patches item #583188, was opened at 2002-07-18 13:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: expose Parser.strict flag to functions

Initial Comment:
The following trivial patch exposes the 'strict' flag in
the email.message_from_file and email.message_from_string
functions.
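A sketch of the pattern the patch describes, under stated assumptions: `Parser` and `message_from_string` below are hypothetical stand-ins written for this example and do not reproduce the real email package API. The point is only that the convenience function forwards its keyword argument instead of always constructing the parser with its default.

```python
class Parser(object):
    """Hypothetical stand-in for a parser class with a strict flag."""

    def __init__(self, strict=False):
        self.strict = strict

    def parsestr(self, text):
        # A real parser would build a message object; this sketch only
        # records which mode it ran in.
        return {"strict": self.strict, "text": text}


def message_from_string(text, strict=False):
    # Forward the flag instead of always calling Parser() with its
    # default, so callers of the convenience function can opt in.
    return Parser(strict=strict).parsestr(text)
```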

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 03:34:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 19:34:36 -0700
Subject: [Patches] [ python-Patches-583180 ] smtplib.py patch for macmail esmtp auth
Message-ID: <E17V17k-0000bl-00@usw-sf-web4.sourceforge.net>

Patches item #583180, was opened at 2002-07-18 02:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: bob kuehne (mysticbob)
Assigned to: Nobody/Anonymous (nobody)
Summary: smtplib.py patch for macmail esmtp auth

Initial Comment:
i ran into a problem that i've seen several other people
describe where they can't authenticate to their particular
mail server. i dug into this (my mail server is smtp.mac.com)
and discovered that smtplib.py didn't support the specific
type of auth that this server required.

so, this patch allows authentication to these specific
server types. i also reworked one token to make it
a bit more modular. the patch is attached, generated
in the form: diff smtplib.py_orig smtplib.py_new

i'm new to python, and new to the whole patch process
on sourceforge, so  please let me know what i can do
to test, or how else i can work to get this in the next
python version. thank you!
bob

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 04:37:30 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 20:37:30 -0700
Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust
Message-ID: <E17V26c-0005Ww-00@usw-sf-web5.sourceforge.net>

Patches item #583190, was opened at 2002-07-18 13:36
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470

>Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Patch to make email parser more robust

Initial Comment:
The following patch against current CVS of the email
package (as of 2002/07/18) fixes the following problems:

  In non-strict mode, messages with a missing end-terminator
  no longer require a blank line at the end; a single newline
  is now sufficient.

The remaining fixes apply in both strict and non-strict mode:

  Handle trailing whitespace at the end of a boundary. This
  required switching from string.split() to re.split().

  Handle whitespace at the end of a parameter list for
  Content-type.

  Handle whitespace at the end of a plain content-type
  header.
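The boundary fix can be sketched with `re.split()`: a compiled pattern lets a boundary line end in optional spaces or tabs, which a fixed-string `str.split()` cannot express. `split_on_boundary` is a hypothetical helper, not the email package's code, and for brevity it ignores the closing `--boundary--` delimiter.

```python
import re


def split_on_boundary(payload, boundary):
    """Split a multipart body on boundary lines, tolerating trailing
    spaces or tabs after the boundary marker."""
    separator = re.compile(
        r"^--" + re.escape(boundary) + r"[ \t]*$\n?", re.MULTILINE
    )
    return separator.split(payload)


payload = "preamble\n--XYZ  \npart one\n--XYZ\npart two\n"
parts = split_on_boundary(payload, "XYZ")
# parts == ["preamble\n", "part one\n", "part two\n"]
# whereas payload.split("\n--XYZ\n") misses the boundary with trailing spaces
```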


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 07:50:34 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 23:50:34 -0700
Subject: [Patches] [ python-Patches-583235 ] make file object an iterator
Message-ID: <E17V57S-0003Jt-00@usw-sf-web2.sourceforge.net>

Patches item #583235, was opened at 2002-07-18 08:50
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583235&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Alex Martelli (aleax)
Assigned to: Nobody/Anonymous (nobody)
Summary: make file object an iterator

Initial Comment:
As per python-dev discussion on July 17, 2002 and earlier, I
reworked Oren's patch to:

- remove a reference loop between the file object and the
  xreadlines object (making the xreadlines->file object
  reference non-addref'd when, and only when, the xreadlines
  object is held internally by the file object);
- make f.readline interoperate with f.next (the former
  delegating to the latter iff f is holding an xreadlines
  object);
- make f.seek remove the xreadlines object that f is holding
  (if any); and
- remove the optimization of caching xreadlines function
  pointers as static variables in functions of fileobject.c.

Also added tests for this functionality to test_file.py.
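The key invariant in the rework is that line-by-line iteration and `readline()` draw from the same read-ahead buffer, and that seeking throws that buffer away. The following is a pure-Python sketch of that invariant, not the C implementation in fileobject.c: `LineReader` is a hypothetical class, it uses Python 3's `__next__` where the 2002 code used `next`, and here `__next__` is built on `readline()` rather than the other way round.

```python
import io


class LineReader(object):
    """Hypothetical sketch: an object that is its own iterator, whose
    readline() and __next__() consume from one shared read-ahead buffer."""

    def __init__(self, raw, chunk_size=8):
        self._raw = raw
        self._chunk_size = chunk_size
        self._buffer = ""

    def __iter__(self):
        return self  # the object is its own iterator

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def readline(self):
        # Read ahead in chunks until a full line (or EOF) is buffered.
        while "\n" not in self._buffer:
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                break
            self._buffer += chunk
        line, sep, rest = self._buffer.partition("\n")
        self._buffer = rest
        return line + sep

    def seek(self, pos):
        # Discard read-ahead data, or the next line would be stale.
        self._buffer = ""
        self._raw.seek(pos)


reader = LineReader(io.StringIO("a\nb\nc\nd\n"), chunk_size=3)
first = next(reader)        # "a\n"
second = reader.readline()  # "b\n", not lost to the read-ahead
rest = list(reader)         # ["c\n", "d\n"]
```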


Alex


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583235&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 04:34:08 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 20:34:08 -0700
Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions
Message-ID: <E17V23M-0005US-00@usw-sf-web5.sourceforge.net>

Patches item #583188, was opened at 2002-07-18 13:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470

>Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: expose Parser.strict flag to functions

Initial Comment:
The following trivial patch exposes the 'strict' flag in
the email.message_from_file and email.message_from_string
functions.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 04:36:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 17 Jul 2002 20:36:43 -0700
Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust
Message-ID: <E17V25r-0005WB-00@usw-sf-web5.sourceforge.net>

Patches item #583190, was opened at 2002-07-18 13:36
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch to make email parser more robust

Initial Comment:
The following patch against current CVS of the email
package (as of 2002/07/18) fixes the following problems:

  In non-strict mode, messages with a missing end-terminator
  no longer require a blank line at the end; a single newline
  is now sufficient.

The remaining fixes apply in both strict and non-strict mode:

  Handle trailing whitespace at the end of a boundary. This
  required switching from string.split() to re.split().

  Handle whitespace at the end of a parameter list for
  Content-type.

  Handle whitespace at the end of a plain content-type
  header.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 20:41:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 12:41:37 -0700
Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes
Message-ID: <E17VH9d-0001Dt-00@usw-sf-web3.sourceforge.net>

Patches item #552438, was opened at 2002-05-05 00:26
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470

Category: Core (C code)
Group: None
Status: Open
>Resolution: Out of Date
Priority: 5
Submitted By: Scott Gilbert (xscott)
>Assigned to: Tim Peters (tim_one)
Summary: PyBufferObject fixes

Initial Comment:
This patch fixes these problems:

  1) Dangling pointer problem
  2) buffer allocated by PyBuffer_New not aligned

The PyBufferObject acts differently depending on 
whether it allocated the memory or if it's borrowing 
the memory from a PyBufferProcs supporting object.

In the case of allocating its own memory, I made a 
slight addition that adds some padding so that the ptr 
is on a sizeof(double) boundary.
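The padding amounts to rounding the header size up to the next multiple of `sizeof(double)`. A minimal sketch of that arithmetic in Python, assuming the alignment is a power of two (true on common platforms); `round_up` and `padded_header_size` are hypothetical names, and the result only helps if the block returned by the allocator is itself suitably aligned.

```python
import ctypes


def round_up(n, alignment):
    """Round n up to the next multiple of alignment (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)


DOUBLE_ALIGN = ctypes.sizeof(ctypes.c_double)  # usually 8


def padded_header_size(header_size):
    # The payload starts right after the padded header, so padding the
    # header to a multiple of sizeof(double) aligns the payload, provided
    # the block itself starts on such a boundary.
    return round_up(header_size, DOUBLE_ALIGN)


print(round_up(20, 8))  # 24
```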

In the case of borrowing another object's PyBufferProcs 
memory, PyBufferObject no longer caches the pointer.  
This might slow things down (probably not by much), 
but it keeps PyBufferObject from working with a stale 
pointer.


Normally I wouldn't do this, but since this patch 
touches pretty much every function anyway, I fixed 
many deviations from the Python coding style.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 15:41

Message:
Logged In: YES 
user_id=6380

Note, the patch is out of date since somebody fixed some
nits with slicing, so I'm marking this as Out Of Date.

You might as well upload the new version of the file. :-)

Why do you think you need to fix the allocation? Since
allocation is done via malloc(), and malloc() guarantees
alignment suitable for a double ("for all types"), shouldn't
that be enough??? (If it's obmalloc that you're worried about,
it's easy to force this to use the real malloc() and free().)

I hope Tim will make some time to review this (the "not this
week" comment is several months old now). Superficially it
looks like a big improvement.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-07 14:51

Message:
Logged In: YES 
user_id=31435

Na, assigning a bug is fine by me -- it helps to have 
*someone* feel guilty <wink>.  Assigning it doesn't mean it 
goes to the top of the assignee's heap, though.  I can't 
make time to look at it this week, so it's just as well 
that it got unassigned.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-07 08:55

Message:
Logged In: YES 
user_id=38318

Apparently assigning a patch is poor form.  My bad.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-05 00:27

Message:
Logged In: YES 
user_id=38318

Can I assign this to you or does it take admin privs?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 21:27:45 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 13:27:45 -0700
Subject: [Patches] [ python-Patches-580995 ] new version of Set class
Message-ID: <E17VHsH-0004Ke-00@usw-sf-web2.sourceforge.net>

Patches item #580995, was opened at 2002-07-13 17:53
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Alex Martelli (aleax)
Assigned to: Nobody/Anonymous (nobody)
Summary: new version of Set class

Initial Comment:
As per python-dev discussion on Sat 13 July 2002, subject
"Dict constructor".  A version of Greg Wilson's sandbox
Set class that avoids the trickiness of implicitly freezing
a set when __hash__ is called on it.  Rather, it uses
several classes: Set itself has no __hash__ and represents a
general, mutable set; BaseSet, its superclass, has all
functionality common to mutable and immutable sets;
ImmutableSet also subclasses BaseSet and adds
__hash__; a wrapper _TemporarilyImmutableSet wraps
a Set, exposing only __hash__ (identical to the one an
ImmutableSet built from the Set would have) and __eq__
and __ne__ (delegated to the Set instance).

Set.add(self, x) attempts to call x = x._asImmutable() (on
AttributeError, it leaves x alone); Set._asImmutable(self)
returns ImmutableSet(self).
The membership test BaseSet.__contains__(self, x) attempts
to call x = x._asTemporarilyImmutable() (on AttributeError,
it leaves x alone); Set._asTemporarilyImmutable(self)
returns _TemporarilyImmutableSet(self).

I've left Greg's code mostly alone otherwise except for
fixing bugs/obsolescent usage (e.g. dictionary rather than
dict) and making what were ValueError into TypeError 
(ValueError was doubtful earlier, is untenable now that
mutable and immutable sets are different types).  The
change in exceptions forced me to change the unit tests
in test_set.py, too, but I made no other changes nor
additions.
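A minimal sketch of the class layout described above, reconstructed only from this description (it is not Alex's or Greg Wilson's code). Elements live in a contained dict, as in the later revision. `_compute_hash` and `_frozen` are hypothetical helpers, and the XOR-of-element-hashes scheme is an assumption chosen because it does not depend on iteration order. Python 3 spells "unhashable" as `__hash__ = None`.

```python
def _compute_hash(elements):
    # Order-independent hash shared by ImmutableSet and the wrapper.
    result = 0
    for element in elements:
        result ^= hash(element)
    return result


def _frozen(element):
    # Replace a mutable Set element with an immutable copy, if possible.
    try:
        return element._asImmutable()
    except AttributeError:
        return element


class BaseSet(object):
    """Operations shared by mutable and immutable sets."""

    def __init__(self, iterable=()):
        self._data = {}  # has-a dict rather than is-a dict
        for element in iterable:
            self._data[_frozen(element)] = True

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def __contains__(self, element):
        try:
            element = element._asTemporarilyImmutable()
        except AttributeError:
            pass
        return element in self._data

    def __eq__(self, other):
        if not isinstance(other, BaseSet):
            return NotImplemented  # lets the wrapper's __eq__ answer
        return self._data == other._data

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


class ImmutableSet(BaseSet):
    def __hash__(self):
        return _compute_hash(self._data)


class Set(BaseSet):
    __hash__ = None  # mutable, therefore unhashable

    def add(self, element):
        self._data[_frozen(element)] = True

    def _asImmutable(self):
        return ImmutableSet(self)

    def _asTemporarilyImmutable(self):
        return _TemporarilyImmutableSet(self)


class _TemporarilyImmutableSet(object):
    """Wraps a Set for a single lookup, without copying it."""

    def __init__(self, wrapped):
        self._set = wrapped

    def __hash__(self):
        return _compute_hash(self._set._data)

    def __eq__(self, other):
        return self._set == other

    def __ne__(self, other):
        return self._set != other
```

Usage: `outer.add(inner)` stores a frozen copy of `inner`, while `Set([1, 2]) in outer` is answered through a temporary wrapper, so no copy is made just to test membership.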

----------------------------------------------------------------------

>Comment By: Alex Martelli (aleax)
Date: 2002-07-18 22:27

Message:
Logged In: YES 
user_id=60314

Changed as per GvR comments so now sets have-a dict rather 
than being-a dict.  Made code more direct in some places (using
list comprehensions rather than loops where appropriate).


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580995&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 22:29:31 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 14:29:31 -0700
Subject: [Patches] [ python-Patches-583188 ] expose Parser.strict flag to functions
Message-ID: <E17VIq3-0005To-00@usw-sf-web4.sourceforge.net>

Patches item #583188, was opened at 2002-07-17 23:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: expose Parser.strict flag to functions

Initial Comment:
The following trivial patch exposes the 'strict' flag in
the email.message_from_file and email.message_from_string
functions.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-18 17:29

Message:
Logged In: YES 
user_id=12800

Accepted and applied.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583188&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 22:39:12 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 14:39:12 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VIzQ-0005o0-00@usw-sf-web2.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
  works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 14:39

Message:
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready for inclusion in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.
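The reason for dropping tests that compare generated regex text can be shown with a short check: two textually different patterns can accept exactly the same inputs, so a behavioral test is more robust. The patterns below are transcribed from the failure report quoted below, assuming the line break in that report replaced a space before the final `\d`; the sample inputs are arbitrary.

```python
import re

# The spelling the old test searched for, and the one the module produced.
# Only the grouping differs.
OLD_STYLE = r"(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))"
NEW_STYLE = r"(?P<d>3[0-1]|[0-2]\d|\d| \d)"


def accepts(pattern, text):
    return re.fullmatch(pattern, text) is not None


# Test what the pattern does, not how it is spelled.
for day in ["01", "7", "19", "31", " 7"]:
    assert accepts(OLD_STYLE, day) and accepts(NEW_STYLE, day)
for bad in ["32", "x1", ""]:
    assert not accepts(OLD_STYLE, bad) and not accepts(NEW_STYLE, bad)
```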

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 08:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...
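The point about future statements can be checked without importing anything into a real module, by compiling small source strings. This is a minimal sketch; `compiles` is a hypothetical helper. A module docstring may precede a future statement but ordinary code may not, and `//` is floor division with or without the import.

```python
# Compile small modules from strings, so nothing is imported here.
GOOD = '"""Module docstring."""\nfrom __future__ import division\nx = 7 // 2\n'
BAD = 'x = 1\nfrom __future__ import division\n'


def compiles(source):
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles(GOOD))  # True: a docstring may precede the future statement
print(compiles(BAD))   # False: ordinary code may not
```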

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 14:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.
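A sketch of the round-trip approach described above: format a known time with `time.strftime()` and parse the result back with `time.strptime()`. The directive list is an assumption (the most locale- and platform-dependent directives `%c`, `%x`, `%X` and `%Z` are left out, and `%d` is only exercised in the combined format), and `check_strptime_roundtrip` is a hypothetical name rather than the test that was uploaded.

```python
import time

SINGLE_DIRECTIVES = ("a", "A", "b", "B", "H", "I", "j", "m", "M",
                     "p", "S", "U", "w", "W", "y", "Y", "%")


def check_strptime_roundtrip(when):
    """Feed strftime() output back into strptime(); return True if the
    date and time fields survive a full round trip."""
    tt = time.gmtime(when)
    for directive in SINGLE_DIRECTIVES:
        fmt = " %" + directive
        # Raises ValueError if strptime cannot parse strftime's output.
        time.strptime(time.strftime(fmt, tt), fmt)
    fmt = "%Y-%m-%d %H:%M:%S"
    parsed = time.strptime(time.strftime(fmt, tt), fmt)
    return tuple(parsed[:6]) == tuple(tt[:6])
```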

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
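The whitespace rule can be sketched as a tiny format-to-regex translator. It handles only `%d`, `%m` and `%y` through a hypothetical, much-reduced directive table (not the real one in `_strptime`), escapes other literal characters, and turns each run of format whitespace into `\s*`.

```python
import re

# Hypothetical, much-reduced directive table (not _strptime's real one).
DIRECTIVES = {
    "d": r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
}


def format_to_regex(fmt):
    """Translate a tiny subset of strptime format syntax into a regex."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])
            i += 2
        elif ch.isspace():
            # A run of whitespace in the format matches zero or more
            # whitespace characters in the data.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            parts.append(r"\s*")
        else:
            parts.append(re.escape(ch))
            i += 1
    # Case-insensitive, as in version 2.1.4 (irrelevant for digits here).
    return re.compile("".join(parts), re.IGNORECASE)


pattern = format_to_regex("%m / %d / %y")
match = pattern.fullmatch("7/12/02")
print(match.group("m"), match.group("d"), match.group("y"))  # 7 12 02
print(pattern.fullmatch("7 /12/ 02") is not None)            # True
```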

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.
Figured I would just let people who think like me do it in a
more "proper" way with leading zeros and let those who don't
read it like that still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 15:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 14:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
If you want, I can obviously add \d to all the regexes
as an alternative and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)
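The difference between the two readings of the docs comes down to one alternative in the `%m` regex. Both patterns below are illustrative, not the module's actual regexes; the lenient one uses `[1-9]` rather than a bare `\d` so that a lone `0` is still rejected.

```python
import re

# Strict reading of the docs: %m is always two digits, 01..12.
STRICT_MONTH = r"(?P<m>0[1-9]|1[0-2])"
# Lenient variant: also accept a single digit without the leading zero.
LENIENT_MONTH = r"(?P<m>1[0-2]|0[1-9]|[1-9])"


def month_of(pattern, text):
    match = re.fullmatch(pattern, text)
    return int(match.group("m")) if match else None


print(month_of(STRICT_MONTH, "07"))  # 7
print(month_of(STRICT_MONTH, "7"))   # None
print(month_of(LENIENT_MONTH, "7"))  # 7
```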

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
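Making the inner groups non-capturing changes how many groups the compiled pattern records, while the named group used to extract the value still works. A short illustration using the `%a` style of pattern that appears in the failure output quoted above; whether the change is measurably faster depends on the regex engine and is not shown here.

```python
import re

capturing = re.compile(r"(?P<a>(Mon)|(Tue)|(Wed))")
non_capturing = re.compile(r"(?P<a>(?:Mon)|(?:Tue)|(?:Wed))")

print(capturing.groups)      # 4: the named group plus three inner groups
print(non_capturing.groups)  # 1: only the named group
print(non_capturing.match("Tue").group("a"))  # Tue
```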

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactical change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
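The pitfall described above can be reproduced with a small sketch. LocaleTime here is a hypothetical stand-in that only records the order in which instances are built. It is not the class from the patch. The default in parse() is evaluated once, when the def statement runs. parse_fresh() instead uses the common None-sentinel idiom to build a new object on every call.

```python
class LocaleTime:
    """Hypothetical stand-in: records the order in which instances are made."""
    created = 0

    def __init__(self):
        LocaleTime.created += 1
        self.serial = LocaleTime.created


def parse(data_string, locale_time=LocaleTime()):
    # The default object was built once, at function definition time,
    # so every call without an explicit argument shares it.
    return locale_time.serial


def parse_fresh(data_string, locale_time=None):
    # None as a sentinel: a new LocaleTime is built on each call.
    if locale_time is None:
        locale_time = LocaleTime()
    return locale_time.serial


print(parse("a"), parse("b"))              # 1 1
print(parse_fresh("a"), parse_fresh("b"))  # 2 3
```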

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().

The second change was to what triggers strptime() to return
an re object that it can use.  Initially it was any false value
(i.e., anything that would be considered false), but I realized
that an empty string could trigger that, and it would be better
to raise a TypeError than to let some error come up later from
trying to use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
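Here is a minimal sketch of the precompiled-regex idea. It assumes that the expensive part is translating the format string, since the thread attributes most of the time to building the pattern in TimeRE.__getitem__. The DIRECTIVES table, build_pattern() and the module-level cache are hypothetical simplifications. The patch itself instead let the caller obtain the compiled re object, as described above.

```python
import re

# Hypothetical, simplified directive table (the real one is locale-derived).
DIRECTIVES = {
    "d": r"(?P<d>3[0-1]|[0-2]\d|[1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
}

_compiled = {}


def build_pattern(format_string):
    """Translate '%x' directives to regexes; escape everything else."""
    tokens = re.findall(r"%.|[^%]+", format_string)
    return "".join(
        DIRECTIVES[tok[1]] if tok.startswith("%") else re.escape(tok)
        for tok in tokens
    )


def compiled_for(format_string):
    """Build and compile once per format string, then reuse the result."""
    regex = _compiled.get(format_string)
    if regex is None:
        regex = re.compile(build_pattern(format_string))
        _compiled[format_string] = regex
    return regex


found = compiled_for("%m/%d/%y").match("7/12/02")
print(found.groupdict())  # {'m': '7', 'd': '12', 'y': '02'}
```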

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
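In current Python, these names are available from the calendar module's day_name, day_abbr, month_name and month_abbr sequences. These are documented today and follow the current LC_TIME locale. The thread does not show which of them the patch actually reads, so treat this as a sketch.

```python
import calendar

# Weekday names start with Monday; month sequences have '' at index 0
# so that month numbers 1-12 index them directly.
full_weekdays = list(calendar.day_name)
abbr_weekdays = list(calendar.day_abbr)
full_months = list(calendar.month_name)
abbr_months = list(calendar.month_abbr)

print(len(full_weekdays), len(full_months))  # 7 13
```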

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the day-of-the-month regex.
I discovered that ANSI C's ctime() displays the day of the month
with a leading space if it is a single digit.  Since at least
Skip's C library uses that format for %c, I went ahead and added
it.
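To see why the extra " \d" alternative matters, here is a sketch that parses the %c string from Skip's session below ('Tue Jun  4 00:53:39 2002'). The day-of-month pattern is modelled on the one that appears in a later traceback in this thread. The surrounding ctime-style layout is a simplified assumption.

```python
import re

# Day of month: two-digit forms, a bare digit, or a space-padded digit.
DAY = r"(?P<d>3[0-1]|[0-2]\d|\d| \d)"

# Simplified ctime()-style layout: 'Www Mmm dd hh:mm:ss yyyy'.
ctime_like = re.compile(r"\w{3} \w{3} " + DAY + r" \d\d:\d\d:\d\d \d{4}")

for text in ("Tue Jun  4 00:53:39 2002", "Fri Jul 12 12:58:00 2002"):
    day = ctime_like.match(text).group("d")
    print(repr(day), int(day))  # int() ignores the padding space
```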

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
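Here is a sketch of the read-only attributes described above. It uses a hypothetical LocaleTime cut down to a single attribute, f_month, the name that appears in Skip's traceback. In Python 3, where every class is new-style, a property created without a setter already rejects assignment with AttributeError. Passing an explicit setter that raises is simply a way to choose TypeError instead.

```python
class LocaleTime:
    """Hypothetical, cut-down holder for locale-derived names."""

    def __init__(self, month_names):
        self._f_month = list(month_names)

    def _get_f_month(self):
        return self._f_month

    def _read_only(self, value):
        # Used as the setter so that assignment raises TypeError.
        raise TypeError("LocaleTime attributes cannot be set")

    f_month = property(_get_f_month, _read_only)


locale_time = LocaleTime(["", "January", "February"])
print(locale_time.f_month[1])  # January
try:
    locale_time.f_month = []
except TypeError as err:
    print("rejected:", err)
```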

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.
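The first planned change amounts to checking the match result before using it. Below is a minimal sketch. match_or_raise is a hypothetical helper name, and the error message is the one visible in Guido's later traceback in this thread. Without the check, calling groupdict() on None fails with the AttributeError that Skip reported.

```python
import re


def match_or_raise(compiled_format, data_string):
    """Return the named groups, or raise ValueError when nothing matches."""
    found = compiled_format.match(data_string)
    if found is None:
        # Mirror time.strptime(): a mismatch is a ValueError, not an
        # AttributeError from calling groupdict() on None.
        raise ValueError("time data did not match format")
    return found.groupdict()


year_only = re.compile(r"(?P<Y>\d{4})")
print(match_or_raise(year_only, "2002"))  # {'Y': '2002'}
```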

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
can concatenate at either the front or the back.  Not much
different from before, but it makes the choice of front or
back concatenation easy to specify.
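The thread does not include the helper itself. A hypothetical version might look like this, returning a new tuple with '' added at the chosen end. One plausible use is a leading '' so that month numbers index a name tuple directly, but that purpose is an assumption, not something the thread confirms.

```python
def add_blank(names, front=True):
    """Return names as a tuple with '' prepended (front=True) or appended."""
    names = tuple(names)
    return ("",) + names if front else names + ("",)


print(add_blank(["Jan", "Feb"]))             # ('', 'Jan', 'Feb')
print(add_blank(["a", "b"], front=False))    # ('a', 'b', '')
```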

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when subclassing an immutable type.
Since dict is obviously mutable, I don't need to worry about it.
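The distinction can be sketched as follows; both classes are hypothetical. A dict subclass like TimeRE can fill itself in __init__, because the dict can still be changed after creation. A str subclass must do its work in __new__, because a str's value is fixed once __new__ returns.

```python
class TimeRE(dict):
    """Hypothetical dict subclass: mutable, so __init__ is enough."""

    def __init__(self, extra=None):
        super().__init__({"d": r"3[0-1]|[0-2]\d|\d"})
        if extra:
            self.update(extra)  # still mutable after construction


class Directive(str):
    """Hypothetical str subclass: immutable, so the value is set in __new__."""

    def __new__(cls, letter):
        return super().__new__(cls, "%" + letter)


print(sorted(TimeRE({"y": r"\d\d"})))  # ['d', 'y']
print(Directive("m"))                  # %m
```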

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').
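Here is a quick sketch of the two spellings; the helper name is hypothetical. They agree whenever the value tested is a single character, which is the case for a directive letter. The caveat is that `in` on a string is a substring test, so the two spellings differ for empty or multi-character values.

```python
def is_year_directive(found):
    # Substring test on a str; same result as found in ('y', 'Y')
    # whenever found is exactly one character.
    return found in "yY"


for value in ("y", "Y", "m"):
    assert is_year_directive(value) == (value in ("y", "Y"))

# Where the two spellings diverge:
print("" in "yY", "" in ("y", "Y"))      # True False
print("yY" in "yY", "yY" in ("y", "Y"))  # True False
```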

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetitively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
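This sketch shows why the sorter matters. Regex alternation takes the first alternative that lets the rest of the pattern match, so a shorter name that is a prefix of a longer one can win. Python 3 removed both cmp() and the cmp= argument to sort, so key=len with reverse=True is the modern way to get the same descending-length order. The helper name is hypothetical.

```python
import re


def alternation(names):
    """Join names into a non-capturing alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(name) for name in ordered) + ")"


text = "March 2002"
print(re.match("(?:Mar|March)", text).group())                # Mar
print(re.match(alternation(["Mar", "March"]), text).group())  # March
```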

Note:  you can do x = y = z = -1, instead of
x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.
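Here is a sketch of the safer comparison, combined with the chained assignment suggested just above; the names are hypothetical. Whether `x is -1` is true depends on CPython caching small integers, which is an implementation detail. Since Python 3.8 the compiler even emits a SyntaxWarning for `is` against a literal.

```python
UNSET = -1


def is_unset(value):
    # Value comparison works for any int object equal to -1,
    # whether or not the interpreter reuses a cached instance.
    return value == UNSET


year = month = day = UNSET  # chained assignment: x = y = z = -1
print(all(is_unset(v) for v in (year, month, day)))  # True
print(is_unset(int("-1")), is_unset(0))              # True False
```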

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""

I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that is specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up

The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
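One way to glean such names from time.strftime() is to format a fixed date for each month and read back the %B field. The sketch below assumes that approach. The date values are arbitrary, and the result follows whatever LC_TIME locale is active.

```python
import time


def month_names_from_strftime():
    """Return ['', <month 1 name>, ..., <month 12 name>] via strftime('%B')."""
    names = [""]  # index 0 stays blank so month numbers index directly
    for month in range(1, 13):
        # 9-field time tuple; only the month field affects %B.
        stamp = (1999, month, 1, 0, 0, 0, 0, 1, -1)
        names.append(time.strftime("%B", stamp))
    return names


print(month_names_from_strftime()[1:4])
```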

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 22:47:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 14:47:18 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VJ7G-0002Up-00@usw-sf-web5.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 17:47

Message:
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 17:39

Message:
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready to be included in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
 All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 17:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.
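Here is a sketch of the round-trip idea using unittest: format a fixed moment with strftime(), parse the text back with strptime(), and compare. The directives and the fixed timestamp are arbitrary choices, not the contents of the uploaded test_time.py.

```python
import time
import unittest


class StrptimeRoundTrip(unittest.TestCase):
    """Sketch: strptime() should parse back what strftime() produced."""

    moment = time.localtime(1_000_000_000)  # arbitrary fixed instant

    def test_full_timestamp(self):
        fmt = "%Y-%m-%d %H:%M:%S"
        parsed = time.strptime(time.strftime(fmt, self.moment), fmt)
        # Compare year, month, day, hour, minute and second.
        self.assertEqual(parsed[:6], self.moment[:6])

    def test_single_directives(self):
        for directive in ("%a", "%A", "%b", "%B", "%j"):
            text = time.strftime(directive, self.moment)
            # Each directive on its own should parse without an error.
            time.strptime(text, directive)
```

To run it, call unittest.main() from a script or pass the module to `python -m unittest`.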

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
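Here is a sketch of the whitespace rule described above. Each run of whitespace in the format becomes \s*, so the data may contain any amount of whitespace at that point, including none. The directive table is a hypothetical simplification. Whitespace is handled while tokenizing the format, before directives are expanded, so spaces inside directive patterns are left alone.

```python
import re

# Hypothetical directive patterns for illustration only.
DIRECTIVES = {
    "d": r"(?P<d>3[0-1]|[0-2]\d|\d)",
    "b": r"(?P<b>[A-Za-z]{3})",
    "Y": r"(?P<Y>\d{4})",
    "%": "%",
}


def format_to_regex(format_string):
    """Expand directives; any whitespace run in the format becomes \\s*."""
    pieces = []
    for token in re.findall(r"%.|\s+|[^%\s]+", format_string):
        if token.startswith("%"):
            pieces.append(DIRECTIVES[token[1]])
        elif token.isspace():
            pieces.append(r"\s*")
        else:
            pieces.append(re.escape(token))
    return "".join(pieces)


pattern = re.compile(format_to_regex("%d %b %Y"))
for text in ("04 Jun 2002", "4Jun2002", "4   Jun\t2002"):
    print(pattern.match(text).groupdict())
```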

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.  Figured
I would just let people who think like me do it in a more
"proper" way with leading zeros and those who don't read it
like that to still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 18:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.
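Both changes can be sketched with re. A trailing \d alternative lets %m accept a month written without its leading zero, as in Guido's "7/12/02" case. re.IGNORECASE makes name matching case-insensitive. The patterns are simplified assumptions, not the ones from version 2.1.4.

```python
import re

# %m: the documented zero-padded forms first, then a bare digit.
MONTH_NUMBER = r"(?P<m>1[0-2]|0[1-9]|\d)"
# %b: English abbreviations only, for illustration.
MONTH_ABBR = r"(?P<b>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"

number_re = re.compile(MONTH_NUMBER + r"/")
abbr_re = re.compile(MONTH_ABBR, re.IGNORECASE)

print(number_re.match("7/12/02").group("m"))   # 7
print(number_re.match("07/12/02").group("m"))  # 07
print(abbr_re.match("JUN 2002").group("b"))    # JUN
```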

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
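A minimal sketch of the kind of regression test suggested here, written for a current Python's unittest and time modules (an assumption; the 2002 test files differed): it formats a fixed time tuple with time.strftime(), parses the text back with time.strptime(), and checks the relevant fields. The class name, reference date and chosen formats are illustrative only; the last test repeats the unpadded-month example above.

```python
import time
import unittest


class StrptimeRoundTrip(unittest.TestCase):
    """Format a known time tuple with strftime, parse it back, compare fields."""

    # Arbitrary reference: 2002-07-12 14:30:45, a Friday (weekday 4, yday 193).
    REFERENCE = (2002, 7, 12, 14, 30, 45, 4, 193, 0)

    def check(self, fmt, fields):
        text = time.strftime(fmt, self.REFERENCE)
        parsed = time.strptime(text, fmt)
        for index in fields:
            self.assertEqual(parsed[index], self.REFERENCE[index],
                             (fmt, text, index))

    def test_numeric_date(self):
        # Fields 0, 1, 2 are year, month, day.
        self.check("%m/%d/%y", (0, 1, 2))

    def test_time_of_day(self):
        # Fields 3, 4, 5 are hour, minute, second.
        self.check("%H:%M:%S", (3, 4, 5))

    def test_unpadded_month(self):
        # The example from this thread: a month without a leading zero.
        self.assertEqual(time.strptime("7/12/02", "%m/%d/%y")[:3],
                         (2002, 7, 12))
```

Saved to a file, it runs under python -m unittest; because it only goes through the time module, it exercises whichever strptime implementation time exports.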

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
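The effect of non-capturing groups can be seen with a cut-down day-of-month pattern (an illustrative assumption, not the module's actual regex): with (?:...) only the named group is kept in groups(), while groupdict() is the same either way. Whether this measurably speeds things up was not benchmarked in this thread.

```python
import re

# Capturing: every parenthesised alternative becomes an extra numbered group.
capturing = re.compile(r"(?P<d>(3[01])|([12]\d)|(0[1-9]))")
# Non-capturing: inner groups use (?:...), so only the named group remains.
non_capturing = re.compile(r"(?P<d>(?:3[01])|(?:[12]\d)|(?:0[1-9]))")

m1 = capturing.match("17")
m2 = non_capturing.match("17")
print(m1.groups())  # ('17', None, '17', None)
print(m2.groups())  # ('17',)
print(m1.groupdict() == m2.groupdict())  # True
```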

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
how strptime() is used, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they do, they should change the locale for the
interpreter and not just for strptime().
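A small sketch of the default-argument pitfall described above. The module-level dict and locale_snapshot() are hypothetical stand-ins for the locale state a LocaleTime object captures. The None-sentinel version is the usual Python idiom for avoiding the problem; it is shown for comparison and is not what this patch does (the patch drops the parameter instead).

```python
# Hypothetical stand-in for the interpreter's current locale settings.
_current_locale = {"lang": "en"}


def locale_snapshot():
    """Return a copy of the (hypothetical) current locale settings."""
    return dict(_current_locale)


def parse_bad(data, locale_info=locale_snapshot()):
    # The default was built once, when 'def' ran, and is reused on every call.
    return locale_info["lang"]


def parse_good(data, locale_info=None):
    # A None sentinel defers taking the snapshot until each call.
    if locale_info is None:
        locale_info = locale_snapshot()
    return locale_info["lang"]


_current_locale["lang"] = "sv"   # switch locale after the functions exist
print(parse_bad("..."))    # en  (stale default)
print(parse_good("..."))   # sv
```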

The second change was to what triggers strptime() to return an
re object that can be reused.  Initially it was any false
value (i.e., anything that evaluates as false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
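The approach here lets callers obtain the compiled regex and reuse it. Another way to get a similar saving is to cache the compiled pattern per format string inside the parser. The sketch below assumes a tiny, hypothetical directive table and uses functools.lru_cache, which is a modern facility. Note that the re module's own internal cache only skips recompiling an identical pattern string; it does not save the cost of building that string, which is where the profile above says the time goes.

```python
import functools
import re

# Hypothetical, deliberately tiny directive table (illustration only).
_DIRECTIVES = {
    "Y": r"(?P<Y>\d{4})",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
}


@functools.lru_cache(maxsize=32)
def compile_format(fmt):
    """Build and compile the regex for fmt; repeat calls hit the cache.

    Assumes fmt holds only directives from _DIRECTIVES plus literal
    characters that are not regex metacharacters (such as '-' or '/').
    """
    pattern = re.sub(r"%(\w)", lambda m: _DIRECTIVES[m.group(1)], fmt)
    return re.compile(pattern)


def parse(data, fmt):
    found = compile_format(fmt).match(data)
    if found is None:
        raise ValueError("time data did not match format")
    return {name: int(value) for name, value in found.groupdict().items()}


for line in ["2002-06-16", "2002-06-17"]:
    parse(line, "%Y-%m-%d")
print(compile_format.cache_info().hits)  # 1: the second call reused the regex
```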

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric day-of-month regex.
I discovered that ANSI C's ctime() displays the day of the month
with a leading space if it is a single digit.  So to deal with
that, since at least Skip's C library likes to use that
format for %c, I went ahead and added it.
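Skip's %c output quoted in this thread ('Tue Jun  4 00:53:39 2002') shows the asctime()/ctime() layout, where a single-digit day of the month is padded with a space. A sketch of a day pattern that accepts that form as a last resort, embedded in a hypothetical regex for the C-locale %c layout (both patterns are assumptions for illustration):

```python
import re

# Day of month: two-digit forms first, then an unpadded digit, then a
# space-padded digit as produced by asctime()/ctime() (e.g. "Jun  4").
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])"

# Hypothetical pattern for the C-locale "%c" layout "Tue Jun  4 00:53:39 2002".
C_LOCALE_C = re.compile(
    r"(?P<a>[A-Za-z]{3}) (?P<b>[A-Za-z]{3}) " + DAY +
    r" (?P<H>\d{2}):(?P<M>\d{2}):(?P<S>\d{2}) (?P<Y>\d{4})$"
)

found = C_LOCALE_C.match("Tue Jun  4 00:53:39 2002")
# int() ignores the padding space.
print(repr(found.group("d")), int(found.group("d")))  # ' 4' 4
```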

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
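A sketch of the two ways to make an attribute read-only with property(), using a hypothetical LocaleInfo class in place of LocaleTime. In current Python, for a class deriving from object, a property with no setter already rejects assignment with AttributeError. An explicit setter that raises, as described above, is then only needed to pick a different exception such as TypeError.

```python
class LocaleInfo(object):
    """Hypothetical stand-in for LocaleTime with read-only attributes."""

    def __init__(self):
        self._months = ["", "January", "February"]  # truncated for the sketch

    def _read_only(self, value):
        raise TypeError("LocaleInfo attributes are read-only")

    # Explicit setter that raises TypeError.
    months = property(lambda self: self._months, _read_only)

    # No setter at all: assignment raises AttributeError.
    month_count = property(lambda self: len(self._months) - 1)


info = LocaleInfo()
print(info.months[1], info.month_count)  # January 2
```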

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify
time or make this a separate module, etc.).  I think I will
do the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
a \d onto all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
can add it at either the front or the back.  Not much
different from before, but it makes choosing front or
back concatenation easy.
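A sketch of a front-or-back padding helper like the one described; the name pad() and its signature are hypothetical. Padding at the front gives the same layout as calendar.month_name, where index 0 is '' so that January sits at index 1.

```python
import calendar


def pad(seq, front=True):
    """Hypothetical helper: return seq as a tuple with '' at the front or back."""
    return (('',) + tuple(seq)) if front else (tuple(seq) + ('',))


months = pad(["January", "February"])
print(months)                  # ('', 'January', 'February')
print(pad(["AM"], front=False))  # ('AM', '')
print(calendar.month_name[0] == '')  # True: same front-padding convention
```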

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
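The sorter exists to put longer names first. A sketch of why that order matters, written for a current Python, where cmp() no longer exists and sorted(..., key=len, reverse=True) gives the same longest-first order (the key= argument arrived in Python 2.4). The helper name names_to_regex() is hypothetical. Regex alternation takes the first alternative that matches, so an unsorted 'Mon|Monday' stops after 'Mon'.

```python
import re


def names_to_regex(names, group):
    """Hypothetical helper: build a named alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    return "(?P<%s>%s)" % (group, "|".join(re.escape(n) for n in ordered))


names = ["Mon", "Monday", "Tue", "Tuesday"]
naive = "(?P<a>%s)" % "|".join(names)

print(re.match(naive, "Monday").group("a"))                      # Mon
print(re.match(names_to_regex(names, "a"), "Monday").group("a"))  # Monday
```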

Note:  you can do x = y = z = -1, instead of
x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
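A sketch of the fix proposed here, using the simplified %d regex quoted above: tacking a bare \d on as the last alternative accepts '4' while still preferring two-digit matches. The anchors and group name are illustrative assumptions. The third pattern shows why \d has to go last: alternation stops at the first alternative that matches.

```python
import re

# The simplified %d regex quoted above: requires two digits.
strict = re.compile(r"(?P<d>3[0-1]|[0-2]\d)$")
# Same alternatives with a bare \d tacked on as the last option.
lenient = re.compile(r"(?P<d>3[0-1]|[0-2]\d|\d)$")
# Putting \d first is wrong without an anchor: "31" matches only "3".
wrong_order = re.compile(r"(?P<d>\d|3[0-1]|[0-2]\d)")

print(strict.match("4"))                   # None
print(lenient.match("4").group("d"))       # 4
print(lenient.match("04").group("d"))      # 04
print(wrong_order.match("31").group("d"))  # 3
```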

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity when you
   raise an exception?
   (You don't need to return anything: raise an exception if
   something is wrong, otherwise it's ok)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
   you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
   (specifically, spaces after commas, =, binary ops,
   comparisons, etc)
 * Long lines should be broken up
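For a single character, found in 'yY' and found in ('y', 'Y') agree, which is all that matters here assuming found always holds exactly one directive character. The one subtlety: on a string, in tests for a substring, so the two forms disagree for the empty string or a multi-character string.

```python
found = "y"
print(found in ("y", "Y"), found in "yY")  # True True

# The forms differ for anything that is not exactly one character:
print("" in ("y", "Y"), "" in "yY")        # False True
print("yY" in ("y", "Y"), "yY" in "yY")    # False True
```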
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Thu Jul 18 23:34:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 15:34:16 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VJqi-0003IM-00@usw-sf-web5.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
      clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 15:34

Message:
Logged In: YES 
user_id=357491

Wonderful!

About the docs; do you want me to email Fred or upload a
patched version of the docs for time fixed?  And for
removing the request in PEP 42, should I email Jeremy about
it or Barry?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 14:47

Message:
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 14:39

Message:
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready for inclusion in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
 All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 08:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 14:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
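A sketch of the whitespace rule described here: every run of whitespace in the format string becomes \s* in the regex, so it matches any amount of whitespace in the data, including none. The two-entry directive table and the name format_to_regex() are hypothetical; unknown directives simply raise KeyError.

```python
import re

# Hypothetical, deliberately tiny directive table (illustration only).
DIRECTIVES = {
    "H": r"(?P<H>2[0-3]|[01]\d|\d)",
    "M": r"(?P<M>[0-5]\d|\d)",
}


def format_to_regex(fmt):
    """Translate fmt into a compiled regex; whitespace runs become \\s*."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            directive = fmt[i + 1]
            parts.append("%" if directive == "%" else DIRECTIVES[directive])
            i += 2
        elif ch.isspace():
            # Collapse the whole whitespace run into a single \s*.
            while i < len(fmt) and fmt[i].isspace():
                i += 1
            parts.append(r"\s*")
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


rx = format_to_regex("%H %M")
print(rx.fullmatch("12   30").groupdict())  # {'H': '12', 'M': '30'}
print(rx.fullmatch("1230").groupdict())     # {'H': '12', 'M': '30'}
```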

I decided not to write a patch for the docs to make them
describe more leniently what the format directives accept.
Figured I would just let people who think like me do it in a
more "proper" way with leading zeros, and let those who don't
read it that way still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 15:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.
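Here is a sketch of case-insensitive matching for %b. For brevity it hard-codes English month abbreviations, whereas the module itself derives the names from the locale. MONTH_ABBRS and month_number() are illustrative names.

```python
import re

# English abbreviations, hard-coded for this illustration only.
MONTH_ABBRS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Named group "b" mirrors the %b directive; re.IGNORECASE makes
# "jul", "Jul" and "JUL" all match the "Jul" alternative.
month_re = re.compile(r"(?P<b>%s)" % "|".join(MONTH_ABBRS), re.IGNORECASE)
LOWERED = [name.lower() for name in MONTH_ABBRS]


def month_number(text):
    """Return 1-12 for a month abbreviation in any case, or None if it does not match."""
    m = month_re.fullmatch(text)
    if m is None:
        return None
    return LOWERED.index(m.group("b").lower()) + 1


print(month_number("jul"), month_number("JUL"), month_number("Jly"))  # 7 7 None
```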

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 14:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
 I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs did not mean to require a leading 0, I think they
should be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
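The following is for comparison only. In current CPython, time.strptime() is implemented in Python by the _strptime module. The same call now succeeds because numeric directives accept values without a leading zero. The tm_isdst field comes back as -1 when the format carries no timezone information.

```python
import time

t = time.strptime("7/12/02", "%m/%d/%y")
print(tuple(t))  # (2002, 7, 12, 0, 0, 0, 4, 193, -1)
```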

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
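This small sketch shows the difference between capturing and non-capturing groups. The patterns are illustrative, not the module's actual regexes. Only the named groups are needed to build the result, so wrapping the other alternatives in (?:...) leaves groupdict() unchanged while recording fewer groups. Any speed gain is not measured here.

```python
import re

# Plain (...) records its match, so groups() has two entries.
capturing = re.compile(r"(Mon|Tue)\s*(?P<d>\d\d?)")
# (?:...) groups without recording; only the named group is kept.
non_capturing = re.compile(r"(?:Mon|Tue)\s*(?P<d>\d\d?)")

print(capturing.match("Tue 4").groups())         # ('Tue', '4')
print(non_capturing.match("Tue 4").groups())     # ('4',)
print(non_capturing.match("Tue 4").groupdict())  # {'d': '4'}
```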

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change. 
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things to the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale. 
And if they were, they should change the locale for the
interpreter and not just for strptime().
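The default-argument pitfall described above can be shown in a few lines. make_locale_info() is a hypothetical stand-in for constructing a LocaleTime object. The default expression runs once, when def executes, not on each call. The None-sentinel version is the usual idiom for getting a fresh value per call.

```python
calls = []


def make_locale_info():
    """Hypothetical stand-in for building a LocaleTime object."""
    calls.append("built")
    return {"build_number": len(calls)}


def parse(data, locale_info=make_locale_info()):
    # The default above was evaluated once, when this def ran, so every
    # call that omits locale_info shares the same object.
    return locale_info


def parse_fresh(data, locale_info=None):
    # None as a sentinel: build a new value on each call that needs one.
    if locale_info is None:
        locale_info = make_locale_info()
    return locale_info


print(parse("a") is parse("b"))              # True
print(parse_fresh("a") is parse_fresh("b"))  # False
```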

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false
value (i.e., anything that would be considered false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
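This sketch keeps compiled patterns around so the format-to-regex translation happens once per format string. It uses functools.lru_cache rather than the pass-False-to-get-the-regex interface described above. DIRECTIVES is a deliberately tiny, hypothetical table. The re module also caches compiled patterns internally, but that does not help when the expensive part is building the pattern string.

```python
import re
from functools import lru_cache

# Hypothetical, deliberately tiny directive table for illustration.
DIRECTIVES = {
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
    "%": "%",
}


@lru_cache(maxsize=None)
def compile_format(fmt):
    """Translate fmt into a compiled regex; repeat calls reuse the cached object."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])  # KeyError for unsupported directives
            i += 2
        else:
            parts.append(re.escape(fmt[i]))
            i += 1
    return re.compile("".join(parts))


for line in ("7/12/02", "10/3/02"):
    m = compile_format("%m/%d/%y").fullmatch(line)
    print(m.groupdict())
print(compile_format.cache_info().hits)  # 1: the second lookup hit the cache
```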

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
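Here is a sketch of pulling the names from the calendar module in current Python 3, where these sequences are documented. They are built with strftime under the current LC_TIME locale. Index 0 of month_name and month_abbr is the empty string, so January is 1.

```python
import calendar

weekdays = list(calendar.day_name)      # full names, Monday first
weekday_abbrs = list(calendar.day_abbr)
months = list(calendar.month_name)      # months[0] == '' so January is months[1]
month_abbrs = list(calendar.month_abbr)

print(weekdays[0], months[1], month_abbrs[12])  # "Monday January Dec" under the C locale
```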

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.
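This sketch covers the leading-space case. In Skip's %c output in this thread ('Tue Jun  4 00:53:39 2002'), the space-padded field is the day of the month. The example therefore applies the " \d" alternative to a %d group. The surrounding pattern is a hand-written, illustrative approximation of that one %c layout, not the module's %c regex.

```python
import re

# Day of month: two digits ("04"), a bare digit ("4"), or a digit padded
# with a space (" 4") as in ctime-style output.
DAY = r"(?P<d>3[01]|[0-2]\d|\d| \d)"

# Hand-written approximation of one %c layout, for illustration only.
ctime_like = re.compile(
    r"(?P<a>\w{3}) (?P<b>\w{3}) " + DAY + r" (?P<time>\d\d:\d\d:\d\d) (?P<Y>\d{4})"
)

m = ctime_like.fullmatch("Tue Jun  4 00:53:39 2002")
print(repr(m.group("d")), int(m.group("d")))  # ' 4' 4
```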

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
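This sketch shows a read-only attribute whose setter raises TypeError, using a hypothetical LocaleTimeSketch class. In current Python, a property defined without a setter already rejects assignment with AttributeError. An explicit setter is needed only to raise a different exception.

```python
class LocaleTimeSketch:
    """Hypothetical stand-in for LocaleTime with read-only attributes."""

    def __init__(self, f_weekday):
        self._f_weekday = list(f_weekday)

    def _reject_assignment(self, value):
        raise TypeError("LocaleTimeSketch attributes are read-only")

    # property(fget, fset): reads return the stored list, writes raise TypeError.
    f_weekday = property(lambda self: self._f_weekday, _reject_assignment)


lt = LocaleTimeSketch(["Monday", "Tuesday"])
try:
    lt.f_weekday = []
except TypeError as exc:
    print("rejected:", exc)
```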

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't 
match (to try to match time.strptime()'s exception as seen 
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.
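Both spellings give the same answer for the single characters pulled out of a format string. They differ only for empty or multi-character values, because "in" on a string tests for substrings while "in" on a tuple compares whole elements. Here is a short illustration:

```python
# For single characters the two spellings agree.
print(all((c in "yY") == (c in ("y", "Y")) for c in "yYmd"))  # True

# String membership is a substring test; tuple membership is not.
print("" in "yY", "" in ("y", "Y"))      # True False
print("yY" in "yY", "yY" in ("y", "Y"))  # True False
```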

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
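Here is a sketch of such a helper. pad_tuple() is a hypothetical name and signature, not the method in the module.

```python
def pad_tuple(seq, front=False):
    """Return seq as a tuple with '' added at the front or the back.

    Hypothetical helper for illustration.
    """
    if front:
        return ("",) + tuple(seq)
    return tuple(seq) + ("",)


print(pad_tuple(["am", "pm"]))          # ('am', 'pm', '')
print(pad_tuple(("Jan", "Feb"), True))  # ('', 'Jan', 'Feb')
```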

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.
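A small illustration of the rule, using hypothetical classes. A subclass of a mutable built-in such as dict can be set up in __init__. A subclass of an immutable built-in such as tuple must supply its contents in __new__, because the value is fixed when the object is created.

```python
class TimeREish(dict):
    """Hypothetical dict subclass: mutable, so __init__ is enough."""

    def __init__(self, extra=None):
        super().__init__({"d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"})
        if extra:
            self.update(extra)


class Pair(tuple):
    """Hypothetical tuple subclass: immutable, so the contents go in __new__."""

    def __new__(cls, first, second):
        return super().__new__(cls, (first, second))


print(sorted(TimeREish({"y": r"(?P<y>\d\d)"})))  # ['d', 'y']
print(Pair(1, 2))                                  # (1, 2)
```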

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about to get
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetitively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
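The sort matters because regex alternatives are tried left to right. A short name that is a prefix of a longer one would otherwise win. This sketch is in Python 3, where cmp no longer exists and sorted(..., key=len, reverse=True) does the same job. alternation() is an illustrative helper.

```python
import re


def alternation(names):
    """Build a non-capturing alternation that tries longer names first."""
    ordered = sorted(names, key=len, reverse=True)
    return "(?:%s)" % "|".join(re.escape(n) for n in ordered)


unsorted_re = "(?:%s)" % "|".join(["Wed", "Wednesday"])
print(re.match(unsorted_re, "Wednesday").group())                        # Wed
print(re.match(alternation(["Wed", "Wednesday"]), "Wednesday").group())  # Wednesday
```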

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 
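This sketch puts the two options side by side; the pattern names are illustrative. Putting \d last means the two-digit forms are still tried first when the group sits inside a longer pattern.

```python
import re

STRICT_D = r"(?P<d>3[0-1]|[0-2]\d)"      # two digits only, as the [01,31] range implies
LENIENT_D = r"(?P<d>3[0-1]|[0-2]\d|\d)"  # '\d' tacked on as the last option

for data in ("04", "4", "31"):
    print(data, bool(re.fullmatch(STRICT_D, data)), bool(re.fullmatch(LENIENT_D, data)))
# 04 True True
# 4 False True
# 31 True True
```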

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
    (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
    comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
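This sketch gleans locale names from time.strftime(). It assumes the magic date 1999/3/15 mentioned elsewhere in this thread was chosen because it is a Monday, so day 15 + i has weekday i. %A is expected to read the weekday field (index 6) of the tuple; the other fields only need to be valid. weekday_names() is a hypothetical name, not a function from the module.

```python
import time


def weekday_names():
    """Collect full weekday names, Monday first, via time.strftime('%A')."""
    names = []
    for i in range(7):
        # 1999-03-15 was a Monday, so day 15 + i has tm_wday i (0 = Monday).
        # tm_yday 74 + i matches the date; %A should only need tm_wday.
        tt = (1999, 3, 15 + i, 22, 44, 55, i, 74 + i, 0)
        names.append(time.strftime("%A", tt))
    return names


print(weekday_names())
```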

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 00:09:12 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 16:09:12 -0700
Subject: [Patches] [ python-Patches-583190 ] Patch to make email parser more robust
Message-ID: <E17VKOW-0004ex-00@usw-sf-web3.sourceforge.net>

Patches item #583190, was opened at 2002-07-17 23:36
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Patch to make email parser more robust

Initial Comment:
the following patch against current CVS of the email
package, as of 2002/07/18 fixes the following problems:

  In non-strict mode, messages with a missing end-terminator no
  longer require a blank line at the end; a single newline is
  sufficient now.

The remaining fixes apply in strict or non-strict mode:

  Handle trailing whitespace at the end of a boundary. Had to
  switch from using string.split() to re.split().

  Handle whitespace on the end of a parameter list for
  Content-type.

  Handle whitespace on the end of a plain content-type header.
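This sketch shows the boundary fix using re.split(). RFC 2046 permits trailing whitespace (transport padding) after a boundary delimiter, and an exact-text split() cannot tolerate it. split_on_boundary() is a hypothetical helper, not code from the email package, and it does not handle the closing --boundary-- line.

```python
import re


def split_on_boundary(payload, boundary):
    """Split a multipart payload on '--boundary' lines, allowing trailing blanks."""
    delim = re.compile(r"^--" + re.escape(boundary) + r"[ \t]*$", re.MULTILINE)
    return delim.split(payload)


text = "preamble\n--XYZ  \npart one\n--XYZ\npart two\n"
print(split_on_boundary(text, "XYZ"))  # ['preamble\n', '\npart one\n', '\npart two\n']
```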


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-18 19:09

Message:
Logged In: YES 
user_id=12800

I made a few stylistic mods to the Parser.py patch, but
otherwise the patch looks fine.  Please double check that I
didn't mess anything up!


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583190&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 00:35:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 16:35:46 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VKoE-0004QN-00@usw-sf-web1.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 16:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).
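A minimal sketch of the regex-based approach described above, assuming a hand-written pattern for "%m/%d/%y" only (PATTERN and parse_mdy are illustrative names, not taken from the module, which builds its patterns from locale data). Named groups capture each field, and groupdict() hands them back for conversion:

```python
import re

# Hand-written equivalent of "%m/%d/%y"; illustrative, not the module's table.
PATTERN = re.compile(
    r"(?P<m>1[0-2]|0[1-9]|[1-9])"
    r"/(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
    r"/(?P<y>\d\d)"
)


def parse_mdy(text):
    """Parse "m/d/yy" into a (year, month, day) tuple."""
    found = PATTERN.fullmatch(text)
    if found is None:
        raise ValueError("time data did not match format")
    fields = found.groupdict()
    two_digit = int(fields["y"])
    # POSIX convention for %y: 69-99 map to 19xx, 00-68 map to 20xx.
    year = two_digit + (1900 if two_digit >= 69 else 2000)
    return year, int(fields["m"]), int(fields["d"])
```

For example, parse_mdy("7/12/02") gives (2002, 7, 12), and a month of 13 raises ValueError because no alternative of the month group can match it.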

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 16:35

Message:
Logged In: YES 
user_id=357491

Since I had the time, I went ahead and did a patch for
libtime.tex that removes the comment saying that strptime
fully relies on the C library and uploaded it.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 15:34

Message:
Logged In: YES 
user_id=357491

Wonderful!

About the docs: do you want me to email Fred or upload a
patch that fixes the docs for time?  And for removing the
request in PEP 42, should I email Jeremy or Barry about it?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 14:47

Message:
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 14:39

Message:
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready to be included in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 08:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).
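A quick way to check the placement rule for future statements, assuming Python 3 (where / is already true division, so the division import is accepted but has no effect). The helper name compiles and the two sample sources are illustrative:

```python
# Docstring first, then the future statement: this is allowed.
GOOD = '"""Module docstring."""\nfrom __future__ import division\nx = 7 // 2\n'
# Any ordinary statement before the future statement makes it a SyntaxError.
BAD = 'import re\nfrom __future__ import division\n'


def compiles(source):
    """Return True if source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True
```

The floor-division operator // needs no future statement at all, which is why the import could simply be dropped.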

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|(
\d))") != -1, "did not find 'd' directive pattern string
'%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string
'(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d|
\d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 14:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
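A minimal sketch of the whitespace rule, assuming the substitution runs on the format string before directives are expanded; compress_whitespace and the hand-expanded %b/%d fragment are hypothetical. Each run of whitespace becomes \s*, so the data may contain any amount of whitespace at that point, including none:

```python
import re


def compress_whitespace(fmt):
    r"""Turn every run of whitespace in fmt into the regex \s*."""
    # In the replacement template, \\ produces one literal backslash.
    return re.sub(r"\s+", r"\\s*", fmt)


# Hand-expanded %b and %d, for illustration only.
MONTH_DAY = compress_whitespace(r"(?P<b>[A-Za-z]{3}) (?P<d>\d{1,2})")
```

With this, "Jun  4" (two spaces, as ctime-style output produces) and "Jun4" both match, which is the "compress down to no whitespace" behaviour described above.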

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.
Figured I would just let people who think like me do it the
more "proper" way with leading zeros, and those who don't
read it like that would still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 15:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
non-zero-leading numbers.

I also made the regex case-insensitive.
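A sketch of case-insensitive month matching, assuming hard-coded English abbreviations (the module derives its names from the locale; MONTH_ABBR, MONTH_RE and month_number are illustrative). re.IGNORECASE only affects matching; the captured text keeps the input's case, so the name lookup has to be normalized as well:

```python
import re

# English abbreviations, hard-coded for illustration only.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

MONTH_RE = re.compile(r"(?P<b>%s)" % "|".join(MONTH_ABBR), re.IGNORECASE)


def month_number(text):
    """Return 1-12 for an abbreviated month name, ignoring case."""
    found = MONTH_RE.fullmatch(text)
    if found is None:
        raise ValueError("not a month abbreviation: %r" % text)
    # The match keeps the input's case, so normalize before the lookup.
    lowered = [name.lower() for name in MONTH_ABBR]
    return lowered.index(found.group("b").lower()) + 1
```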

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 14:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 14:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop with
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)
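The two readings of the %m table can be compared side by side. STRICT_M and LENIENT_M are hypothetical patterns, not the module's actual regexes; the only difference is the trailing single-digit alternative:

```python
import re

# The docs' table for %m reads [01,12], i.e. always two digits.
STRICT_M = r"(?P<m>1[0-2]|0[1-9])"
# Same pattern plus a bare single digit as the last alternative.
LENIENT_M = r"(?P<m>1[0-2]|0[1-9]|[1-9])"


def accepts(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None
```

Under the strict reading, the "7" in Guido's "7/12/02" is rejected; the lenient pattern accepts it while still rejecting out-of-range values such as "13".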

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 09:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
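A minimal sketch of such a regression test, using unittest against whatever time.strptime the running interpreter provides (the class and method names are illustrative). It asserts only fields that should not vary by platform; in particular it skips tm_isdst, which current Python versions report as -1 for this input (Guido's 2002 output above shows 0):

```python
import time
import unittest


class StrptimeRegressionTest(unittest.TestCase):
    """Checks that should hold for any conforming time.strptime()."""

    def test_short_us_date(self):
        parsed = time.strptime("7/12/02", "%m/%d/%y")
        self.assertEqual(parsed[:3], (2002, 7, 12))
        self.assertEqual(parsed.tm_wday, 4)    # 2002-07-12 was a Friday
        self.assertEqual(parsed.tm_yday, 193)  # 181 days before July 1, plus 12

    def test_round_trip_through_strftime(self):
        now = time.localtime()
        fmt = "%Y-%m-%d %H:%M:%S"
        parsed = time.strptime(time.strftime(fmt, now), fmt)
        self.assertEqual(parsed[:6], now[:6])

    def test_mismatch_raises_value_error(self):
        with self.assertRaises(ValueError):
            time.strptime("not a date", "%m/%d/%y")
```

Saved as a test module, this can be run with python -m unittest.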

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 13:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-26 21:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
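The effect of non-capturing groups can be seen through a compiled pattern's groups attribute. A small sketch with three weekday names (the patterns are illustrative); it shows fewer groups to track, not a measured speedup:

```python
import re

# Every (...) here is a capturing group: the named group plus three more.
CAPTURING = re.compile(r"(?P<a>(Mon)|(Tue)|(Wed))")
# (?:...) groups for alternation without creating a capture.
NON_CAPTURING = re.compile(r"(?P<a>(?:Mon)|(?:Tue)|(?:Wed))")
```

Both patterns give the same groupdict(), so code that only reads named groups is unaffected by the change.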

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 18:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactical change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantical change I did was directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-20 21:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 14:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than that of their default locale.
And if they were, they should change the locale for the
interpreter and not just for strptime().
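The default-argument pitfall described above, in miniature. LocaleSnapshot is a hypothetical stand-in for LocaleTime that just records a serial number per instance. The None-sentinel version is the common Python idiom for a per-call default and is shown as an alternative, not as what the module does:

```python
import itertools

_serials = itertools.count()


class LocaleSnapshot:
    """Hypothetical stand-in for LocaleTime; each instance gets a serial."""

    def __init__(self):
        self.serial = next(_serials)


def parse_with_default(data, locale_time=LocaleSnapshot()):
    # The default object was built ONCE, when 'def' ran, and is shared.
    return locale_time.serial


def parse_with_sentinel(data, locale_time=None):
    # Building the object in the body gives a fresh one on every call.
    if locale_time is None:
        locale_time = LocaleSnapshot()
    return locale_time.serial
```

Two calls to parse_with_default see the same object, so a locale change between them would go unnoticed; parse_with_sentinel picks up a new object each time.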

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false value
(i.e., anything that would be considered false), but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
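Another way to keep the benefit of a precompiled regex, without a special argument, is to cache compiled patterns per format string. This is a sketch of that alternative, not the module's approach; the three-directive table and the names _translate and compiled_pattern are hypothetical, and functools.lru_cache needs Python 3.2 or later:

```python
import functools
import re

# Hypothetical directive table covering only three directives.
_DIRECTIVES = {
    "%d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "%m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "%y": r"(?P<y>\d\d)",
}


def _translate(fmt):
    """Turn a format string into regex source; literal text is escaped."""
    def piece(match):
        token = match.group(0)
        if token.startswith("%"):
            if token not in _DIRECTIVES:
                raise ValueError("unsupported directive: %s" % token)
            return _DIRECTIVES[token]
        return re.escape(token)
    # "%." grabs a directive; "." grabs any single literal character.
    return re.sub(r"%.|.", piece, fmt, flags=re.DOTALL)


@functools.lru_cache(maxsize=64)
def compiled_pattern(fmt):
    """Compile the regex for fmt once; repeat calls reuse the cached object."""
    return re.compile(_translate(fmt))
```

Parsing many log lines with the same format then pays the translation and compilation cost only on the first call.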

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 12:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
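A short sketch of pulling names from calendar (the wrapper function names are illustrative). calendar.month_name and calendar.day_name, along with month_abbr and day_abbr, follow the current LC_TIME locale; the English values in the checks assume the default C locale:

```python
import calendar


def month_names():
    # calendar.month_name has 13 entries; index 0 is the empty string,
    # so index 1 is January.
    return list(calendar.month_name)[1:]


def weekday_names():
    # calendar.day_name starts at Monday (index 0), matching tm_wday.
    return list(calendar.day_name)
```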

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 13:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric month regex.
 I discovered that ANSI C for ctime displays the month with
a leading space if it is a single digit.  So to deal with
that since at least Skip's C library likes to use that
format for %c, I went ahead and added it.
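A sketch of why the space-padded alternative matters. In Skip's output ('Tue Jun  4 00:53:39 2002') the padded field is the day of the month. DAY and CTIME_LIKE are illustrative patterns, assuming English three-letter names:

```python
import re

# Two digits, a space-padded digit (ctime style), or a bare digit.
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]| [1-9]|[1-9])"

CTIME_LIKE = re.compile(
    r"(?P<a>[A-Za-z]{3}) (?P<b>[A-Za-z]{3}) " + DAY +
    r" (?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d) (?P<Y>\d{4})"
)
```

Without the " [1-9]" branch, the second space before "4" has nothing to match and the whole pattern fails; int() strips the padding when the field is converted.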

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
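A sketch of read-only attributes via property(), using a simplified stand-in class (ReadOnlyNames is hypothetical). In Python 3 every class is new-style, so a property without a setter already rejects assignment with AttributeError; an explicit setter is only needed to raise a different exception such as TypeError:

```python
class ReadOnlyNames:
    """Hypothetical miniature of LocaleTime with read-only attributes."""

    def __init__(self, months):
        self._months = list(months)

    def _refuse(self, value):
        raise TypeError("attributes of this object cannot be reassigned")

    # Explicit setter, as described above: assignment raises TypeError.
    months = property(lambda self: self._months, _refuse)

    # No setter at all: assignment raises AttributeError.
    first_month = property(lambda self: self._months[0])
```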

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 17:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.  
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack 
on a \d on all numeric formats where it might come out as 
a single digit (i.e., lacking a leading zero).  This will be for 
v2.0.3 which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 23:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.
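The two membership tests differ in one edge case worth knowing: in on a string checks for a substring, not a single character. The function names are illustrative:

```python
def in_string(found):
    # 'in' on a str tests for a substring, so '' and 'yY' also pass.
    return found in "yY"


def in_tuple(found):
    # 'in' on a tuple tests for an equal element: only 'y' or 'Y'.
    return found in ("y", "Y")
```

When found is always exactly one directive character, as it is here, both forms give the same answer.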

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
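A sketch of such a helper, assuming it only needs to add one empty string (the name pad_with_blank is hypothetical). One plausible reason for the padding, assumed here rather than stated in the thread, is to make indices line up with 1-based numbers, as calendar.month_name does with its empty entry at index 0:

```python
def pad_with_blank(values, front=True):
    """Hypothetical helper: return a tuple with '' added at the front or back."""
    if front:
        return ("",) + tuple(values)
    return tuple(values) + ("",)
```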

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when it is immutable. 
Since dict is obviously mutable, I don't need to worry about it.
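The rule of thumb can be illustrated with two hypothetical classes: a mutable dict subclass can populate itself in __init__, while an immutable tuple subclass must decide its contents in __new__, because the object cannot change once it exists:

```python
class TimeREDemo(dict):
    """Hypothetical dict subclass: mutable, so setup can happen in __init__."""

    def __init__(self, extra=None):
        super().__init__({"d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"})
        if extra:
            self.update(extra)


class Point(tuple):
    """Immutable base: the contents must be fixed in __new__."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))
```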

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 16:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetitively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
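In Python 3, which has no cmp(), the same ordering is usually written with a key function. A sketch of why the order matters (alternation is a hypothetical helper): the re module takes the first alternative that matches, so a name must come after any longer name that starts the same way. The full weekday names in Guido's failure output above (Wednesday, Thursday, Saturday, Tuesday, ...) are ordered longest first in just this way:

```python
import re


def alternation(names):
    """Join names into a non-capturing regex alternation, longest first."""
    ordered = sorted(names, key=len, reverse=True)
    return "(?:%s)" % "|".join(re.escape(name) for name in ordered)
```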

Note:  you can do x = y = z = -1, instead of
x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.
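A sketch of the value comparison Neal recommends (UNSET and is_unset are illustrative names). Which integers CPython caches is an implementation detail, and Python 3.8 and later also emit a SyntaxWarning for is against a literal:

```python
UNSET = -1


def is_unset(value):
    # Compare values with ==; 'value is -1' only works by accident when
    # the interpreter happens to reuse one cached object for -1.
    return value == UNSET
```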

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 15:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 22:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-03 22:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-03 21:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 00:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 06:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
   strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity if you
   raise an exception?
   (You don't need to return anything: raise an exception,
   else it's ok.)
 * In return_time(), could you change xrange(9) to
   range(len(temp_time))?  This removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
   you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
   (specifically, spaces after commas, =, binary ops,
   comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 14:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite removes the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This lets it behave exactly
like time.strptime().
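One way to glean locale-specific names from time.strftime() is to format a run of dates whose weekday is already known. Here is a minimal sketch, assuming the names come from whatever LC_TIME locale is active. `weekday_names` is a hypothetical helper, not the module's actual code. It uses 1999-03-15, which was a Monday:

```python
import time


def weekday_names(fmt="%A"):
    """Collect weekday names for the current LC_TIME locale via strftime."""
    # 1999-03-15 was a Monday: tm_wday=0, tm_yday=74.  Stepping the day of the
    # month, the weekday and the day of the year together yields Monday..Sunday.
    names = []
    for offset in range(7):
        tt = (1999, 3, 15 + offset, 22, 44, 55, offset, 74 + offset, 0)
        names.append(time.strftime(fmt, tt))
    return names


print(weekday_names())      # full names in the current locale
print(weekday_names("%a"))  # abbreviated names
```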

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 15:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 15:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 14:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 01:43:49 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 17:43:49 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VLs5-000092-00@usw-sf-web4.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
  works in clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straightforward to include your
own localization settings or modify the two languages
included in the module (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-18 20:43

Message:
Logged In: YES 
user_id=33168

Brett, I'm still following.  It wasn't that bad. :-)
Guido, let me know if you want me to
do anything/check stuff in.

Docs are fine to upload here.  I can change PEP 42 also.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 19:35

Message:
Logged In: YES 
user_id=357491

Since I had the time, I went ahead and did a patch for
libtime.tex that removes the comment saying that strptime
fully relies on the C library and uploaded it.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 18:34

Message:
Logged In: YES 
user_id=357491

Wonderful!

About the docs: do you want me to email Fred, or upload a
patched version of the time docs?  And for
removing the request in PEP 42, should I email Jeremy about
it or Barry?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 17:47

Message:
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 17:39

Message:
Logged In: YES 
user_id=357491

God, I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready for inclusion in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 17:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.
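The strftime-then-strptime approach can be sketched as a unittest case. This is an illustrative test, not the uploaded test_time.py. The class name, the sample tuple and the directive list are assumptions:

```python
import time
import unittest


class StrptimeRoundTripTest(unittest.TestCase):
    """Format a fixed time tuple with strftime, then parse it back."""

    # 2002-07-12 13:45:30, a Friday (tm_wday=4), day 193 of the year.
    SAMPLE = (2002, 7, 12, 13, 45, 30, 4, 193, 0)

    # (directive, index of the struct_time field that directive determines)
    CASES = (("%Y", 0), ("%y", 0), ("%m", 1), ("%d", 2),
             ("%H", 3), ("%M", 4), ("%S", 5), ("%j", 7))

    def test_round_trip(self):
        for directive, field in self.CASES:
            text = time.strftime(directive, self.SAMPLE)
            parsed = time.strptime(text, directive)
            self.assertEqual(parsed[field], self.SAMPLE[field], directive)
```

Run it with the usual unittest runner (for example `python -m unittest`). Because it relies on strftime, a broken strftime would also make this test fail, as noted above.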

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
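The whitespace rule (whitespace in the format matches zero or more whitespace characters in the data) can be sketched as a single substitution on the format string. This happens before any directives are translated. `loosen_whitespace` is a hypothetical name, and the `%H`/`%M` patterns are simplified stand-ins:

```python
import re


def loosen_whitespace(fmt):
    r"""Replace each run of whitespace in a format string with the regex \s*."""
    # A function replacement avoids backslash handling in the template.
    return re.sub(r"\s+", lambda match: r"\s*", fmt)


pattern = loosen_whitespace("%H : %M")
print(pattern)  # %H\s*:\s*%M

# Hypothetical two-digit patterns for %H and %M, just for the demonstration.
regex = pattern.replace("%H", r"(?P<H>\d\d)").replace("%M", r"(?P<M>\d\d)")
print(re.match(regex, "13:45").groupdict())  # {'H': '13', 'M': '45'}
```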

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.
Figured I would just let people who think like me do it in the more
"proper" way with leading zeros, and let those who don't read it
like that still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 18:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
numbers that lack a leading zero.

I also made the regex case-insensitive.
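A trailing single-digit alternative combined with case-insensitive matching can be sketched as follows. The directive table and `format_to_regex` are simplified, hypothetical stand-ins, not the patch's TimeRE. With them, a data string such as "7/12/02" parses under "%m/%d/%y":

```python
import re

# Simplified, hypothetical directive patterns.  The final single-digit
# alternative accepts values written without a leading zero.
DIRECTIVES = {
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "y": r"(?P<y>\d\d)",
    "b": r"(?P<b>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)",
}


def format_to_regex(fmt):
    """Translate a format string into a compiled, case-insensitive regex."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVES[fmt[i + 1]])
            i += 2
        else:
            parts.append(re.escape(fmt[i]))  # literal text in the format
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


match = format_to_regex("%m/%d/%y").fullmatch("7/12/02")
print(match.groupdict())  # {'m': '7', 'd': '12', 'y': '02'}
```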

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c) Oops.  That is a complete oversight on my part.  Now, in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so, I will stop
uploading tests here and just start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
 If you want, I can obviously add on to all the regexes \d
as an option and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant to allow no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line 392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.
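A `(?:...)` group bundles alternatives without recording a submatch, so the compiled pattern has fewer groups to fill in on each match. How much that saves in practice is not measured here. A small illustration:

```python
import re

capturing = re.compile(r"(Mon)|(Tue)|(Wed)")
non_capturing = re.compile(r"(?:Mon)|(?:Tue)|(?:Wed)")

print(capturing.groups, non_capturing.groups)  # 3 0
print(non_capturing.match("Tue").group())      # Tue
```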

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost purely a syntactic change.
From discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
their names state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do, I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
strptime()'s use, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale's.
And if they did, they should change the locale for the
interpreter and not just for strptime().
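The default-argument pitfall can be seen with a counter. This is a minimal sketch. `make_locale_info`, `parse_with_default` and `parse_with_none` are hypothetical stand-ins. The `None` sentinel in the second function is a common alternative; it is not what the patch did:

```python
evaluations = []


def make_locale_info():
    """Hypothetical stand-in for building a LocaleTime object."""
    evaluations.append("built")
    return len(evaluations)


def parse_with_default(data, locale_info=make_locale_info()):
    # This default was computed once, when the 'def' statement ran.
    return locale_info


def parse_with_none(data, locale_info=None):
    # Common alternative: build a fresh object on every call that omits it.
    if locale_info is None:
        locale_info = make_locale_info()
    return locale_info


print(parse_with_default("a"), parse_with_default("b"))  # 1 1
print(parse_with_none("c"), parse_with_none("d"))        # 2 3
```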

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false
value, but I realized that
an empty string could trigger that, and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
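The saving being measured comes from reusing a compiled regex instead of rebuilding the pattern string on every call. One way to get it without a special argument is a cache keyed on the format string. Here is a minimal sketch using functools.lru_cache. `DIRECTIVES`, `build_pattern`, `compiled_format` and `parse_month_day` are hypothetical names, not the patch's API:

```python
import functools
import re

# Hypothetical directive table; only %m and %d are supported in this sketch.
DIRECTIVES = {
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
}


def build_pattern(fmt):
    """Translate fmt into regex source (the step found to be slow)."""
    pieces = fmt.split("%")
    out = [re.escape(pieces[0])]
    for piece in pieces[1:]:
        # piece[0] is the directive letter; the rest is literal text.
        out.append(DIRECTIVES[piece[0]] + re.escape(piece[1:]))
    return "".join(out)


@functools.lru_cache(maxsize=None)
def compiled_format(fmt):
    """Build and compile once per distinct format; later calls hit the cache."""
    return re.compile(build_pattern(fmt), re.IGNORECASE)


def parse_month_day(data, fmt="%m/%d"):
    match = compiled_format(fmt).fullmatch(data)
    if match is None:
        raise ValueError("time data did not match format")
    return int(match.group("m")), int(match.group("d"))
```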

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks goes
out to Guido for pointing out this undocumented feature of
calendar.
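The calendar module exposes locale-aware name sequences. Note that `month_name` and `month_abbr` start with an empty string, so index 1 is January. A short sketch (the variable names are illustrative):

```python
import calendar

# Both sequences follow the current LC_TIME locale.
weekdays = list(calendar.day_name)    # 7 entries, Monday first
months = list(calendar.month_name)    # 13 entries; months[0] is ''

print(weekdays[0], repr(months[0]), months[7])
```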

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the numeric day-of-the-month regex.
I discovered that ANSI C for ctime displays the day of the month with
a leading space if it is a single digit.  So to deal with
that, since at least Skip's C library likes to use that
format for %c, I went ahead and added it.
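Skip's `%c` output `'Tue Jun  4 00:53:39 2002'` shows the padding. The C `asctime()` layout, which Python's time.asctime() also produces, pads a single-digit day of the month with a space. Here is a sketch of a day pattern with an extra space-prefixed alternative. The pattern itself is a simplified assumption:

```python
import re
import time

# 2002-06-04 00:53:39, a Tuesday (tm_wday=1), day 155 of the year.
stamp = time.asctime((2002, 6, 4, 0, 53, 39, 1, 155, 1))
print(repr(stamp))  # 'Tue Jun  4 00:53:39 2002'

# Hypothetical day-of-month pattern; " [1-9]" covers the space-padded digit.
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9]| [1-9])"
match = re.match(r"(?P<a>\w{3}) (?P<b>\w{3}) " + DAY + " ", stamp)
print(int(match.group("d")))  # 4 (int() ignores the leading space)
```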

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
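On new-style classes (every class in Python 3), a property with no setter already rejects assignment with AttributeError. Supplying a setter that raises, as described here, changes that to TypeError. Here is a minimal sketch using a hypothetical `LocaleNames` class:

```python
class LocaleNames(object):
    """Hypothetical container with read-only, property-backed attributes."""

    def __init__(self):
        self._weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
                          "Friday", "Saturday", "Sunday"]

    def _get_weekdays(self):
        return self._weekdays

    def _read_only(self, value):
        raise TypeError("LocaleNames attributes are read-only")

    # Getter plus a setter that always raises TypeError.
    weekdays = property(_get_weekdays, _read_only)

    # Getter only: assignment raises AttributeError without extra code.
    first_weekday = property(lambda self: self._weekdays[0])
```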

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-dev
about getting this patch accepted (whether to modify
time or make this a separate module, etc.).  I think I will
do the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
on a \d on all numeric formats where it might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For the adding of '' to tuples, I created a method that
could specify front or back concatenation.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
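A helper of this kind might look like the sketch below. `pad_with_blank` is a hypothetical name, not the patch's method. Padding at the front makes tuple indices line up with 1-based month numbers:

```python
def pad_with_blank(seq, front=True):
    """Return seq as a tuple with '' added at the front or the back."""
    if front:
        return ("",) + tuple(seq)
    return tuple(seq) + ("",)


months = pad_with_blank(["January", "February", "March"])
print(months[1])  # January: index 1 now means month 1
```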

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it is not
validating the data string for validity the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You are only supposed to use __new__ when subclassing an immutable
type.  Since dict is obviously mutable, I don't need to worry about it.
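The guideline comes down to when the contents get set. Immutable built-ins such as tuple receive their contents in `__new__`, so a subclass that changes those contents must override `__new__`. A mutable built-in such as dict can be filled in `__init__`. A sketch with hypothetical class names:

```python
class PatternTable(dict):
    """dict subclass: mutable, so customizing __init__ is enough."""

    def __init__(self, extra=None):
        super().__init__({"y": r"(?P<y>\d\d)"})
        if extra:
            self.update(extra)


class DatePair(tuple):
    """tuple subclass: contents are fixed at creation, so override __new__."""

    def __new__(cls, month, day):
        return super().__new__(cls, (month, day))
```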

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re: 'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me,
although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__, do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError, which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly;
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto?
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
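`cmp()` and cmp-style sort functions exist only in Python 2. In Python 3 the same longest-first ordering is written with a key. The ordering matters because regex alternation takes the first alternative that succeeds. The full-name group in the TimeRE pattern from the traceback above is ordered the same way (Wednesday, Thursday, Saturday, ...). Here is a sketch with a hypothetical `seq_to_regex` helper:

```python
import re


def seq_to_regex(names, group):
    """Build a named alternation, trying longer names before shorter ones."""
    # key=len with reverse=True replaces the cmp-based sorter; the sort is
    # stable, so names of equal length keep their original relative order.
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join("(?:%s)" % re.escape(name) for name in ordered)
    return "(?P<%s>%s)" % (group, body)


unsorted = "(?P<a>(?:Mon)|(?:Monday))"
print(re.match(unsorted, "Monday").group("a"))                              # Mon
print(re.match(seq_to_regex(["Mon", "Monday"], "a"), "Monday").group("a"))  # Monday
```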

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.
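This sketch assumes "Julian date" in this helper means the day of the year (the %j sense, 1 to 366). Under that assumption, the conversion from a Gregorian date can be written with datetime ordinals. `greg_to_julian` is a hypothetical reimplementation, not the patch's code:

```python
import datetime


def greg_to_julian(year, month, day):
    """Return the day of the year (1-366) for a Gregorian date."""
    # Hypothetical helper: "Julian" is taken in the %j (day-of-year) sense,
    # not as an astronomical Julian Day Number.
    first = datetime.date(year, 1, 1).toordinal()
    return datetime.date(year, month, day).toordinal() - first + 1


print(greg_to_julian(2002, 7, 12))                     # 193
print(datetime.date(2002, 7, 12).timetuple().tm_yday)  # 193, same result
```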

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded version 2.0.1, which fixes the %b format
bug (stupid typo on a variable name).
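Skip's traceback for the month test looks up the %b value in `locale_time.f_month`. Assuming that list holds full month names, the lookup has to search abbreviated names instead. A minimal sketch using the standard calendar module (`month_from_abbr` is a hypothetical name, not Brett's code):

```python
import calendar

def month_from_abbr(abbr):
    """Return 1-12 for an abbreviated month name as produced by %b.

    Searches calendar.month_abbr (abbreviated names for the current
    locale); index 0 is the empty string, so 'Jan' maps to 1.
    Raises ValueError, like list.index(), for an unknown name.
    """
    if not abbr:
        raise ValueError("empty month name")
    lowered = [name.lower() for name in calendar.month_abbr]
    return lowered.index(abbr.lower())

print(month_from_abbr('Jun'))  # prints 6 in the C locale
```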

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?
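If %Z is kept, one option is to accept only the names in time.tzname and use them to set the DST flag. A minimal sketch under that assumption (`dst_flag_from_tz` is a hypothetical name); the name pair is passed in explicitly so the example does not depend on the machine's time zone:

```python
def dst_flag_from_tz(text, tznames):
    """Map a %Z value to a DST flag using a (standard, daylight) name pair.

    tznames is shaped like time.tzname, e.g. ('CST', 'CDT').
    Returns 0 for the standard name and 1 for the daylight name.
    """
    if text and text in tznames:
        return tznames.index(text)
    raise ValueError("unrecognized time zone name: %r" % (text,))

print(dst_flag_from_tz('CDT', ('CST', 'CDT')))  # prints 1, like Skip's tm_isdst
```

When both names are identical (for example ('UTC', 'UTC')), the flag cannot be recovered and this sketch simply returns 0.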

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime
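The test file itself is not included in the report. The tracebacks suggest each check formats a time with one directive and parses it back. A minimal sketch of that round trip, assuming a fixed time tuple instead of the current time (`round_trip` is a hypothetical stand-in for the test's `helper`):

```python
import time

# Swap this for another implementation (e.g. strptime.strptime) to compare.
parsetime = time.strptime

def round_trip(directive, position, when):
    """Format `when` with a single directive, parse the text back with
    parsetime, and check that the tuple field at `position` survives."""
    fmt = '%' + directive
    strf_output = time.strftime(fmt, when)
    strp_output = parsetime(strf_output, fmt)
    return strp_output[position] == when[position]

# The time tuple from Skip's session: Tue Jun  4 00:53:39 2002, DST in effect.
now = (2002, 6, 4, 0, 53, 39, 1, 155, 1)
print(round_trip('d', 2, now), round_trip('b', 1, now))  # True True in the C locale
```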

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
    comparisons, etc)
 * Long lines should be broken up
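Two of these suggestions sketched in Python, assuming simplified signatures (`check_integrity` and `is_year_directive` are hypothetical names, not the patch's `CheckIntegrity`):

```python
def check_integrity(month, day):
    """Validate fields by raising ValueError; no success flag is returned."""
    if not 1 <= month <= 12:
        raise ValueError("month out of range: %r" % (month,))
    if not 1 <= day <= 31:
        raise ValueError("day out of range: %r" % (day,))

def is_year_directive(found):
    # A str is a sequence, so for single characters this behaves like
    # found in ('y', 'Y').
    return found in 'yY'
```

One difference between the two membership tests: `'' in 'yY'` is True (substring test), while `'' in ('y', 'Y')` is False. That only matters if `found` can be something other than a single character.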
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
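Brett's code is not shown here. A minimal sketch of the idea, assuming the goal is to recover month names for the current locale by formatting one date per month (`month_names_from_strftime` is a hypothetical name):

```python
import time

def month_names_from_strftime():
    """Collect full (%B) and abbreviated (%b) month names for the current
    LC_TIME locale by formatting one date per month with time.strftime().

    Index 0 is left empty so that index == month number.
    """
    full, abbr = [''], ['']
    for month in range(1, 13):
        # Only the month field matters here; the other fields just need to be
        # in range for time.strftime() to accept the tuple.
        when = (2002, month, 1, 0, 0, 0, 0, 1, -1)
        full.append(time.strftime('%B', when).lower())
        abbr.append(time.strftime('%b', when).lower())
    return full, abbr

full, abbr = month_names_from_strftime()
print(full[6], abbr[6])  # june jun in the C locale
```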

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 02:03:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 18:03:56 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17VMBY-0005Ia-00@usw-sf-web5.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 01:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-18 21:03

Message:
Logged In: YES 
user_id=33168

Add patch for configure.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:33

Message:
Logged In: YES 
user_id=33168

Sorry, I forgot about this patch.
I just tested on Linux (RedHat 7.2).
No problems, all expected tests successful.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 20:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf-2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-05 14:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 02:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 08:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 21:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence points that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 06:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-01 21:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (ie, not extensions etc)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.
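In later Python releases (not the 2002 tree discussed here), the configure result for Py_ENABLE_SHARED can be read from Python through the standard sysconfig module. A small sketch (`shared_build` is a hypothetical helper); on builds that do not record the variable, for example some Windows builds, get_config_var() returns None:

```python
import sysconfig

def shared_build():
    """Report whether this interpreter was configured with --enable-shared.

    Returns True/False on builds that record Py_ENABLE_SHARED, else None.
    """
    value = sysconfig.get_config_var('Py_ENABLE_SHARED')
    if value is None:
        return None
    return bool(int(value))

print(shared_build())
```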

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 14:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-23 23:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 01:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 07:57:40 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 18 Jul 2002 23:57:40 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17VRhs-0002pr-00@usw-sf-web5.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 15:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2002-07-19 16:57

Message:
Logged In: YES 
user_id=14198

Thanks all!

Checking in configure;
/cvsroot/python/python/dist/src/configure,v  <--  configure
new revision: 1.322; previous revision: 1.321
Checking in pyconfig.h.in;
/cvsroot/python/python/dist/src/pyconfig.h.in,v  <-- 
pyconfig.h.in
new revision: 1.43; previous revision: 1.42
Checking in configure.in;
/cvsroot/python/python/dist/src/configure.in,v  <-- 
configure.in
new revision: 1.333; previous revision: 1.332
Checking in Makefile.pre.in;
/cvsroot/python/python/dist/src/Makefile.pre.in,v  <-- 
Makefile.pre.in
new revision: 1.88; previous revision: 1.87
Checking in Include/pyport.h;
/cvsroot/python/python/dist/src/Include/pyport.h,v  <-- 
pyport.h
new revision: 2.52; previous revision: 2.51
Checking in Include/import.h;
/cvsroot/python/python/dist/src/Include/import.h,v  <-- 
import.h
new revision: 2.28; previous revision: 2.27
Checking in PC/pyconfig.h;
/cvsroot/python/python/dist/src/PC/pyconfig.h,v  <--  pyconfig.h
new revision: 1.14; previous revision: 1.13
Checking in PC/_winreg.c;
/cvsroot/python/python/dist/src/PC/_winreg.c,v  <--  _winreg.c
new revision: 1.11; previous revision: 1.10
Checking in Modules/_sre.c;
/cvsroot/python/python/dist/src/Modules/_sre.c,v  <--  _sre.c
new revision: 2.82; previous revision: 2.81
Checking in Modules/pyexpat.c;
/cvsroot/python/python/dist/src/Modules/pyexpat.c,v  <-- 
pyexpat.c
new revision: 2.70; previous revision: 2.69
Checking in Python/thread.c;
/cvsroot/python/python/dist/src/Python/thread.c,v  <--  thread.c
new revision: 2.45; previous revision: 2.44
Checking in Doc/ext/extending.tex;
/cvsroot/python/python/dist/src/Doc/ext/extending.tex,v  <--
 extending.tex
new revision: 1.22; previous revision: 1.21


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-19 11:03

Message:
Logged In: YES 
user_id=33168

Add patch for configure.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-16 08:33

Message:
Logged In: YES 
user_id=33168

Sorry, I forgot about this patch.
I just tested on Linux (RedHat 7.2).
No problems, all expected tests successful.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-06 10:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf-2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-06 04:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 16:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 22:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-04 11:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence points that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 20:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-02 11:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (ie, not extensions etc)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-02 04:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-24 13:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 15:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 08:23:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 19 Jul 2002 00:23:24 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17VS6m-0003Hd-00@usw-sf-web5.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 01:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose-oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non-Windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-19 03:23

Message:
Logged In: YES 
user_id=31435

Au contraire, thank you!

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-19 02:57

Message:
Logged In: YES 
user_id=14198

Thanks all!

Checking in configure;
/cvsroot/python/python/dist/src/configure,v  <--  configure
new revision: 1.322; previous revision: 1.321
Checking in pyconfig.h.in;
/cvsroot/python/python/dist/src/pyconfig.h.in,v  <-- 
pyconfig.h.in
new revision: 1.43; previous revision: 1.42
Checking in configure.in;
/cvsroot/python/python/dist/src/configure.in,v  <-- 
configure.in
new revision: 1.333; previous revision: 1.332
Checking in Makefile.pre.in;
/cvsroot/python/python/dist/src/Makefile.pre.in,v  <-- 
Makefile.pre.in
new revision: 1.88; previous revision: 1.87
Checking in Include/pyport.h;
/cvsroot/python/python/dist/src/Include/pyport.h,v  <-- 
pyport.h
new revision: 2.52; previous revision: 2.51
Checking in Include/import.h;
/cvsroot/python/python/dist/src/Include/import.h,v  <-- 
import.h
new revision: 2.28; previous revision: 2.27
Checking in PC/pyconfig.h;
/cvsroot/python/python/dist/src/PC/pyconfig.h,v  <--  pyconfig.h
new revision: 1.14; previous revision: 1.13
Checking in PC/_winreg.c;
/cvsroot/python/python/dist/src/PC/_winreg.c,v  <--  _winreg.c
new revision: 1.11; previous revision: 1.10
Checking in Modules/_sre.c;
/cvsroot/python/python/dist/src/Modules/_sre.c,v  <--  _sre.c
new revision: 2.82; previous revision: 2.81
Checking in Modules/pyexpat.c;
/cvsroot/python/python/dist/src/Modules/pyexpat.c,v  <-- 
pyexpat.c
new revision: 2.70; previous revision: 2.69
Checking in Python/thread.c;
/cvsroot/python/python/dist/src/Python/thread.c,v  <--  thread.c
new revision: 2.45; previous revision: 2.44
Checking in Doc/ext/extending.tex;
/cvsroot/python/python/dist/src/Doc/ext/extending.tex,v  <--
 extending.tex
new revision: 1.22; previous revision: 1.21


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-18 21:03

Message:
Logged In: YES 
user_id=33168

Add patch for configure.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 18:33

Message:
Logged In: YES 
user_id=33168

Sorry, I forgot about this patch.
I just tested on Linux (RedHat 7.2).
No problems, all expected tests successful.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 20:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf-2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-05 14:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files which use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 02:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 08:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 21:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence points that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 06:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-01 21:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure.
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not extensions etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 14:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-23 23:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 01:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Fri Jul 19 18:09:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 19 Jul 2002 10:09:54 -0700
Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)
Message-ID: <E17VbGM-0002c0-00@usw-sf-web4.sourceforge.net>

Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to behave as much like
time.strptime() as is reasonable.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that works in
  clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-19 13:09

Message:
Logged In: YES 
user_id=6380

Thanks!  All checked in.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-18 20:43

Message:
Logged In: YES 
user_id=33168

Brett, I'm still following.  It wasn't that bad. :-)
Guido, let me know if you want me to
do anything/check stuff in.

Docs are fine to upload here.  I can change PEP 42 also.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 19:35

Message:
Logged In: YES 
user_id=357491

Since I had the time, I went ahead and did a patch for
libtime.tex that removes the comment saying that strptime
fully relies on the C library and uploaded it.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 18:34

Message:
Logged In: YES 
user_id=357491

Wonderful!

About the docs: do you want me to email Fred or upload a
patched version of the time module docs?  And for
removing the request in PEP 42, should I email Jeremy about
it or Barry?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 17:47

Message:
Logged In: YES 
user_id=6380

OK, deleting all old files as promised. All tests succeed. I
think I'll check this version in (but it may be tomorrow,
since I've got a few other things to take care of).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-18 17:39

Message:
Logged In: YES 
user_id=357491

God I wish I could delete those old files!  Poor Neal
Norwitz was nice enough to go over my code once to help me
make sure it was ready for inclusion in the stdlib, but
he initially used an old version.  Thankfully he was nice
enough to look over the newer version at the time.  But no,
SF does not give me the privileges to delete old files (and
why is that?  I am the creator of the patch; you would think
I could manage my own files).  I re-uploaded everything now.
All files that specify they were uploaded 2002-07-17 are
the newest files.

I am terribly sorry about this whole name mix-up.  I have
now fixed test_strptime.py to use _strptime.  I completely
removed the strptime import so that the strptime testing
will go through time and thus test whichever version time
will export.

I removed the __future__ import.  And thanks for the piece
of advice; I was taking the advice that __future__
statements should come before code a little too far.  =)

As for your error, that is because the test_strptime.py you
are using is old.  I originally had a test in there that
checked to make sure the regex returned was the same as the
one being tested for; that was a bad decision.  So I went
through and removed all hard-coded tests like that. 
Unfortunately the version you ran still had that test in
there.  SF should really let patch creators delete old files.

That's it this time.  Now I await the next drama in this
never-ending saga of trying to make a non-trivial
contribution to Python.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 11:29

Message:
Logged In: YES 
user_id=6380

- Can you please delete all the obsolete uploads? (If SF
won't let you, let me know and I'll do it for you, leaving
only the most recent version of each.)

- There's still a confusion between strptime.py and
_strptime.py; your test_time.py imports strptime, and so
does the latest version of test_strptime.py I can find.

- The "from __future__ import division" is unnecessary,
since you're never using the single / operator (// doesn't
need the future statement). Also note that future statements
should come *after* a module's docstring (for future
reference :-).

- When I run test_strptime.py, I get one failure:

======================================================================
FAIL: Test TimeRE.pattern.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../Lib/test/test_strptime.py", line 124, in test_pattern
    self.failUnless(pattern_string.find("(?P<d>(3[0-1])|([0-2]\d)|\d|( \d))") != -1, "did not find 'd' directive pattern string '%s'" % pattern_string)
  File "/home/guido/python/dist/src/Lib/unittest.py", line 262, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: did not find 'd' directive pattern string '(?P<a>(?:Mon)|(?:Tue)|(?:Wed)|(?:Thu)|(?:Fri)|(?:Sat)|(?:Sun))\s*(?P<A>(?:Wednesday)|(?:Thursday)|(?:Saturday)|(?:Tuesday)|(?:Monday)|(?:Friday)|(?:Sunday))\s*(?P<d>3[0-1]|[0-2]\d|\d| \d)'

----------------------------------------------------------------------

I haven't looked into this deeper.

Back to you...

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-16 17:34

Message:
Logged In: YES 
user_id=357491

Two things have been uploaded.  First, test_time.py w/ a
strptime test.  It is almost an exact mirror of the strftime
test; only difference is that I used strftime to test
strptime.  So if strftime ever fails, strptime will fail
also.  I feel this is fine since strptime depends on
strftime so much that if strftime were to fail strptime
would definitely fail.

The other file is version 2.1.5 of strptime.  I made two
changes.  One was to remove the TypeError raised when %I was
used without %p.  This was from me being very picky about
only accepting good data strings.  The second was to go
through and replace all whitespace in the format string with
\s*.  That basically makes this version of strptime XPG
compatible as far as I (and the NetBSD man page) can tell. 
The only difference now is that I do not require whitespace
or a non-alphanumeric character between format strings. 
Seems like a ridiculous requirement since the requirement
that whitespace be able to compress down to no whitespace
negates this requirement.  Oh well, we are more than
compliant now.
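A minimal Python sketch of the whitespace rule just described, assuming the substitution is applied to the format string before the directives are expanded; `whitespace_to_regex` is a hypothetical name, not the function in the patch.

```python
import re


def whitespace_to_regex(fmt):
    """Replace every run of whitespace in a strptime-style format with \\s*.

    \\s* lets whitespace in the format match any amount of whitespace
    in the data, including none at all.
    """
    # A function replacement sidesteps re.sub's backslash-escape handling.
    return re.sub(r"\s+", lambda match: r"\s*", fmt)


# Hypothetical usage with a format whose directives are already regexes:
pattern = whitespace_to_regex(r"(?P<d>\d\d?) (?P<Y>\d{4})")
```

With that pattern, "4   2002" and "04 2002" both match, which is the XPG-style leniency being aimed for.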

I decided not to write a patch for the docs to make them
read more leniently about what the format directives accept.
I figured people who think like me can keep writing dates the
"proper" way with leading zeros, and those who don't read the
docs like that will still be okay.

I think that is everything.  If you want more in-depth
tests, Guido, I can add them to the testing suite, but I
figured that since this is (hopefully) going in bug-free it
needs only be checked to make sure it isn't broken by
anything.  And if you do want more in-depth tests, do you
want me to add mirror tests for strftime or not worry about
that since that is the ANSI C library's problem?  Other than
that, I think strptime is pretty much done.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 18:27

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.4.  I added \d to the end of all relevant
regexes (basically all of them but %y and %Y) to deal with
numbers that lack a leading zero.

I also made the regex case-insensitive.

As for the diff failing, I am wondering if I am doing
something wrong.  I am just running diff -c CVS_file
modified_file > diff_file .  Isn't that right?

I will work on merging my strptime tests into the time
regression tests and upload a patch here.

I will do a patch for the docs since it is not consistent
with the explanation of struct_time (or at least in my opinion).

I tried finding XPG docs, but the best Google came up with
was the NetBSD man pages for strptime (which they claim is
XPG compliant).  The difference between that implementation
and mine is that NetBSD's allows whitespace (defined as
isspace()) in the format string to match \s* in the data
string.  It also requires a whitespace or a non-alphanumeric
character while my implementation does not require that.

Personally, I don't like either difference.  If they were
used, though, there might be a possibility of rewriting
strptime to just use a bunch of string methods instead of
regexes for a possible performance benefit.  But I prefer
regexes since it adds checks of the input.  That and I just
like regexes period.  =)

Also, I noticed that your little test returned 0 for all
unknown values.  Mine returns -1 since 0 can be a legitimate
value for some and I figured that would eliminate ambiguity.
 I can change it to 0, though.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 17:13

Message:
Logged In: YES 
user_id=6380

Hm, the new diff_time *still* fails to apply. But don't
worry about that.

I'd love to see regression tests for time.strptime. Please
upload them here -- don't start a new patch.

I think your interpretation of the docs is overly
restrictive; the table shows what strftime does but I think
it's reasonable for strptime to accept missing leading
zeros. You can upload a patch for the docs too if you feel
that's necessary. You may also try to read up on what the
XPG standard says about strptime.


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-12 17:02

Message:
Logged In: YES 
user_id=357491

To respond to your points, Guido:

(a) I accidentally uploaded the old file.  Sorry about that.
I misnamed the new one "time_diff" but in my head I meant
to overwrite "diff_time".  I have uploaded the new one.

(b) See (a)

(c)  Oops.  That is a complete oversight on my part.  Now in
(d) you mention writing up regression tests for the standard
time.strptime.  I am quite happy to do this.  Do you want
that as a separate patch?  If so I will just stop
uploading tests here and start a patch with my strptime
tests for the stdlib tests.

(d) The reason this test failed is because your input is not
compliant with the Python docs.  Read what %m accepts:

Month as a decimal number [01,12]

Notice the leading 0 for the single digit month.  My
implementation follows the docs and not what glibc suggests.
If you want, I can obviously add \d as an option to all the
regexes and eliminate this issue.  But that means it
will no longer be following the docs.  This tripped Skip up
too since no one writes numbers that way; strftime does, though.
Now if the docs meant for no leading 0, I think they should
be rewritten since that is misleading.

In other words, either strptime stays as it is and follows
the docs or I change the regexes, but then the docs will
have to be changed.  I can go either way, but I personally
would want to follow the docs as-is since strptime is meant
to parse strftime output and not human output.  =)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-12 12:58

Message:
Logged In: YES 
user_id=6380

Hm.  This isn't done yet. I get these problems:

(a) the patch for timemodule.c doesn't apply cleanly in
current CVS (trivial)

(b) it still tries to import strptime (no leading '_') (also
trivial)

(c) so does test_strptime.py (also trivial)

(d) the simplest of simple examples fails:

With Linux's strptime:

>>> time.strptime("7/12/02", "%m/%d/%y")
(2002, 7, 12, 0, 0, 0, 4, 193, 0)
>>>

With yours:

>>> time.strptime("7/12/02", "%m/%d/%y")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/guido/python/dist/src/Lib/_strptime.py", line
392, in strptime
    raise ValueError("time data did not match format")
ValueError: time data did not match format
>>> 

Perhaps you should write a regression test suite for the
strptime function as found in the time module courtesy of
libc, and then make sure that your code satisfies it?
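A minimal sketch of such a regression check, written against today's `time.strptime` (in modern CPython that function is backed by the pure-Python `_strptime` module on every platform). The function name is hypothetical; the expected values come from the Linux output above, except that `tm_isdst` is left out because implementations disagree on how to report "unknown" (0 above, -1 in current Python).

```python
import time


def check_strptime_without_leading_zeros():
    """Guido's example: single-digit month, two-digit year."""
    result = time.strptime("7/12/02", "%m/%d/%y")
    # Fields: year, month, day, hour, minute, second, weekday, day of year.
    expected = (2002, 7, 12, 0, 0, 0, 4, 193)
    # tm_isdst (the ninth field) is skipped: "unknown" is reported
    # differently by different implementations.
    assert tuple(result)[:8] == expected, tuple(result)
    return result


check_strptime_without_leading_zeros()
```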

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-07-10 16:51

Message:
Logged In: YES 
user_id=357491

The actual 2.1.3 edition of strptime is now up.  I don't
think there are any changes, but since I renamed the file
_strptime.py, I figured uploading it again wouldn't hurt.

I also uploaded a new contextual diff of the time module
taken from CVS on 2002-07-10.  The only difference between
this and the previous diff (which was against 2.2.1's time
module) is the change of the imported module to _strptime.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-27 00:54

Message:
Logged In: YES 
user_id=357491

Uploaded 2.1.2 (but accidentally labelled it 2.1.3 down
below!).  Just a little bit more cleanup.  Biggest change is
that I changed the default format string and made strptime()
raise ValueError instead of TypeError.  This was all done to
match the time module docs.

I also fiddled with the regexes so that the groups were
non-capturing.  Mainly done for a possible performance
improvement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-23 21:06

Message:
Logged In: YES 
user_id=357491

2.1.1 is now uploaded.  Almost a purely syntactic change.
Based on discussions on python-dev I renamed the helper fxns so
they are all lowercase-style.  Also changed them so that
they state what the fxn returns.

I also put all of the imports on their own line as per PEP 8.

The only semantic change I made was to directly import
re.compile since it is the only thing I am using from the re
module.

These changes required tweaking of my exhaustive testing
suite, so that got uploaded, too.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-21 00:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a contextual diff of timemodule.c with a
callout to strptime.strptime when HAVE_STRPTIME is not
defined just as Guido requested.

It's my first extension module, so I am not totally sure of
myself with it.  But since Alex Martelli told me what I
needed to do I am fairly certain it is correct.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-19 17:49

Message:
Logged In: YES 
user_id=357491

2.1.0 is now up and ready for use.  I only changed two
things in the code, but since they change the semantics of
how strptime() is used, I made this a new minor release.

One, I removed the ability to pass in your own LocaleTime
object.  I did this for two reasons.  One is because I
forgot about how default arguments are created at the time
of function creation and not at each fxn call.  This meant
that if someone was not thinking and ran strptime() under
one locale and then switched to another locale without
explicitly passing in a new LocaleTime object for every call
for the new locale, they would get bad matches.  That is not
good.
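The pitfall described above fits in a few lines of Python. `make_locale_info`, `parse_shared_default` and `parse_fresh_default` are hypothetical stand-ins, not names from the patch; the counter only records how often the "locale" object gets built.

```python
build_count = 0


def make_locale_info():
    """Stand-in for building a LocaleTime-like object from the current locale."""
    global build_count
    build_count += 1
    return {"build": build_count}


def parse_shared_default(data, locale_info=make_locale_info()):
    # The default expression ran once, when this def statement executed,
    # so every call shares the same object, even after a locale change.
    return locale_info


def parse_fresh_default(data, locale_info=None):
    # Using None as a sentinel defers construction to call time.
    if locale_info is None:
        locale_info = make_locale_info()
    return locale_info
```

The patch sidestepped the problem by dropping the parameter altogether; the None-sentinel form is the usual alternative when such a parameter is kept.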

The other reason was that I don't want to force users to
pass in a LocaleTime object on every call if I can't have a
default value for it.  This is meant to act as a drop-in
replacement for time.strptime().  That forced the removal of
the parameter since it can't have a default value.

In retrospect, though, people will probably never parse log
files in languages other than their default locale.
And if they did, they should change the locale for the
interpreter and not just for strptime().

The second change was what triggers strptime() to return an
re object that it can use.  Initially it was any false
value, but I realized that
an empty string could trigger that and it would be better to
raise a TypeError than let some error come up from trying to
use the re object in an incorrect way.

Now, to have an re object returned, you pass in False.  I
figured that there is a very minimal chance of passing in
False when you meant to pass in a string.  Also, False as
the data_string, to me, means that I don't want what would
normally be returned.

I debated about removing this feature from strptime(), but I
profiled it and most of the time comes from TimeRE's
__getitem__.  So building the string to be compiled into a
regex is the big bottleneck.  Using a precompiled regex
instead of constructing a new one every time took 25% of the
time overall for strptime() when calling strptime() 10,000
times in a row.  This is a conservative number, IMO, for
calls in a row; I checked the Apache hit logs for a single
day on Open Computing Facility's web server
(http://www.ocf.berkeley.edu/) and there were 188,562 hits
on June 16 alone.  So I am going to keep the feature until
someone tells me otherwise.
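A sketch of the same caching idea, assuming (as the profile above suggests) that the expensive step is translating the format string into a pattern. `compiled_format` and its four-entry directive table are hypothetical and far smaller than the module's TimeRE; `functools.lru_cache` stands in for the "pass False to get the re object back" interface, so callers never see the cache.

```python
import functools
import re

# Hypothetical directive table; a real one covers every directive and
# is built from locale data.
_DIRECTIVES = {
    "d": r"(?P<d>3[0-1]|[0-2]\d|\d| \d)",
    "m": r"(?P<m>1[0-2]|0[1-9]|\d)",
    "y": r"(?P<y>\d\d)",
    "Y": r"(?P<Y>\d\d\d\d)",
}


@functools.lru_cache(maxsize=64)
def compiled_format(fmt):
    """Translate a format string into a compiled regex, once per format."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            # KeyError for directives this toy table does not know.
            parts.append(_DIRECTIVES[fmt[i + 1]])
            i += 2
        else:
            parts.append(re.escape(fmt[i]))  # literal text
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)
```

The re module also caches compiled patterns, but it keys that cache on the finished pattern string, so it cannot skip the string-building step that dominated the profile.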

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-18 15:05

Message:
Logged In: YES 
user_id=357491

I have uploaded v. 2.0.4.  It now uses the calendar module
to figure out the names of weekdays and months.  Thanks go
out to Guido for pointing out this undocumented feature of
calendar.
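For reference, the calendar attributes involved are `day_name`, `day_abbr`, `month_name` and `month_abbr` (since documented). They follow the current LC_TIME locale, which is what makes them useful for a LocaleTime-style class. The helper below is a hypothetical sketch, not the patch's code.

```python
import calendar


def locale_calendar_names():
    """Collect weekday and month names as seen by the current locale."""
    return {
        # Index 0 is Monday, matching tm_wday in a time tuple.
        "weekday": list(calendar.day_name),
        "weekday_abbr": list(calendar.day_abbr),
        # Index 0 is '' so that month numbers 1-12 index directly.
        "month": list(calendar.month_name),
        "month_abbr": list(calendar.month_abbr),
    }
```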

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-17 16:11

Message:
Logged In: YES 
user_id=357491

I uploaded v.2.0.3.  Beyond implementing what I mentioned
previously (raising TypeError when a match fails, adding \d
to all applicable regexes) I did a few more things.

For one, I added a special " \d" to the day-of-month (%d) regex.
I discovered that ANSI C's ctime displays the day of the month with
a leading space if it is a single digit.  Since at least Skip's
C library uses that format for %c, I went ahead and added it.
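To illustrate the padding issue: ctime-style output such as Skip's 'Tue Jun  4 00:53:39 2002' pads a single-digit day with a space. The sketch below uses a %d sub-pattern with the same alternatives as the %d pattern quoted elsewhere in this thread; `parse_day` is a hypothetical helper.

```python
import re

# Two digits, a bare single digit, or a space-padded single digit.
DAY_PATTERN = re.compile(r"(?P<d>3[0-1]|[0-2]\d|\d| \d)")


def parse_day(text):
    """Return the day of the month in text, or None if it does not match."""
    match = DAY_PATTERN.fullmatch(text)
    # int() ignores the leading space of the ctime-style ' 4'.
    return int(match.group("d")) if match else None
```

So "04", "4" and " 4" all parse as 4, while "32" is rejected.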

I changed all attributes in LocaleTime to lists.  A recent
mail on python-dev from GvR said that lists are for
homogeneous data, which everything that is grouped together
in LocaleTime is.  It also simplified the code slightly and
led to fewer conversions of data types.

I also added a method that raises a TypeError if you try to
assign to any of LocaleTime's attributes.  I thought that if
you left out the set value for property() it wouldn't work;
didn't realize it just defaults over to __setitem__.  So I
added that method as the set value for all of the property()s.
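A sketch of the read-only attribute technique, assuming a new-style class; `LocaleNames` is hypothetical. Each property gets a setter that raises TypeError, as described above. (In current Python a property defined without a setter already refuses assignment, but with AttributeError rather than TypeError.)

```python
class LocaleNames:
    """Hypothetical LocaleTime-like holder whose attributes are read-only."""

    def __init__(self, weekdays, months):
        self._weekdays = list(weekdays)
        self._months = list(months)

    def _read_only(self, value):
        raise TypeError("LocaleNames attributes cannot be assigned to")

    weekdays = property(lambda self: self._weekdays, _read_only)
    months = property(lambda self: self._months, _read_only)
```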

It does require 2.2.1 now since I used True and False
without defining them.  Obviously just set those values to 1
and 0 respectively if you are running under 2.2.

I also updated the overly exhaustive PyUnit suite that I
have for testing my code.   It is not black-box testing,
though; Skip's pruned version of my testing suite fits that
bill (I think).

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-12 20:46

Message:
Logged In: YES 
user_id=357491

I am back from my vacation and ready to email python-
dev about getting this patch accepted (whether to modify 
time or make this a separate module, etc.).  I think I will 
do the email on June 17.

Before then, though, I am going to make two changes.
One is to raise a ValueError exception if the regex doesn't
match (to try to match time.strptime()'s exception as seen
in Skip's run of the unit test).  The other change is to tack
a \d onto all numeric formats where the value might come out as
a single digit (i.e., lacking a leading zero).  This will be for
v2.0.3, which I will post before June 17.

If there is any reason anyone thinks I should hold back on 
this, please let me know!  I would like to have this code as 
done as possible before I make any announcement.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-05 02:32

Message:
Logged In: YES 
user_id=357491

I went ahead and implemented most of Neal's suggestions.  On
a few of them, though, I either didn't do it or took a
slightly different route.

For the 'yY' vs. ('y', 'Y'), I went with 'yY'.  If it gives
a performance boost, why not since it doesn't make the code
harder to read.  Implementing it actually had me catch some
redundant code for dealing with a literal %.

The tests in the __init__ for LocaleTime have been reworked
to check that they are either None or have the proper
length, otherwise they raise a TypeError.
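One way to write the checks just described, as a hedged sketch: `validate_names` is a hypothetical helper, None means "work the names out from the locale", and the error message follows Neal's suggested "must contain X items" wording.

```python
def validate_names(names, expected_len, what):
    """Return None or a list of exactly expected_len names; else raise TypeError."""
    if names is None:
        return None  # caller will compute the names from the locale
    try:
        length = len(names)
    except TypeError:
        raise TypeError("%s must be None or a sequence" % what) from None
    if length != expected_len:
        raise TypeError("%s must contain %d items" % (what, expected_len))
    return list(names)
```

Checking `is None` explicitly, as Neal suggested, means a non-sequence such as 42 is rejected instead of silently passing a truth test.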

I have gone through and tried to catch all the lines that
were over 80 characters and cut them up to fit.

For adding '' to tuples, I created a method that
can concatenate at either the front or the back.  Not much
different from before, but it allows me to specify front or
back concatenation easily.
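A one-line version of such a helper, as a sketch; `pad_with_blank` is a hypothetical name and the patch's own method may differ.

```python
def pad_with_blank(seq, front=False):
    """Return seq as a tuple with '' added at the front or the back."""
    items = tuple(seq)
    return ("",) + items if front else items + ("",)
```

Front padding gives the same layout as `calendar.month_name`, where index 0 is '' so that month numbers index directly.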

I explained why the various magic dates were used.

I in no way have to worry about leap year.  Since it does not
validate the data string, the fxn just takes
the data and uses it.  I have no reason to calc for leap year.

date_time[offset] has been replaced with current_format and
added the requisite two lines to assign between it and the list.

You only need __new__ when the base type is immutable.
Since dict is obviously mutable, I don't need to worry about it.
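A small sketch of the distinction: a dict subclass (like TimeRE) can set up its contents in `__init__`, while a subclass of an immutable type such as str must do its work in `__new__`, because the value is fixed once the object exists. Both classes are hypothetical.

```python
class DirectivePatterns(dict):
    """Hypothetical TimeRE-style mapping of directive letters to regexes."""

    def __init__(self, extra=None):
        # dict is mutable, so filling it in __init__ is enough.
        super().__init__({"d": r"(?P<d>3[0-1]|[0-2]\d|\d| \d)",
                          "Y": r"(?P<Y>\d\d\d\d)"})
        if extra:
            self.update(extra)


class UpperName(str):
    """Immutable base: the value must be chosen in __new__."""

    def __new__(cls, value):
        return super().__new__(cls, value.upper())
```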

Used Neal's suggested shortening of the sorter helper fxn.

I also used the suggestion of doing x = y = z = -1.  Now it
barely fits on a single line instead of two.

All numerical compares use == and != instead of is and is
not.  Didn't know about that dependency on
NSMALL((POS)|(NEG))INTS; good thing to know.

The doc string was backwards.  Thanks for catching that, Neal.

I also went through and added True and False where
appropriate.  There is a line in the code where True = 1;
False = 0 right at the top.  That can obviously be removed
if being run under Python 2.3.

And I completely understand being picky about minute details
where maintainability is a concern.  I just graduated from
Cal and so the memory of seeing beginning programmers' code
is still fresh in my mind <shudders>.

And I will query python-dev about how to go about getting
this added after the bugs are fixed and I am back home
(going to be out of town until June 16).  I will still be
periodically checking email, though, so I will continue to
implement any suggestions/bugfixes that anyone suggests/finds.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not a big deal to me,
although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig) + ('',)
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'?
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repeatedly,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)
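Why the sort matters: regex alternation takes the first alternative that matches, so a shorter name that is a prefix of a longer one must not come first (the %A pattern quoted elsewhere in this thread is ordered longest-first, with Wednesday leading). A sketch in current Python, where `cmp` is gone and `key=len, reverse=True` does the job; `names_to_regex` is a hypothetical name.

```python
import re


def names_to_regex(names, directive):
    """Build a named alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join("(?:%s)" % re.escape(name) for name in ordered)
    return "(?P<%s>%s)" % (directive, body)
```

With ["Sep", "September"] the result matches all of "September", whereas the unsorted alternation "(?:Sep)|(?:September)" stops at "Sep".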

Note:  you can do x = y = z = -1, instead of x = -1; y = -1; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that it specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes: do we follow the spec and chalk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a version 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?
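One way a pure-Python strptime() could recognize %Z, sketched under the assumption that only the two names in time.tzname are accepted (as in the `('CST', 'CDT')` output elsewhere in this thread). `parse_tz()` is a hypothetical helper; it returns an isdst-style flag and raises ValueError for unknown names:

```python
import re
import time


def parse_tz(text, names=None):
    """Match text against the platform's (std, dst) names; return 0 or 1."""
    names = tuple(time.tzname) if names is None else tuple(names)
    # Longest names first, which matters once this fragment is embedded
    # in a larger pattern (so a short name cannot shadow a longer one).
    alternatives = sorted({n for n in names if n}, key=len, reverse=True)
    pattern = "(?P<Z>" + "|".join(re.escape(n) for n in alternatives) + ")"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError("unrecognized timezone name: %r" % text)
    # isdst-style flag: 0 for the standard-time name, 1 for the DST name.
    return 0 if match.group("Z") == names[0] else 1
```

For example, `parse_tz("CDT", ("CST", "CDT"))` returns 1. Restricting %Z to the platform's own names keeps the directive well defined even though its general use is discouraged.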

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless of which version of parsetime I use, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supersedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that the time module is a C
extension, so would getting this accepted require getting
BDFL approval to add this as a separate module to the
standard library?  Or would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.
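The two membership tests discussed above are not quite equivalent, which is worth knowing before switching styles: a string tests for substrings, so the empty string and multi-character strings like 'yY' also count as members. The function names below are made up for illustration:

```python
def in_tuple(found):
    # Exact element comparison: only 'y' or 'Y' qualify.
    return found in ('y', 'Y')


def in_string(found):
    # Substring test: '' and 'yY' also qualify.
    return found in 'yY'


for candidate in ('y', 'Y', 'm', '', 'yY'):
    print(repr(candidate), in_tuple(candidate), in_string(candidate))
```

When `found` is always a single character taken from a format string, both spellings give the same answer.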

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need a success flag in CheckIntegrity if you
raise an exception?
    (You don't need to return anything: raise an exception on
failure, otherwise it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependency.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags)
 * The formatting doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
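The suggestion about CheckIntegrity amounts to "raise on failure, otherwise return nothing". A minimal sketch of that convention, using a hypothetical `check_integrity()` that validates a year/month/day triple (the real function's checks are not shown in this thread):

```python
import calendar


def check_integrity(year, month, day):
    """Raise ValueError if the date fields are inconsistent; return None otherwise."""
    if not 1 <= month <= 12:
        raise ValueError("month out of range: %d" % month)
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= day <= days_in_month:
        raise ValueError("day out of range for %d-%02d: %d" % (year, month, day))
    # No success flag: falling off the end means the fields passed.


check_integrity(2002, 6, 4)        # passes silently
try:
    check_integrity(2002, 2, 30)
except ValueError as exc:
    print(exc)                     # day out of range for 2002-02: 30
```

Callers then need no `if not ok:` branch; the exception carries the reason.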
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All needed locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().
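Gleaning locale data from time.strftime() can be done by formatting dates whose month or weekday is known and collecting the text. A sketch of that idea; the `locale_names()` helper and the choice of 1999 as a reference year are arbitrary assumptions, not the patch's code:

```python
import time


def locale_names():
    """Collect month and weekday names from the current LC_TIME locale."""
    months_full, months_abbr = [""], [""]  # index 0 unused so January == 1
    for month in range(1, 13):
        # struct_time fields: year, month, day, hour, min, sec, wday, yday, isdst
        t = (1999, month, 1, 0, 0, 0, 0, 1, 0)
        months_full.append(time.strftime("%B", t))
        months_abbr.append(time.strftime("%b", t))
    weekdays = []
    for wday in range(7):  # Python numbers weekdays from Monday == 0
        # strftime() reads %A from the wday field, so the date need not match.
        t = (1999, 1, 1, 0, 0, 0, wday, 1, 0)
        weekdays.append(time.strftime("%A", t))
    return months_full, months_abbr, weekdays
```

Under the default C locale this yields English names; after `locale.setlocale(locale.LC_TIME, ...)` the same calls would follow the chosen locale, which is what lets a parser stay in step with strftime().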

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470


From noreply@sourceforge.net  Sat Jul 20 09:14:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 20 Jul 2002 01:14:37 -0700
Subject: [Patches] [ python-Patches-581396 ] Canvas "select_item" always returns None
Message-ID: <E17VpNt-0003xl-00@usw-sf-web5.sourceforge.net>

Patches item #581396, was opened at 2002-07-14 19:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470

Category: Tkinter
>Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Nobody/Anonymous (nobody)
>Summary: Canvas "select_item" always returns None

Initial Comment:
bug in 2.1.3, 2.2.1 and CVS HEAD. One liner patch:

*** /usr/lib/python2.1/lib-tk/Tkinter.py.orig   Wed Jul  3 17:04:28 2002
--- /usr/lib/python2.1/lib-tk/Tkinter.py        Wed Jul  3 17:04:31 2002
***************
*** 2096,2100 ****
      def select_item(self):
          """Return the item which has the selection."""
!         self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
--- 2096,2100 ----
      def select_item(self):
          """Return the item which has the selection."""
!         return self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470


From noreply@sourceforge.net  Sat Jul 20 17:49:31 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 20 Jul 2002 09:49:31 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17VxQB-0003km-00@usw-sf-web5.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 12:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Sat Jul 20 19:35:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 20 Jul 2002 11:35:38 -0700
Subject: [Patches] [ python-Patches-568348 ] Add param to email.Utils.decode()
Message-ID: <E17Vz4s-0005fb-00@usw-sf-web5.sourceforge.net>

Patches item #568348, was opened at 2002-06-12 23:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: atsuo ishimoto (ishimoto)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Add param to email.Utils.decode() 

Initial Comment:
While email.Utils.decode() is quite a useful function, I ran
into a real-world problem.

Here in Japan, I receive a lot of RFC-hostile messages every
day. Since they contain illegal characters that cannot be
converted to Unicode by JapaneseCodecs, email.Utils.decode()
chokes with UnicodeError. My solution is to add an optional
'errors' parameter which is passed to the unicode() function.
This allows me to replace illegal characters instead of
abandoning the entire text.


----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-21 06:45

Message:
Logged In: YES 
user_id=163326

I'd recommend assigning this patch to Barry Warsaw
(bwarsaw), who is the maintainer of the email module.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470


From noreply@sourceforge.net  Sun Jul 21 14:16:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 21 Jul 2002 06:16:17 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17WGZN-0001Q3-00@usw-sf-web2.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Andrew I MacIntyre (aimacintyre)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-21 23:16

Message:
Logged In: YES 
user_id=250749

Ok, I've prepared patches to convert the following files to
use PyMalloc for memory allocation:
Parser/[acceler.c|node.c|parsetok.c] (pymalloc-parser.diff)
Python/compile.c (pymalloc-compile.diff)

I didn't bother with the other files in Parser/ as my malloc
logging shows that they only ever appear to make requests >
256 bytes.

I have attached/will attach a summary from my malloc logging
experiments for information.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-16 04:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If 
the platform malloc is saved from either one, its job 
becomes much easier.
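The effect of growing the children array far less often can be illustrated by counting how many times capacity has to grow while children are appended one at a time. This is a simplified model, not node.c: `fixed_step` mimics "a realloc on every 3rd new child", and `power_of_two` is one possible rounding rule, not necessarily the one XXXROUNDUP uses:

```python
def count_reallocs(n_children, capacity_for):
    """Count capacity increases while appending n_children one by one."""
    reallocs = 0
    capacity = 0
    for n in range(1, n_children + 1):
        needed = capacity_for(n)      # capacity requested when holding n children
        if needed > capacity:
            reallocs += 1
            capacity = needed
    return reallocs


def fixed_step(n, step=3):
    # Round n up to a multiple of step: grows every `step` children.
    return (n + step - 1) // step * step


def power_of_two(n):
    # Round n up to the next power of two: geometric growth.
    capacity = 1
    while capacity < n:
        capacity *= 2
    return capacity


print(count_reallocs(3000, fixed_step))    # 1000
print(count_reallocs(3000, power_of_two))  # 13
```

Linear growth needs a number of reallocs proportional to the child count, while geometric growth needs only logarithmically many, which is what relieves the platform realloc of repeated large copies.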

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.
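To see why a pymalloc-style allocator sidesteps free-time thrashing, here is a toy model of a size-class allocator: each request up to a threshold is rounded to a size class, freed blocks simply go back on that class's free list, and no attempt is made to merge neighbours. The class, its 8-byte granularity and its 256-byte threshold are simplifying assumptions; real pymalloc manages pools and arenas:

```python
from collections import defaultdict


class ToySizeClassAllocator:
    """Toy model: fixed size classes, LIFO free lists, no coalescing."""

    def __init__(self, alignment=8, threshold=256):
        self.alignment = alignment
        self.threshold = threshold
        self.free_lists = defaultdict(list)  # size class -> freed block ids
        self.sizes = {}                      # live block id -> size class
        self._next_id = 0

    def size_class(self, nbytes):
        if not 0 < nbytes <= self.threshold:
            raise ValueError("request not handled by the small-object allocator")
        return (nbytes - 1) // self.alignment

    def malloc(self, nbytes):
        cls = self.size_class(nbytes)
        if self.free_lists[cls]:
            block = self.free_lists[cls].pop()  # reuse a freed block of this class
        else:
            block = self._next_id               # otherwise carve out a fresh one
            self._next_id += 1
        self.sizes[block] = cls
        return block

    def free(self, block):
        cls = self.sizes.pop(block)
        self.free_lists[cls].append(block)      # O(1): no merge with neighbours
```

Freeing a million tiny blocks in this model is a million list appends; there is no search for adjacent free space, so the cost stays linear.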

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 21:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp, that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3s   6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin:   5.5s     4.9s   0.5s

These runs were made with library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin:  1230s     971s   65s
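For a quick reading of the FreeBSD numbers above, the relative changes can be computed directly from the "total" column. The figures are copied from the two tables; nothing new is measured here:

```python
# Total wall-clock seconds copied from the FreeBSD tables above.
LONGEXP = {"baseline": 39.1, "my patch": 37.1,
           "Tim's checkin": 8.4, "my patch+Tim's checkin": 5.5}
REGRTEST = {"baseline": 1386, "my patch": 1350,
            "Tim's checkin": 1265, "my patch+Tim's checkin": 1230}


def relative_to_baseline(totals):
    """Map each configuration to (speedup factor, percent of time saved)."""
    base = totals["baseline"]
    return {name: (base / seconds, 100.0 * (base - seconds) / base)
            for name, seconds in totals.items()}


for label, totals in (("test_longexp", LONGEXP), ("regrtest", REGRTEST)):
    for name, (speedup, saved) in relative_to_baseline(totals).items():
        print("%-12s %-24s %5.2fx %6.1f%% saved" % (label, name, speedup, saved))
```

By this arithmetic Tim's checkin alone makes test_longexp about 4.7 times faster (39.1s to 8.4s), while on the whole regression test the combined changes save roughly 11% (1386s to 1230s).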

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser is worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 20:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. You'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 16:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.
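When comparing runs like these, timing one focused workload several times and keeping the best result reduces (but does not remove) this kind of noise. A hedged sketch with timeit; `time_long_expr()` is hypothetical, and its default size is an assumed value in the spirit of test_longexp, not taken from the regression suite:

```python
import timeit

# Assumption: a list display with tens of thousands of elements; the exact
# length is illustrative only.
REPS = 65580


def time_long_expr(reps=REPS, repeat=5):
    """Best-of-`repeat` seconds to compile and evaluate one long list display."""
    source = "[" + "2," * reps + "]"
    timings = timeit.repeat(lambda: eval(source), repeat=repeat, number=1)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)
```

Because this workload has no random component, differences between builds in `time_long_expr()` are easier to trust than small differences in total regression-suite time.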

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 07:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.
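A simplified model of that routing: requests up to the small-request threshold (256 bytes, the figure implied by the malloc logging in this thread) are served by pymalloc, and anything larger goes to the platform allocator. The helper names and the per-slot sizes below are assumptions for illustration:

```python
SMALL_REQUEST_THRESHOLD = 256  # bytes, as discussed in this thread


def allocator_for(nbytes, threshold=SMALL_REQUEST_THRESHOLD):
    """Which allocator a request of nbytes would reach in this model."""
    return "pymalloc" if 0 < nbytes <= threshold else "system"


def first_system_count(slot_size, threshold=SMALL_REQUEST_THRESHOLD):
    """Smallest number of slots whose block no longer fits the small-object path."""
    return threshold // slot_size + 1
```

So an array with, say, 20-byte slots leaves pymalloc's hands at its 13th slot, and every later resize of a growing child array is handled by the platform realloc, which is why pymalloc alone cannot fix this case.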

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Sun Jul 21 21:29:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 21 Jul 2002 13:29:43 -0700
Subject: [Patches] [ python-Patches-584626 ] yield allowed in try/finally
Message-ID: <E17WNKp-0003Nc-00@usw-sf-web4.sourceforge.net>

Patches item #584626, was opened at 2002-07-21 20:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: yield allowed in try/finally

Initial Comment:
A generator's dealloc function now resumes a generator
one last time by jumping directly to the return statement at 
the end of the code.  As a result, the finally section of any 
try/finally blocks is executed.  Any exceptions raised are 
treated just like exceptions in a __del__ finalizer.
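For comparison, this is how yield inside try/finally behaves in later Python versions (2.5 onward, via PEP 342), which took a different route from this patch: close() raises GeneratorExit at the paused yield, so the finally clause runs. The generator below is a made-up example:

```python
cleanup_log = []


def numbers():
    try:
        yield 1
        yield 2
    finally:
        # Runs on normal exhaustion, on close(), or when CPython
        # finalizes a suspended generator (which calls close()).
        cleanup_log.append("finally ran")


gen = numbers()
print(next(gen))      # 1
gen.close()           # raises GeneratorExit inside the generator at `yield 1`
print(cleanup_log)    # ['finally ran']
```

Raising an exception at the yield point, rather than jumping to the final return, also lets `except GeneratorExit:` handlers inside the generator observe the shutdown.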


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470


From noreply@sourceforge.net  Mon Jul 22 20:53:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 22 Jul 2002 12:53:16 -0700
Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests
Message-ID: <E17WjF6-0007NW-00@usw-sf-web5.sourceforge.net>

Patches item #585101, was opened at 2002-07-22 15:53
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470

Category: Tests
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Barry A. Warsaw (bwarsaw)
Assigned to: Jack Jansen (jackjansen)
Summary: Fix relative imports in regression tests

Initial Comment:
The regression test suite uses intrapackage relative
imports to import stuff like test_support, etc. 
There's no deep reason for this to be so, since "test"
is a standard package.  As long as all tests do
something like "from test import test_support" or
"import test.test_support" everything works fine. 
Keeping the relative imports makes life more difficult
for tests that don't live in the expected location of
Lib/test.

This patch fixes this by making sure all test imports
are absolute.  This works fine on *nix, but rumor has
it that the Mac tests are run differently so I'd like
Jack to comment on whether this patch breaks his test
suite or not.
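The absolute form works regardless of where the importing file lives, as long as the package's parent directory is on sys.path. A self-contained sketch that builds a throwaway package (the `mytests` name and the file contents are made up) and imports a module that uses the absolute form:

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_from_temp_package():
    """Build a throwaway package and import a module that uses an absolute
    intra-package import ("from mytests import support")."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "mytests")  # hypothetical package name
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        "support.py": "VALUE = 42\n",
        # Absolute form: resolved from sys.path, not from this file's directory.
        "test_example.py": "from mytests import support\nRESULT = support.VALUE\n",
    }
    for name, body in files.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(body)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("mytests.test_example")
        return module.RESULT
    finally:
        sys.path.remove(root)
        shutil.rmtree(root, ignore_errors=True)
```

Implicit relative imports were later removed altogether (PEP 328), so the absolute spelling is also the one that survives into Python 3.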

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470


From noreply@sourceforge.net  Mon Jul 22 20:55:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 22 Jul 2002 12:55:16 -0700
Subject: [Patches] [ python-Patches-568348 ] Add param to email.Utils.decode()
Message-ID: <E17WjH2-0007Q9-00@usw-sf-web5.sourceforge.net>

Patches item #568348, was opened at 2002-06-12 23:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: atsuo ishimoto (ishimoto)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Add param to email.Utils.decode() 

Initial Comment:
While email.Utils.decode() is quite a useful function, I ran
into a real-world problem.

Here in Japan, I receive a lot of RFC-hostile messages every
day. Since they contain illegal characters that cannot be
converted to Unicode by JapaneseCodecs, email.Utils.decode()
chokes with UnicodeError. My solution is to add an optional
'errors' parameter which is passed to the unicode() function.
This allows me to replace illegal characters instead of
abandoning the entire text.


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-22 15:55

Message:
Logged In: YES 
user_id=12800

email.Utils.decode() is deprecated in favor of
email.Header.decode_header().  Is this patch still worth it?
 I think email.Utils.decode() ought to go away.
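The "errors" idea from the original request can be applied on top of decode_header() instead. A sketch using the modern spelling of that module, email.header (lowercase in Python 3); `decode_lenient()` is a hypothetical helper, and unknown charsets are not handled:

```python
from email.header import decode_header


def decode_lenient(value, errors="replace"):
    """Decode an RFC 2047 header, replacing undecodable bytes instead of failing."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # charset is None for unencoded parts; assume ASCII for those.
            pieces.append(chunk.decode(charset or "ascii", errors))
        else:
            pieces.append(chunk)  # no encoded words at all: already a str
    return "".join(pieces)
```

With `errors="replace"`, a header such as `=?utf-8?q?bad=FF?=` decodes to `'bad\ufffd'` rather than raising, which is the behaviour the original request asked for.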

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-21 06:45

Message:
Logged In: YES 
user_id=163326

I'd recommend assigning this patch to Barry Warsaw
(bwarsaw), who is the maintainer of the email module.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=568348&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 03:56:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 22 Jul 2002 19:56:14 -0700
Subject: [Patches] [ python-Patches-581396 ] Canvas "select_item" always returns None
Message-ID: <E17WpqQ-00071y-00@usw-sf-web1.sourceforge.net>

Patches item #581396, was opened at 2002-07-14 15:23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470

Category: Tkinter
Group: Python 2.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Matthias Klose (doko)
>Assigned to: Neal Norwitz (nnorwitz)
>Summary: Canvas "select_item" always returns None

Initial Comment:
bug in 2.1.3, 2.2.1 and CVS HEAD. One liner patch:

*** /usr/lib/python2.1/lib-tk/Tkinter.py.orig   Wed Jul  3 17:04:28 2002
--- /usr/lib/python2.1/lib-tk/Tkinter.py        Wed Jul  3 17:04:31 2002
***************
*** 2096,2100 ****
      def select_item(self):
          """Return the item which has the selection."""
!         self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
--- 2096,2100 ----
      def select_item(self):
          """Return the item which has the selection."""
!         return self.tk.call(self._w, 'select', 'item')
      def select_to(self, tagOrId, index):
          """Set the variable end of a selection in item TAGORID to INDEX."""
 

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-22 22:56

Message:
Logged In: YES 
user_id=33168

Made sure to return None if no item was selected.
Checked in as Tkinter.py 1.160.10.1 & 1.163
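Tk's canvas `select item` command returns an empty string when the selection is not in the canvas, so "return None if no item was selected" can be expressed with `or None`. A sketch with a stand-in for the Tcl interpreter so it runs without a display; `FakeTk` and `CanvasLike` are hypothetical, and the checked-in Tkinter.py change itself is not reproduced here:

```python
class FakeTk:
    """Stand-in for a Tcl interpreter: call() returns a canned reply."""

    def __init__(self, reply):
        self.reply = reply

    def call(self, *args):
        return self.reply


class CanvasLike:
    def __init__(self, tk):
        self.tk = tk
        self._w = ".canvas"

    def select_item(self):
        """Return the item which has the selection, or None."""
        # An empty reply (no selected item) is falsy, so it becomes None.
        return self.tk.call(self._w, "select", "item") or None
```

This relies on canvas item ids starting at 1, so a real item id is never falsy.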

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581396&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 04:22:34 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 22 Jul 2002 20:22:34 -0700
Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0
Message-ID: <E17WqFu-0005Zu-00@usw-sf-web3.sourceforge.net>

Patches item #535335, was opened at 2002-03-26 13:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470

Category: Build
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeffrey Honig (jchonig)
Assigned to: Nobody/Anonymous (nobody)
Summary: 2.2 patches for BSD/OS 5.0

Initial Comment:
The following patches were necessary to get Python 2.2
to work on BSD/OS 5.0.  More may follow as we are still
attempting to resolve some issues related to the
regression tests (although these may be OS issues).

Thanks.

Jeff

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-22 23:22

Message:
Logged In: YES 
user_id=33168

Jeff, any chances of getting updates for this patch?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-06 04:49

Message:
Logged In: YES 
user_id=21627

Is an update of this patch forthcoming?

----------------------------------------------------------------------

Comment By: Jeffrey Honig (jchonig)
Date: 2002-03-26 14:08

Message:
Logged In: YES 
user_id=96862

Re: configure.in vs configure: we don't use autoconf here so
modifying configure.in doesn't help us.  I should have copied
the changes and submitted them, but then they aren't too hard
to figure out....

Re: contrib{lib/include}: Many of the packages that we install
from the net (which we call contrib packages) go into the
/usr/contrib hierarchy.  They won't be found by setup.py
unless those paths are present.

Re: regrtest.py: Apologies about the regrtest.py content;
there are some tests in there that shouldn't be.  Ignore it
for now, I'll submit an update later.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-03-26 13:53

Message:
Logged In: YES 
user_id=33168

Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless.
configure is generated, so configure.in will need the
changes made to it.

There seem to be many tests which fail, but perhaps
shouldn't:  fork1, locale, minidom, poll, pyexpat, sax,
unicode_file?

I'm also unsure of the benefit of adding
contrib/{lib/include} to setup.py.  This could be fine, 
but I don't know anything about distutils.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 04:34:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 22 Jul 2002 20:34:06 -0700
Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup
Message-ID: <E17WqR4-0005lC-00@usw-sf-web3.sourceforge.net>

Patches item #506436, was opened at 2002-01-21 08:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tim Peters (tim_one)
Summary: GETCONST/GETNAME/GETNAMEV speedup

Initial Comment:
The attached patch redefines the GETCONST, GETNAME &
GETNAMEV macros to do the following:

  * access the code object's consts and names through
    local variables instead of the long chain from f

  * use access macros to index the tuples and get
    the C string names

The code appears correct, and I've had no trouble
with it.  It only provides the most trivial of
improvements on pystone (around 1% when I see
anything), but it's all those little things that
add up, right?

Skip


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-22 23:34

Message:
Logged In: YES 
user_id=33168

Skip, I modified this code some, but your technique is still
valid.  I got rid of one of the indirections already.  The
patch can easily be updated.  Seems like the patch shouldn't
hurt.  Tim?

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 19:45

Message:
Logged In: YES 
user_id=44345

Looking for a vote up or down on this one...


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-01-21 08:47

Message:
Logged In: YES 
user_id=44345

Whoops...  Make the "observed" speedup 0.1%...


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 09:03:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 01:03:57 -0700
Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes
Message-ID: <E17WueD-0004fS-00@usw-sf-web4.sourceforge.net>

Patches item #552438, was opened at 2002-05-05 04:26
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Out of Date
Priority: 5
Submitted By: Scott Gilbert (xscott)
Assigned to: Tim Peters (tim_one)
Summary: PyBufferObject fixes

Initial Comment:
This patch fixes these problems:

  1) Dangling pointer problem
  2) buffer allocated by PyBuffer_New not aligned

The PyBufferObject acts differently depending on 
whether it allocated the memory or if it's borrowing 
the memory from a PyBufferProcs supporting object.

In the case of allocating its own memory, I made a 
slight addition that adds some padding so that the ptr 
is on a sizeof(double) boundary.
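As a rough illustration of the padding arithmetic only (not Scott's C code), the Python sketch below uses ctypes to over-allocate a raw buffer by sizeof(double) - 1 bytes and compute the offset at which a double-aligned region starts. The names `padding_for` and `aligned_offset_in` are hypothetical helpers, and the over-allocation strategy is an assumption.

```python
import ctypes

# Alignment the patch targets: sizeof(double), typically 8 bytes.
ALIGN = ctypes.sizeof(ctypes.c_double)

def padding_for(address, align=ALIGN):
    """Bytes to skip so that address + padding is a multiple of align."""
    return (-address) % align

def aligned_offset_in(buf, align=ALIGN):
    """Offset into a ctypes buffer at which an aligned region starts."""
    return padding_for(ctypes.addressof(buf), align)

# Over-allocate by ALIGN - 1 bytes so an aligned region of `size` bytes always fits.
size = 64
raw = ctypes.create_string_buffer(size + ALIGN - 1)
offset = aligned_offset_in(raw)
start = ctypes.addressof(raw) + offset
```

Whatever address the allocator returns, the aligned region starts at most ALIGN - 1 bytes in, so the extra bytes are always enough.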

In the case of borrowing another object's PyBufferProcs 
memory, PyBufferObject no longer caches the pointer.  
This might slow things down (probably not by much), 
but it keeps PyBufferObject from working with a stale 
pointer.
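Modern Python (3.x) guards against the same stale-pointer hazard in a different way from this patch. Rather than having the consumer re-request the pointer, the buffer protocol tracks exports, and a bytearray refuses to resize while a memoryview still references its memory. The sketch below demonstrates that behaviour; `try_resize` is a hypothetical helper.

```python
def try_resize(buf, extra):
    """Try to grow a bytearray; return False if the resize is refused."""
    try:
        buf.extend(extra)
        return True
    except BufferError:
        # Refused: an exported view still references the old memory.
        return False

data = bytearray(b"abc")
view = memoryview(data)            # export the underlying buffer
refused_while_exported = not try_resize(data, b"def")
view.release()                     # drop the export
resized_after_release = try_resize(data, b"def")
```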


Normally I wouldn't do this, but since this patch 
touches pretty much every function anyway, I fixed 
many deviations from the Python coding style.


----------------------------------------------------------------------

>Comment By: Scott Gilbert (xscott)
Date: 2002-07-23 08:03

Message:
Logged In: YES 
user_id=38318

On top of the current patch being out of date, in private email, 
Guido indicated that Tim thinks the code needs more 
refactoring to simplify it.

I'd like to hold off on resubmitting a current patch to see how 
the bytes object fares (PEP 296).  If the bytes object makes it 
into the Python core, then probably the best way to simplify 
and fix the implementation of the buffer object is to reduce it 
to nothing but a "Buffer Inspector" for other objects.  (Tearing out 
the b_ptr field and a lot of if statements at least.)  The bytes 
object could be used to implement the following calls:

    PyBuffer_FromMemory(...)
    PyBuffer_FromReadWriteMemory(...)
    PyBuffer_New(...)

In these cases, the bytes object would hold the actual 
memory, and the buffer object would just be inspecting the 
bytes object.  I'd still stick to the strategy of having the buffer 
object re-request the pointer before every use (since typically 
the pointer is only valid while the GIL is held).  I haven't 
figured out how to handle the case when the size specified for 
the buffer object gets out of whack when the inspected object 
resizes.  Raise an exception?

Even with these changes, there would still be some problems 
in here.  For instance, the hash value is easy to invalidate. 


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 19:41

Message:
Logged In: YES 
user_id=6380

Note, the patch is out of date since somebody fixed some
nits with slicing, so I'm marking this as Out Of Date.

You might as well upload the new version of the file. :-)

Why do you think you need to fix the allocation? Since
allocation is done via malloc(), and malloc() guarantees
alignment suitable for a double ("for all types"), shouldn't that be
enough??? (If it's obmalloc that you're worried about, it's
easy to force this to use the real malloc() and free().)

I hope Tim will make some time to review this (the "not this
week" comment is several months old now). Superficially it
looks like a big improvement.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-07 18:51

Message:
Logged In: YES 
user_id=31435

Na, assigning a bug is fine by me -- it helps to have 
*someone* feel guilty <wink>.  Assigning it doesn't mean it 
goes to the top of the assignee's heap, though.  I can't 
make time to look at it this week, so it's just as well 
that it got unassigned.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-07 12:55

Message:
Logged In: YES 
user_id=38318

Apparently assigning a patch is poor form.  My bad.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-05 04:27

Message:
Logged In: YES 
user_id=38318

Can I assign this to you or does it take admin privs?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 09:08:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 01:08:54 -0700
Subject: [Patches] [ python-Patches-550551 ] Read/Write buffers from buffer()
Message-ID: <E17Wuj0-0004jq-00@usw-sf-web4.sourceforge.net>

Patches item #550551, was opened at 2002-04-30 09:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550551&group_id=5470

Category: Core (C code)
Group: None
>Status: Deleted
Resolution: Postponed
Priority: 5
Submitted By: Scott Gilbert (xscott)
Assigned to: Nobody/Anonymous (nobody)
Summary: Read/Write buffers from buffer()

Initial Comment:
The buffer() builtin does not currently allow the 
creation of read-write buffers.  So there is no way 
from pure Python code to manipulate objects which 
support getting a writable pointer via their 
PyBufferProcs.  This patch tries to create a read-
write buffer first, and if that fails it will return a 
read-only buffer object as before.

It's tempting to check if the PyBufferProcs has the 
bf_getwritebuffer pointer and simply return 
PyBuffer_FromReadWriteObject(...) in this case.  This 
ends up being incorrect for PyStrings since they do 
have the bf_getwritebuffer pointer, but that always 
sets an exception.
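In Python 3 the buffer() builtin is gone. memoryview() does not try read-write and then fall back: it accepts whatever access the exporting object grants and reports the result in its `readonly` attribute. The sketch below shows that check. `make_buffer` is a hypothetical helper and is not part of the patch.

```python
def make_buffer(obj):
    """Return (view, writable) for any object supporting the buffer protocol."""
    view = memoryview(obj)
    # The exporter decides: bytearray grants write access, bytes does not.
    return view, not view.readonly

rw_view, rw_ok = make_buffer(bytearray(b"spam"))
ro_view, ro_ok = make_buffer(b"spam")
if rw_ok:
    rw_view[0] = ord("S")   # writes through to the underlying bytearray
```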


----------------------------------------------------------------------

>Comment By: Scott Gilbert (xscott)
Date: 2002-07-23 08:08

Message:
Logged In: YES 
user_id=38318

The buffer builtin appears to be scheduled for deprecation, so 
this small patch is not worthwhile.

This is independent of creating buffer objects from the C API 
(as that does not appear to be deprecated).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-07 12:35

Message:
Logged In: YES 
user_id=6380

Please don't assign patches to random developers.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-05 04:29

Message:
Logged In: YES 
user_id=38318

If you take patch 552438, then there shouldn't be anything 
wrong with this small feature patch...


----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-03 08:28

Message:
Logged In: YES 
user_id=38318

This patch should not be accepted until another one fixing 
a bug in PyBufferObjects is accepted.  So please back-burner
this one until further notice.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550551&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 10:59:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 02:59:00 -0700
Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests
Message-ID: <E17WwRY-0005aP-00@usw-sf-web2.sourceforge.net>

Patches item #585101, was opened at 2002-07-22 21:53
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470

Category: Tests
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Barry A. Warsaw (bwarsaw)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Fix relative imports in regression tests

Initial Comment:
The regression test suite uses intrapackage relative
imports to import stuff like test_support, etc. 
There's no deep reason for this to be so, since "test"
is a standard package.  As long as all tests do
something like "from test import test_support" or
"import test.test_support" everything works fine. 
Keeping the relative imports makes life more difficult
for tests that don't live in the expected location of
Lib/test.

This patch fixes this by making sure all test imports
are absolute.  This works fine on *nix, but rumor has
it that the Mac tests are run differently so I'd like
Jack to comment on whether this patch breaks his test
suite or not.

----------------------------------------------------------------------

>Comment By: Jack Jansen (jackjansen)
Date: 2002-07-23 11:59

Message:
Logged In: YES 
user_id=45365

I can't test the patch right now, but after visual inspection I can't imagine that it would cause any problems on the mac. Go ahead and check it in, I would say, and I'll complain when it breaks things:-)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 16:50:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 08:50:42 -0700
Subject: [Patches] [ python-Patches-585101 ] Fix relative imports in regression tests
Message-ID: <E17X1vu-0005b7-00@usw-sf-web1.sourceforge.net>

Patches item #585101, was opened at 2002-07-22 15:53
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470

Category: Tests
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Barry A. Warsaw (bwarsaw)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Fix relative imports in regression tests

Initial Comment:
The regression test suite uses intrapackage relative
imports to import stuff like test_support, etc. 
There's no deep reason for this to be so, since "test"
is a standard package.  As long as all tests do
something like "from test import test_support" or
"import test.test_support" everything works fine. 
Keeping the relative imports makes life more difficult
for tests that don't live in the expected location of
Lib/test.

This patch fixes this by making sure all test imports
are absolute.  This works fine on *nix, but rumor has
it that the Mac tests are run differently so I'd like
Jack to comment on whether this patch breaks his test
suite or not.

----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-07-23 11:50

Message:
Logged In: YES 
user_id=12800

Cool.  I'll go ahead and commit these changes and then you
and Tim can both beat me up.  Guido's at OSCON so he'll have
to wait a week to beat me up. :)

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-23 05:59

Message:
Logged In: YES 
user_id=45365

I can't test the patch right now, but after visual inspection I can't imagine that it would cause any problems on the mac. Go ahead and check it in, I would say, and I'll complain when it breaks things:-)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585101&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 21:43:31 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 13:43:31 -0700
Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation
Message-ID: <E17X6VH-00071W-00@usw-sf-web4.sourceforge.net>

Patches item #555085, was opened at 2002-05-12 08:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 4
Submitted By: Michael Gilfix (mgilfix)
Assigned to: Guido van Rossum (gvanrossum)
Summary: timeout socket implementation

Initial Comment:
This addresses bug #457114 and implements timed socket
operations. If a timeout is set and the timeout period
elapses before the socket operation has finished, a
socket.error exception is raised.

This patch integrates the functionality at two levels:
the timeout capability is integrated at the C level in
socketmodule.c, and socket.py was also modified to update
file object creation on Windows to handle the case of the
underlying socket raising an exception.
The TeX documentation was also updated, and a new
regression unit was provided as test_timeout.py.
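For reference, here is a minimal sketch of the timed-operation API as it behaves in present-day Python 3. It is an assumption that this matches the 2002 patch exactly: modern versions raise socket.timeout (an alias of TimeoutError since 3.10, and still a subclass of socket.error/OSError) instead of a plain socket.error. `recv_with_timeout` is a hypothetical helper.

```python
import socket

def recv_with_timeout(sock, nbytes, timeout):
    """Return received bytes, or None if nothing arrives within timeout seconds."""
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None

a, b = socket.socketpair()
try:
    first = recv_with_timeout(a, 16, 0.1)    # nothing sent yet: times out
    b.sendall(b"ping")
    second = recv_with_timeout(a, 16, 0.1)   # data is already waiting
finally:
    a.close()
    b.close()
```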

----------------------------------------------------------------------

>Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-23 16:43

Message:
Logged In: YES 
user_id=116038

Now that I'm back :)

I checked the archive and this seems to have been handled by
you. Please let me know if it isn't resolved and I can give
it a closer look.

Also, perhaps I should contact Bernie and ask him if there's
anything he hasn't gotten around to in the test_timeout that
I can off-load from him.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 13:11

Message:
Logged In: YES 
user_id=6380

The default timeout is now implemented in CVS.

There's a bug report from Andrew Macintyre (unfortunately on
python-dev) about test_socket.py failures on FreeBSD. I'll
try to keep an eye on that, so this patch *still* stays
open. Also, Bernie has promised some changes that I haven't
received yet and the details of which I don't recall (sorry
:-( ).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 21:47

Message:
Logged In: YES 
user_id=6380

Keeping this open as a reminder of things still to finish.

Most is in the python-dev discussion; Michael Gilfix and
Bernard Yue have offered to produce more patches.

One feature we definitely want is a way to specify a timeout
to be applied to all new sockets.
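The "timeout for all new sockets" feature shipped as socket.setdefaulttimeout()/getdefaulttimeout(). The sketch below, written against current Python 3, sets a process-wide default, checks that a newly created socket inherits it, and then restores the previous global value.

```python
import socket

previous = socket.getdefaulttimeout()      # None means "no timeout"
socket.setdefaulttimeout(2.5)
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        inherited = s.gettimeout()          # picked up from the default
    finally:
        s.close()
finally:
    socket.setdefaulttimeout(previous)      # restore global state
```

The default is global, which is why the sketch restores it. Code that only needs one socket to time out should call settimeout() on that socket.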

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-06 17:11

Message:
Logged In: YES 
user_id=6380

Thanks for the new version! I've checked this in.  I made
considerable changes; the following is feedback but you
don't need to respond because I've addressed all these in
the checked-in code!

- Thanks for the cleanup of some non-standard formatting.
However, it's better not to do this so the diffs don't show
changes that are unrelated to the timeout patch.

- You are still importing the select module instead of
calling select() directly. I really think you should do the
latter -- the select module has an enormous overhead (it
allocates several large lists on the heap).

- Instead of explicitly testing the argument to settimeout
for being a float, int or long, you should simply call
PyFloat_AsDouble and handle the error; if someone passes
another object that implements __float__ that should be
acceptable.

- gettimeout() returns sock_timeout without checking if it
is NULL. It can be NULL when a socket object is never
initialized. E.g. I can do this:

>>> from socket import *
>>> s = socket.__new__(socket)
>>> s.gettimeout()

which gives me a segfault. There are probably other places
where this is assumed.

- I addressed the latter two issues by making sock_timeout a
double, whose value is < 0.0 when no timeout is set.
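The Python model below sketches the two fixes described above: a float field with a negative "no timeout" sentinel, and coercion through float() instead of type whitelisting. It is not CPython's code, and `TimeoutSetting` is hypothetical. Placing the sentinel at class level mimics the guarantee that even an object created with `__new__` alone reads as "no timeout".

```python
class TimeoutSetting:
    """Hypothetical model of a socket's timeout field."""

    # Class-level default: an object built with __new__ alone (no __init__)
    # still reads as "no timeout" rather than touching an unset field.
    _timeout = -1.0

    def settimeout(self, value):
        if value is None:
            self._timeout = -1.0
            return
        # float() accepts int, float and any object defining __float__,
        # rather than whitelisting particular types.
        seconds = float(value)
        if seconds < 0.0:
            raise ValueError("timeout value out of range")
        self._timeout = seconds

    def gettimeout(self):
        # Any negative value is the "no timeout" sentinel.
        return None if self._timeout < 0.0 else self._timeout
```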

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-06-05 18:23

Message:
Logged In: YES 
user_id=116038

I've addressed all the issues brought up by Guido. The 2nd
version of the patch is attached here. In this version, I've
modified test_socket.py to include tests for the _fileobject
class in socket.py that was modified by this patch.
_fileobject needed to be modified so that data would not be
lost when the underlying socket threw an exception (data was
no longer accumulated in local variables). The tests for the
_fileobject class succeed on older versions of python
(tested 2.1.3) and pass on the newer version of python.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-23 16:18

Message:
Logged In: YES 
user_id=6380

For a detailed review, see

http://mail.python.org/pipermail/python-dev/2002-May/024340.html

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 22:43:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 14:43:02 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17X7Qs-0005Z6-00@usw-sf-web1.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 18:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-23 23:43

Message:
Logged In: YES 
user_id=21627

That patch doesn't really test whether defining 
OSF_SOURCE helps in getting makedev, does it? In 
particular, if makedev is not available at all, or requires a 
different define, the test will still conclude that 
OSF_SOURCE should be defined, right?

I think the sequence should be:
- is makedev already available?
- if not, is it with OSF_SOURCE defined?
- if not, arrange to exclude makedev from posixmodule.c
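The third step has a visible effect at the Python level. posixmodule exposes makedev as os.makedev (along with os.major/os.minor) only where the platform provides it, so callers have to cope with the attribute being absent. Below is a minimal sketch of that defensive check; `device_number` is a hypothetical helper.

```python
import os

def device_number(major, minor):
    """Combine major/minor numbers, or return None where makedev is missing."""
    makedev = getattr(os, "makedev", None)
    if makedev is None:
        # The build excluded makedev (e.g. the configure check failed).
        return None
    return makedev(major, minor)

dev = device_number(8, 1)
```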

Also, is it necessary to run the test program? autoconf is 
always worried that cross-compilation would fail, since you 
cannot run tests (although it is reasonable to link test 
programs in a cross-compilation environment).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 23:04:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 15:04:29 -0700
Subject: [Patches] [ python-Patches-506436 ] GETCONST/GETNAME/GETNAMEV speedup
Message-ID: <E17X7ld-0005tX-00@usw-sf-web1.sourceforge.net>

Patches item #506436, was opened at 2002-01-21 08:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Out of Date
Priority: 5
Submitted By: Skip Montanaro (montanaro)
>Assigned to: Skip Montanaro (montanaro)
Summary: GETCONST/GETNAME/GETNAMEV speedup

Initial Comment:
The attached patch redefines the GETCONST, GETNAME &
GETNAMEV macros to do the following:

  * access the code object's consts and names through
    local variables instead of the long chain from f

  * use access macros to index the tuples and get
    the C string names

The code appears correct, and I've had no trouble with
it.  It only provides the most trivial of improvement on
pystone (around 1% when I see anything), but it's all
those little things that add up, right?

Skip


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-23 18:04

Message:
Logged In: YES 
user_id=31435

Marked Out-of-Date and back to Skip.  Sorry for the delay!

The idea is fine.  I'd rather you use the current GETITEM 
macro, which does bounds-checking in a debug build.  I note 
too that GETCONST is only used once, and that use may as 
well be a direct GETITEM(consts, i) invocation, and skip the 
macro.  Note that the GETNAME() macro no longer exists.
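From Python, the tuples that GETCONST and GETNAMEV index are visible as a code object's co_consts and co_names. The Python analogue below caches them in locals once, as the patch does, and uses a hypothetical `getitem` whose assert, like a debug-build bounds check, disappears under `python -O`. It sketches the idea only and does not reproduce the C macros.

```python
def getitem(seq, i):
    """Hypothetical analogue of a bounds-checked GETITEM macro."""
    assert 0 <= i < len(seq), f"index {i} out of range"
    return seq[i]

code = compile("answer = 42", "<example>", "exec")
consts = code.co_consts      # looked up once and kept in a local
names = code.co_names
value = getitem(consts, consts.index(42))
name = getitem(names, names.index("answer"))
```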

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-22 23:34

Message:
Logged In: YES 
user_id=33168

Skip, I modified this code some, but your technique is still
valid.  I got rid of one of the indirections already.  The
patch can easily be updated.  Seems like the patch shouldn't
hurt.  Tim?

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-09 19:45

Message:
Logged In: YES 
user_id=44345

Looking for a vote up or down on this one...


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-01-21 08:47

Message:
Logged In: YES 
user_id=44345

Whoops...  Make the "observed" speedup 0.1%...


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=506436&group_id=5470


From noreply@sourceforge.net  Tue Jul 23 23:03:03 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 23 Jul 2002 15:03:03 -0700
Subject: [Patches] [ python-Patches-583180 ] smtplib.py patch for macmail esmtp auth
Message-ID: <E17X7kF-0005sH-00@usw-sf-web1.sourceforge.net>

Patches item #583180, was opened at 2002-07-18 04:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: bob kuehne (mysticbob)
Assigned to: Nobody/Anonymous (nobody)
Summary: smtplib.py patch for macmail esmtp auth

Initial Comment:
i ran into a problem that i've seen several other people
describe where they can't authenticate to their particular
mail server. i dug into this (my mail server is smtp.mac.com)
and discovered that smtplib.py didn't support the specific
type of auth that this server required.

so, this patch allows authentication to these specific
server types. i also reworked one token to make it
a bit more modular. the patch is attached, generated
in the form: diff smtplib.py_orig smtplib.py_new

i'm new to python, and new to the whole patch process
on sourceforge, so  please let me know what i can do
to test, or how else i can work to get this in the next
python version. thank you!
bob

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-24 00:03

Message:
Logged In: YES 
user_id=21627

On patches: Please always use context (-c) or unified (-u) 
diffs; those stay valid longer.

On AUTH=LOGIN: Can you please try

http://sourceforge.net/tracker/index.php?func=detail&aid=572031&group_id=5470&atid=305470

This pre-RFC AUTH protocol is by no means an invention of 
smtp.mac.com (or specific to it) - it is originally a Netscape 
invention, and widely implemented.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=583180&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 14:05:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 06:05:19 -0700
Subject: [Patches] [ python-Patches-572031 ] AUTH method LOGIN for smtplib
Message-ID: <E17XLpP-0005Fv-00@usw-sf-web1.sourceforge.net>

Patches item #572031, was opened at 2002-06-21 12:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572031&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: AUTH method LOGIN for smtplib

Initial Comment:
Unfortunately, my original SMTP auth patch doesn't work
so well in real life. There are two methods to
advertise the available auth methods for SMTP servers:

old-style: AUTH=method1 method2 ...
RFC style: AUTH method1 method2

Microsoft's MUAs are b0rken in that they only
understand the old-style method. That's why most SMTP
servers are configured to advertise their
authentication methods in old-style _and_ new style.
There are also some especially broken SMTP servers like
old M$ Exchange servers that only show their auth
methods via the old style.

Also the (sadly but true) very widely used M$ Exchange
server only supports the LOGIN auth method (I have to
use that thing at work, that's why I came up with this
patch). Exchange also supports some other proprietary
auth methods (NTLM, ...), but we needn't care about these.

My argument is that the Python SMTP AUTH support will
get a lot more useful to people if we also support

1) the old-style AUTH= advertisement
2) the LOGIN auth method, which, although not
standardized via RFCs and originally invented by
Netscape, is still in wide use, and for some servers
the only method to use them, so we should support it

Please note that in the current implementation, if a
server uses the old-style AUTH= method, our SMTP auth
support simply breaks because of the esmtp_features
parsing.
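The sketch below merges the two advertisement styles into a single "auth" entry. `parse_ehlo_features` is a hypothetical helper that takes the EHLO reply lines after the greeting, with the "250-" prefixes already stripped. To my knowledge, modern smtplib performs a similar merge inside SMTP.ehlo().

```python
import re

# Old-style advertisement: "AUTH=LOGIN"; RFC style: "AUTH PLAIN LOGIN".
OLD_STYLE = re.compile(r"auth=(.*)", re.IGNORECASE)

def parse_ehlo_features(lines):
    """Map lower-cased EHLO keywords to parameters, merging both AUTH styles."""
    features = {}
    for line in lines:
        old = OLD_STYLE.match(line)
        if old:
            key, params = "auth", old.group(1)
        else:
            key, _, params = line.partition(" ")
            key = key.lower()
        if key == "auth":
            # Append mechanisms from either syntax, dropping duplicates in order.
            mechs = features.get("auth", "").split() + params.upper().split()
            features["auth"] = " ".join(dict.fromkeys(mechs))
        else:
            features[key] = params
    return features
```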

I'm randomly assigning this patch to Barry, because
AFAIK he knows a lot about email handling. Assign
around as you please :-)


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-24 15:05

Message:
Logged In: YES 
user_id=21627

In

http://sourceforge.net/tracker/?func=detail&atid=105470&aid=581165&group_id=5470

pierslauder reports success with this patch; see his
detailed report for remaining problems.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-17 15:39

Message:
Logged In: YES 
user_id=21627

That existing SMTP servers announce LOGIN only in the
old-style header is a good reason to support those as well;
I hence recommend that this patch is applied.

Microsoft is, strictly speaking, conforming to the RFC by
*not* reporting LOGIN in the AUTH header: only registered
SASL mechanisms can be announced there, and LOGIN is not
registered; see

http://www.iana.org/assignments/sasl-mechanisms


----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-07-01 00:34

Message:
Logged In: YES 
user_id=163326

Updated patch. Changes to the previous patch:

- Use email.base64MIME.encode
  to get rid of the added
  newlines.
- Merge old and RFC-style auth methods
  in self.smtp_features instead of
  parsing old-style auth lines
  separately.
- Removed example line for changing auth
  method priorities (we won't list all
  permutations of auth methods ;-)
- Removed superfluous logging call of
  chosen auth method.
- Moved comment about SMTP features
  syntax into the right place again.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-30 23:14

Message:
Logged In: YES 
user_id=163326

Martin,
the reason why we need to take into account both old and
RFC-style auth advertisement is that there are some SMTP
servers which advertise different auth mechanisms in the
old vs. RFC-style line. In particular, the MS Exchange
server that I have to use at work does this, and I think
that this is even the default configuration of Exchange
2000. In my case, it advertises its LOGIN method only in
the AUTH= line.

I'll shortly upload a patch that takes this into account.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-30 18:20

Message:
Logged In: YES 
user_id=21627

I still cannot see why support for the old-style AUTH lines
is necessary. If all SMTPds announce their supported
mechanisms with both syntaxes, why is it then necessary to
even look at the old syntax?

I'm all for adding support for the LOGIN method.

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-06-30 17:59

Message:
Logged In: YES 
user_id=12800

Martin, (some? most?) MUAs post messages by talking directly
to their outgoing SMTPd, so that's probably why Gerhard
mentions it.

On the base64 issue, see the comment in bug
#552605, which I just took assignment of.  I'll deal with
both these bug reports soon.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-30 17:41

Message:
Logged In: YES 
user_id=21627

I cannot understand why the behaviour of MS MUAs is relevant
here at all; smtplib only talks to MTAs (or MSAs).

If MTAs advertise the AUTH extension in the new syntax in
addition to the old syntax, why is it not good to just
ignore the old advertisement? Can you point to  a specific
software package (ideally even a specific host) which fails
to interact with the current smtplib correctly?

----------------------------------------------------------------------

Comment By: Jason R. Mastaler (jasonrm)
Date: 2002-06-22 05:53

Message:
Logged In: YES 
user_id=85984

A comment on the old-style advertisement.

You say that Microsoft's MUAs only understand the
old-style method.  I haven't found this to be the case.

tmda-ofmipd is an outgoing SMTP proxy that supports
SMTP authentication, and I only use the RFC style
advertisement.  This works perfectly well with MS
clients like Outlook 2000, and Outlook Express 5.
Below is an example of what the advertisement looks
like.

BTW, no disagreement about supporting the old-style
advertisement in smtplib, as I think it's prudent, just 
making a point.

# telnet aguirre 8025
Trying 172.18.3.5...
Connected to aguirre.la.mastaler.com.
Escape character is '^]'.
220 aguirre.la.mastaler.com ESMTP tmda-ofmipd
EHLO aguirre.la.mastaler.com
250-aguirre.la.mastaler.com
250 AUTH LOGIN CRAM-MD5 PLAIN
QUIT
221 Bye
Connection closed by foreign host.


----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-06-21 12:43

Message:
Logged In: YES 
user_id=163326

This also includes a slightly modified version of patch #552605.

Even better would IMO be to add an additional parameter to
base64.encode* and the corresponding binascii functions that
avoids the insertion of newline characters.
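The newline problem mentioned here can be seen with the Python 3 standard library, which (unlike the 2002 version) already offers newline-free variants. A short sketch; the 100-byte sample input is arbitrary, and the newline keyword of binascii.b2a_base64 requires Python 3.6 or later.

```python
import base64
import binascii

data = b"x" * 100  # long enough that the MIME-style encoder wraps its output

# encodebytes() follows the MIME convention: a newline after every
# 76 output characters plus a trailing newline.
wrapped = base64.encodebytes(data)

# b64encode() never inserts newlines, which is what an SMTP AUTH
# exchange needs (the whole response goes on one line).
single_line = base64.b64encode(data)

# binascii.b2a_base64() appends a trailing newline unless told not to.
no_newline = binascii.b2a_base64(data, newline=False)

print(b"\n" in wrapped, b"\n" in single_line, single_line == no_newline)
```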

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572031&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 14:27:49 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 06:27:49 -0700
Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py
Message-ID: <E17XMBB-0006PY-00@usw-sf-web5.sourceforge.net>

Patches item #585913, was opened at 2002-07-24 08:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

Category: Library (Lib)
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Copeland (oracle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Galeon support to webbrowser.py

Initial Comment:
Simple context diff against current CVS tree to add
support for Galeon to webbrowser.py


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 14:29:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 06:29:06 -0700
Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py
Message-ID: <E17XMCQ-0006Rp-00@usw-sf-web5.sourceforge.net>

Patches item #585913, was opened at 2002-07-24 08:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

Category: Library (Lib)
>Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Copeland (oracle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Galeon support to webbrowser.py

Initial Comment:
Simple context diff against current CVS tree to add
support for Galeon to webbrowser.py


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 19:55:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 11:55:39 -0700
Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks
Message-ID: <E17XRIR-00024b-00@usw-sf-web2.sourceforge.net>

Patches item #432401, was opened at 2001-06-12 15:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Postponed
Priority: 6
Submitted By: Walter Dörwald (doerwalter)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode encoding error callbacks

Initial Comment:
This patch adds unicode error handling callbacks to the
encode functionality. With this patch it's possible to
not only pass 'strict', 'ignore' or 'replace' as the
errors argument to encode, but also a callable
function that will be called with the encoding name,
the original unicode object and the position of the
unencodable character. The callback must return a
replacement unicode object that will be encoded instead
of the original character.

For example replacing unencodable characters with XML
character references can be done in the following way.

u"aäoöuüß".encode(
   "ascii",
   lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos])
)
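The callable-as-errors interface shown above is the original proposal. In the design that was eventually adopted (PEP 293), the errors argument stays a string naming a handler registered with codecs.register_error. A Python 3 sketch of the same hexadecimal character-reference example under that API; the handler name "xmlcharref-hex" is made up for illustration, while the built-in "xmlcharrefreplace" produces decimal references.

```python
import codecs

def xml_charref_hex(exc):
    """Replace unencodable characters with hexadecimal XML character references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.object[exc.start:exc.end] is the run of characters the codec rejected.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#x%x;" % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end

codecs.register_error("xmlcharref-hex", xml_charref_hex)

print("aäoöuüß".encode("ascii", "xmlcharref-hex"))
print("aäoöuüß".encode("ascii", "xmlcharrefreplace"))
```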


----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 20:55

Message:
Logged In: YES 
user_id=89016

diff12.txt finally implements the PEP293 specification (i.e.
using exceptions for the communication between codec and
handler)
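With the exception-based interface, a handler reads everything it needs from the exception object. A Python 3 sketch that exposes those attributes on a real UnicodeEncodeError; the helper name describe_encode_error is hypothetical. start and end follow Python's slice convention (end is exclusive), so exc.object[exc.start:exc.end] is exactly the offending text.

```python
def describe_encode_error(text, encoding):
    """Return the details a PEP 293 handler would see, or None if encoding succeeds."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return {
            "encoding": exc.encoding,
            "start": exc.start,                    # first offending index (inclusive)
            "end": exc.end,                        # one past the last offending index
            "bad": exc.object[exc.start:exc.end],  # the rejected characters
            "reason": exc.reason,
        }
    return None

print(describe_encode_error("abc\u20acdef", "latin-1"))
```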

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-30 18:30

Message:
Logged In: YES 
user_id=89016

diff11.txt fixes two refcounting bugs in codecs.c.
speedtest.py is a little test script that checks the speed
of various string/encoding/error combinations.
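speedtest.py itself is not reproduced here. As a rough, hypothetical illustration of that kind of measurement, a Python 3 timeit sketch comparing several built-in handlers on the same mostly-ASCII text; the sample text and repetition count are arbitrary, and the absolute numbers depend entirely on the machine.

```python
import timeit

# Arbitrary sample: mostly ASCII, with some characters ASCII cannot encode.
text = "abc\xe4\xf6\xfc" * 1000
handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")

results = {}
for errors in handlers:
    # Total seconds for 100 encodes; the lambda runs inside this iteration,
    # so it always sees the current value of errors.
    results[errors] = timeit.timeit(lambda: text.encode("ascii", errors), number=100)

for errors, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("%-18s %.4f s" % (errors, seconds))
```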

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-29 22:50

Message:
Logged In: YES 
user_id=89016

This new version diff10.txt fixes a memory 
overwrite/reallocation bug in PyUnicode_EncodeCharmap and 
moves the error handling out of PyUnicode_EncodeCharmap. 
A new version of the test script is included too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-16 21:06

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_TranslateCharmap is finished too. As the 
errors argument is again not exposed to Python it can't 
really be tested. Should we add errors as an optional 
argument to unicode.translate?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-01 19:57

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the 
errors argument can't be accessed from Python code, there's 
not much testing for this.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-20 17:34

Message:
Logged In: YES 
user_id=89016

A new idea for the interface between the
codec and the callback:

Maybe we could have new exception classes
UnicodeEncodeError, UnicodeDecodeError
and UnicodeTranslateError derived from
UnicodeError. They have all the attributes
that are passed as an argument
tuple in the current version:

string: the original string
start: the start position of the unencodable characters/undecodable bytes
end: the end position+1 of the unencodable characters/undecodable bytes
reason: a string that explains why the encoding/decoding doesn't work

There is no data object, because when a codec
wants to pass extended information to the
callback it can do this via a derived
class.

It might be better to move these attributes
to the base class UnicodeError, but this
might have backwards compatibility
problems.

With this method we really can have one global
registry for all callbacks, because for callback
names that must work with encoding *and* decoding
*and* translating (i.e. "strict", "replace" and 
"ignore"), the callback can check which type 
of exception was passed, so "replace" can
e.g. look like this:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      return ("?", exc.end)
   else:
      return (u"?"*(exc.end-exc.start), exc.end)

Another possibility would be to do the communication
callback->codec by assigning to attributes
of the exception object. The resynchronisation
position could even be preassigned to end, so
the callback only needs to specify the
replacement in most cases:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      exc.replacement = "?"
   else:
      exc.replacement = u"?"*(exc.end-exc.start)

As many of the assignments can now be done on
the C level without having to allocate Python
objects (except for the replacement string
and the reason), this version might even be 
faster, especially if we allow the codec to 
reuse the exception object for the next call 
to the callback.

Does this make sense, or is this too fancy?
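The adopted design followed the first variant: handlers return a (replacement, resume_position) tuple, and one registered handler can serve several operations by checking the exception type. A Python 3 sketch; the name "my-replace" is hypothetical (the built-in handler is codecs.replace_errors), and the decoding branch returns U+FFFD rather than "?" because a decoder's output is text, not bytes.

```python
import codecs

def my_replace(exc):
    """One handler usable for encoding, decoding and translating."""
    if isinstance(exc, UnicodeDecodeError):
        # Decoding produces text: one U+FFFD per reported byte sequence.
        return "\ufffd", exc.end
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        # Encoding/translating: one "?" per offending character.
        return "?" * (exc.end - exc.start), exc.end
    raise TypeError("don't know how to handle %r" % (exc,))

codecs.register_error("my-replace", my_replace)

print("aää".encode("ascii", "my-replace"))
print(b"a\xffb".decode("ascii", "my-replace"))
```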


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:24

Message:
Logged In: YES 
user_id=89016

And here is the test script (test_codeccallbacks.py)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:22

Message:
Logged In: YES 
user_id=89016

OK, here is the current version of the patch (diff7.txt). 
PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are 
still missing.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 22:50

Message:
Logged In: YES 
user_id=89016

> About the difference between encoding 
> and decoding: you shouldn't just look 
> at the case where you work with Unicode 
> and strings, e.g. take the rot-13 codec
> which works on strings only or other
> codecs which translate objects into 
> strings and vice-versa.

unicode.encode encodes to str and 
str.decode decodes to unicode,
even for rot-13:

>>> u"gürk".encode("rot13")
't\xfcex'
>>> "gürk".decode("rot13")
u't\xfcex'
>>> u"gürk".decode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'unicode' object has no attribute 'decode'
>>> "gürk".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: ASCII decoding error: ordinal not in range(128)

Here the str is converted to unicode
first, before encode is called, but the
conversion to unicode fails.

Is there an example where something
else happens?
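In Python 3 this question was settled along the lines Marc-André describes: rot_13 is a str-to-str transform rather than a text encoding, so it is reached through codecs.encode()/codecs.decode(), and str.encode() refuses it. A short sketch:

```python
import codecs

# rot_13 maps str to str; characters outside a-z/A-Z pass through unchanged.
print(codecs.encode("gürk", "rot13"))

# str.encode() only accepts text encodings (str -> bytes), so this fails.
try:
    "gürk".encode("rot13")
except LookupError as exc:
    print("LookupError:", exc)
```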

> Error handling has to be flexible enough 
> to handle all these situations. Since 
> the codecs know best how to handle the
> situations, I'd make this an implementation 
> detail of the codec and leave the
> behaviour undefined in the general case.

OK, but we should suggest that for encoding,
unencodable characters are collected,
and for decoding, separate byte sequences
that are considered broken by the codec
are passed to the callback, i.e. for
decoding the handler will never get
all broken data in one call, e.g.
for "\u30\Uffffffff".decode("unicode-escape")
the handler will be called twice (once for
"\u30" and "truncated \u escape" as the
reason and once for "\Uffffffff" and
"illegal character" as the reason.)

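Python 3's unicode-escape decoder still behaves the way described here: the handler is invoked once per broken escape, not once for all broken data. A sketch that records each call; the handler name "record-calls" is hypothetical, and the exact reason strings may differ between Python versions.

```python
import codecs

calls = []

def record_calls(exc):
    """Log each invocation, substitute U+FFFD and resume after the bad span."""
    calls.append((exc.start, exc.end, exc.reason))
    return "\ufffd", exc.end

codecs.register_error("record-calls", record_calls)

result = b"\\u30\\Uffffffff".decode("unicode-escape", "record-calls")
for start, end, reason in calls:
    print(start, end, reason)
```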
> For the existing codecs, backward 
> compatibility should be maintained, 
> if at all possible. If the patch gets 
> overly complicated because of this, 
> we may have to provide a downgrade solution
> for this particular problem (I don't think 
> replace is used in any computational context, 
> though, since you can never be sure how 
> many replacement characters get
> inserted, so the case may not be
> that realistic).
> 
> Raising an exception for the charmap codec 
> is the right way to go, IMHO. I would 
> consider the current behaviour a bug.

OK, this is implemented in PyUnicode_EncodeCharmap now, 
and collecting unencodable characters works too.

I completely changed the implementation,
because the stack approach would have
gotten much more complicated when
unencodable characters are collected.

> For new codecs, I think we should 
> suggest that replace tries to collect 
> as much illegal data as possible before
> invoking the error handler. The handler 
> should be aware of the fact that it 
> won't necessarily get all the broken 
> data in one call.

OK for encoders, for decoders see
above.

> About the codec error handling 
> registry: You seem to be using a 
> Unicode specific approach here. 
> I'd rather like to see a generic 
> approach which uses the API 
> we discussed earlier. Would that be possible?

The handlers in the registry are all Unicode
specific, and they are different for encoding
and for decoding.

I renamed the function because of your
comment from 2001-06-13 10:05 (which 
becomes exceedingly difficult to find on
this long page! ;)).

> In that case, the codec API should 
> probably be called 
> codecs.register_error('myhandler', myhandler).
> 
> Does that make sense ?

We could require that unique names
are used for custom handlers, but
for the standard handlers we do have
name collisions. To prevent them, we
could either remove them from the registry
and require that the codec implements
the error handling for those itself,
or we could do some fiddling, so that
u"üöä".encode("ascii", "replace")
becomes 
u"üöä".encode("ascii", "unicodeencodereplace")
behind the scenes.

But I think two unicode specific 
registries are much simpler to handle.

> BTW, the patch which uses the callback 
> registry does not seem to be available 
> on this SF page (the last patch still 
> converts the errors argument to a 
> PyObject, which shouldn't be needed
> anymore with the new approach). 
> Can you please upload your 
> latest version?

OK, I'll upload a preliminary version
tomorrow. PyUnicode_EncodeDecimal and
PyUnicode_TranslateCharmap are still
missing, but otherwise the patch seems
to be finished. All decoders work and
the encoders collect unencodable characters
and implement the handling of known
callback handler names themselves.

As PyUnicode_EncodeDecimal is only used
by the int, long, float, and complex constructors,
I'd love to get rid of the errors argument,
but for completeness' sake, I'll implement
the callback functionality.
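The conversion that PyUnicode_EncodeDecimal performs for the numeric constructors is still visible in Python 3, where int() and float() accept any Unicode decimal digits (the internal C helper has changed since). A short illustration using Arabic-Indic digits:

```python
# U+0661..U+0665 are ARABIC-INDIC DIGIT ONE..FIVE (Unicode category Nd).
arabic_123 = "\u0661\u0662\u0663"
print(int(arabic_123))
print(float("\u0661.\u0665"))
```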

> Note that the highlighting codec 
> would make a nice example
> for the new feature.

This could be part of the codec callback test
script, which I've started to write. We could
kill two birds with one stone here:
1. Test the implementation.
2. Document and advocate what is 
   possible with the patch.

Another idea: we could have as an example
a decoding handler that relaxes the
UTF-8 minimal encoding restriction, e.g.

def relaxedutf8(enc, uni, startpos, endpos, reason, data):
   if uni[startpos:startpos+2] == u"\xc0\x80":
      return (u"\x00", startpos+2)
   else:
      raise UnicodeError(...)
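Under the API that was eventually adopted (the handler receives a UnicodeDecodeError and returns a replacement plus an absolute resume position), the relaxed UTF-8 idea can be written as below. A Python 3 sketch; the handler name "relaxed-utf8" is hypothetical. C0 80 is the overlong encoding of U+0000 used by Java's "modified UTF-8"; the strict decoder rejects 0xC0 as an invalid start byte, so the handler resumes two bytes later, past exc.end.

```python
import codecs

def relaxed_utf8(exc):
    """Accept the overlong sequence C0 80 as U+0000; reject everything else."""
    if isinstance(exc, UnicodeDecodeError):
        if exc.object[exc.start:exc.start + 2] == b"\xc0\x80":
            # The resume position is absolute and may lie beyond exc.end.
            return "\x00", exc.start + 2
    raise exc

codecs.register_error("relaxed-utf8", relaxed_utf8)

print(repr(b"a\xc0\x80b".decode("utf-8", "relaxed-utf8")))
```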


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-17 21:40

Message:
Logged In: YES 
user_id=38388

Sorry for the late response.

About the difference between encoding and decoding: you shouldn't
just look at the case where you work with Unicode and strings, e.g.
take the rot-13 codec which works on strings only or other codecs
which translate objects into strings and vice-versa.

Error handling has to be flexible enough to handle all these 
situations. Since the codecs know best how to handle the situations,
I'd make this an implementation detail of the codec and leave the
behaviour undefined in the general case.

For the existing codecs, backward compatibility should be 
maintained, if at all possible. If the patch gets overly complicated
because of this, we may have to provide a downgrade solution
for this particular problem (I don't think replace is used in any
computational context, though, since you can never be sure
how many replacement characters get inserted, so the case
may not be that realistic).

Raising an exception for the charmap codec is the right
way to go, IMHO. I would consider the current behaviour
a bug.

For new codecs, I think we should suggest that replace
tries to collect as much illegal data as possible before
invoking the error handler. The handler should be aware
of the fact that it won't necessarily get all the broken data
in one call.

About the codec error handling registry:
You seem to be using a Unicode specific approach
here. I'd rather like to see a generic approach which uses
the API we discussed earlier. Would that be possible ?
In that case, the codec API should probably be called
codecs.register_error('myhandler', myhandler).

Does that make sense ?

BTW, the patch which uses the callback registry does not seem
to be available on this SF page (the last patch still converts
the errors argument to a PyObject, which shouldn't be needed
anymore with the new approach). Can you please upload your 
latest version ?

Note that the highlighting codec would make a nice example
for the new feature.

Thanks.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 12:21

Message:
Logged In: YES 
user_id=89016

Another note: the patch will change the meaning of charmap
encoding slightly: currently "replace" will put a ? into
the output, even if ? is not in the mapping, i.e.
codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})
will return ('?', 1).

With the patch the above example will raise an exception.

Of course, with the patch many more replacement characters can
appear, so it is vital that the replacement string is also
run through the mapping.

Is this semantic change OK? (I guess all of the existing 
codecs have a mapping ord("?")->ord("?"))
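Current Python 3 behaves as proposed here: with a plain dict mapping, "replace" must itself encode "?" through the mapping and raises if it cannot. A sketch using codecs.charmap_encode, a low-level and largely undocumented helper that takes (text, errors, mapping) and returns (bytes, number of characters consumed):

```python
import codecs

mapping = {ord("a"): ord("b")}  # "?" is deliberately left unmapped

try:
    codecs.charmap_encode("c", "replace", mapping)
except UnicodeEncodeError as exc:
    print("raised:", exc.reason)

# Once "?" maps to itself, "replace" can emit it for the unmapped "c".
mapping[ord("?")] = ord("?")
print(codecs.charmap_encode("ac", "replace", mapping))
```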


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:19

Message:
Logged In: YES 
user_id=89016

So this means that the encoder can collect illegal
characters and pass them to the callback. "replace" will
replace them with (end-start)*u"?".

Decoders don't collect all illegal byte sequences, but call 
the callback once for every byte sequence that has been 
found illegal and "replace" will replace it with u"?".

Does this make sense?
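Python 3's codecs ended up with this split, at least for the ASCII encoder and the UTF-8 decoder used below: the encoder hands a whole run of unencodable characters to the handler in one call, while the decoder reports each invalid byte separately. A sketch with a hypothetical recording handler that otherwise behaves like "ignore":

```python
import codecs

spans = []

def record_span(exc):
    """Remember which slice each call covered, then drop it (like "ignore")."""
    spans.append((type(exc).__name__, exc.start, exc.end))
    return "", exc.end

codecs.register_error("record-span", record_span)

"aäöo".encode("ascii", "record-span")          # one run of two characters
b"a\xff\xfeb".decode("utf-8", "record-span")   # two invalid start bytes
for span in spans:
    print(span)
```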

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:06

Message:
Logged In: YES 
user_id=89016

For encoding it's always (end-start)*u"?":
>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither of these:
>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two
replacement characters. This might mean that decoders can't
collect all the illegal characters and call the callback 
once. They might have to call the callback for every single 
illegal byte sequence to get the old behaviour.

(It seems that this patch would be much, much simpler, if 
we only change the encoders)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 19:36

Message:
Logged In: YES 
user_id=38388

Hmm, whatever it takes to maintain backwards 
compatibility. Do you have an example ?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 18:31

Message:
Logged In: YES 
user_id=89016

What should replace do: Return u"?" or (end-start)*u"?"

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 16:15

Message:
Logged In: YES 
user_id=38388

Sounds like a good idea. Please keep the encoder and 
decoder APIs symmetric, though, ie. add the slice
information to both APIs. The slice should use the
same format as Python's standard slices, that is
left inclusive, right exclusive.

I like the highlighting feature !


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 00:09

Message:
Logged In: YES 
user_id=89016

I'm thinking about extending the API a little bit:

Consider the following example:
>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' 
can't decode byte 0x31 
in position 2: truncated \uXXXX escape

The error message is a lie: the problem is not the '1'
in position 2, but the
complete truncated sequence '\u1'.
To report this, the decoder should pass a start
and an end position to the handler.
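In current Python 3, the UnicodeDecodeError raised for this example spans the whole truncated escape rather than only the last digit. A sketch; the helper name escape_error_span is hypothetical.

```python
def escape_error_span(data):
    """Return (start, end, reason) for the first unicode-escape decoding error."""
    try:
        data.decode("unicode-escape")
    except UnicodeDecodeError as exc:
        return exc.start, exc.end, exc.reason
    return None

start, end, reason = escape_error_span(b"\\u1")
print(start, end, reason)
```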

For encoding this would be useful too: 
Suppose I want to have an encoder that 
colors the unencodable character via
ANSI escape sequences. Then I could do
the following:
>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...    return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
... 
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed.

To fix this problem the encoder could collect as many
unencodable characters as possible and pass those to 
the error callback in one go (passing a start and 
end+1 position).

This fixes the above problem and reduces the number of 
calls to the callback, so it should speed up the 
algorithms in case of custom encoding names. 
(And it makes the implementation very interesting ;))

What do you think?
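The run-collecting behaviour proposed here is what Python 3's ASCII encoder does, so a highlighting handler can emit a single pair of escape sequences per run. A sketch under the adopted codecs.register_error API; the name "ansi-highlight" is hypothetical.

```python
import codecs

def ansi_highlight(exc):
    """Wrap a whole run of unencodable characters in one bold ANSI span."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    codes = "".join("<%d>" % ord(ch) for ch in exc.object[exc.start:exc.end])
    return "\033[1m" + codes + "\033[0m", exc.end

codecs.register_error("ansi-highlight", ansi_highlight)

out = "aäüöo".encode("ascii", "ansi-highlight")
print(out)
```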


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-07 02:29

Message:
Logged In: YES 
user_id=89016

I started from scratch, and the current state is this:

Encoding mostly works (except that I haven't changed 
TranslateCharmap and EncodeDecimal yet) and most of the 
decoding stuff works (DecodeASCII and DecodeCharmap are 
still unchanged) and the decoding callback helper isn't 
optimized for the "builtin" names yet (i.e. it still calls 
the handler).

For encoding the callback helper knows how to 
handle "strict", "replace", "ignore" 
and "xmlcharrefreplace" itself and won't call the callback. 
This should make the encoder fast enough. As callback name 
string comparison results are cached it might even be 
faster than the original.

The patch so far didn't require any changes to 
unicodeobject.h, stringobject.h or stringobject.c


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-05 17:49

Message:
Logged In: YES 
user_id=38388

Walter, are you making any progress on the new scheme
we discussed on the mailing list (adding an error handler
registry much like the codec registry itself instead of trying 
to redo the complete codec API) ?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-09-20 12:38

Message:
Logged In: YES 
user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. 

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 12:53

Message:
Logged In: YES 
user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well.

I'll look into this after I'm back from vacation on the 10.09.

Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 05:55

Message:
Logged In: YES 
user_id=89016

Changing the decoding API is done now. There 
are new functions
codecs.register_unicodedecodeerrorhandler and
codecs.lookup_unicodedecodeerrorhandler.
Only the standard handlers for 'strict', 
'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors 
in the byte string, so I added an additional
argument to the decoding API: reason, which 
gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:
>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are:
encoding, unicode, position, reason, state.

The encoding and decoding API for strings has been 
adapted too, so now the new API should be usable 
everywhere:

>>> unicode("a\xffb\xffc", "ascii", 
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'
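In the interface that was finally adopted, callables are not passed directly; the same decoding example is written by registering a named handler. A Python 3 sketch (the handler name "question-marker" is hypothetical); str(bytes, encoding, errors) plays the role of the old unicode() constructor call.

```python
import codecs

def question_marker(exc):
    """Replace each undecodable byte sequence with the marker "<?>"."""
    if isinstance(exc, UnicodeDecodeError):
        return "<?>", exc.end
    raise exc

codecs.register_error("question-marker", question_marker)

print(b"a\xffb\xffc".decode("ascii", "question-marker"))
print(str(b"a\xffb\xffc", "ascii", "question-marker"))
```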

I had a problem with the decoding API: all the 
functions in _codecsmodule.c used the t# format 
specifier. I changed that to O! with 
&PyString_Type, because otherwise we would have
the problem that the decoding API would have to pass
buffer objects around instead of strings, and
the callback would have to call str() on the
buffer anyway to access a specific character, so
this wouldn't be any faster than calling str()
on the buffer before decoding. It seems that
buffers aren't used anyway.

I changed all the old functions to call the new
ones so bugfixes don't have to be done in two
places. There are two exceptions: I didn't
change PyString_AsEncodedString and
PyString_AsDecodedString because they are
documented as deprecated anyway (although they
are called in a few spots). This means that I
duplicated part of their functionality in 
PyString_AsEncodedObjectEx and 
PyString_AsDecodedObjectEx.

There are still a few spots that call the old API:
E.g. PyString_Format still calls PyUnicode_Decode 
(but with strict decoding) because it passes the 
rest of the format string to PyUnicode_Format 
when it encounters a Unicode object.

Should we switch to the new API everywhere even 
if strict encoding/decoding is used?

The size of this patch begins to scare me. I 
guess we need an extensive test script for all the 
new features and documentation. I hope you have time 
to do that, as I'll be busy with other projects in
the next weeks. (BTW, I haven't touched
PyUnicode_TranslateCharmap yet.)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 19:03

Message:
Logged In: YES 
user_id=89016

New version of the patch with the error handling callback 
registry. 

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
> 
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x
in addition to \u and \U where appropriate.
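This handler survives in Python 3 under the name "backslashreplace", which chooses \x, \u or \U by code point; since Python 3.5 it can also be used when decoding. A short illustration:

```python
text = "\xe4\u20ac\U0001f600"  # one Latin-1, one BMP and one astral character
print(text.encode("ascii", "backslashreplace"))

# Decoding: each undecodable byte becomes a \xNN escape in the output text.
print(b"a\xffb".decode("ascii", "backslashreplace"))
```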

> > [...] 
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
> 
> Good idea !
> 
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more in line with
> the Python coding style guide.

OK, but these functions are specific to unicode encoding,
so now the functions are called:
   codecs.register_unicodeencodeerrorhandler
   codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new 
ones: "xmlcharrefreplace" 
and "escapereplace") are registered in the 
codecs.c/_PyCodecRegistry_Init so using them is really 
simple: u"gürk".encode("ascii", "xmlcharrefreplace")
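Python 3 keeps this arrangement: the standard names are pre-registered and can be fetched with codecs.lookup_error(), which returns the same function objects that the codecs module exposes as strict_errors, xmlcharrefreplace_errors and so on. A short sketch:

```python
import codecs

# Pre-registered names resolve to the module-level handler functions.
print(codecs.lookup_error("strict") is codecs.strict_errors)
print(codecs.lookup_error("xmlcharrefreplace") is codecs.xmlcharrefreplace_errors)

# Unknown names raise LookupError.
try:
    codecs.lookup_error("no-such-handler")
except LookupError as exc:
    print("LookupError:", exc)

print("gürk".encode("ascii", "xmlcharrefreplace"))
```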


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 13:26

Message:
Logged In: YES 
user_id=38388

> > >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > >    > > could be reimplemented as PyUnicode_EncodeASCII
> > >    > > with \uxxxx replacement callback.
> > >    >
> > >    > Hmm, wouldn't that result in a slowdown ? If so,
> > >    > I'd rather leave the special encoder in place,
> > >    > since it is being used a lot in Python and
> > >    > probably some applications too.
> > >
> > >    It would be a slowdown. But callbacks open many
> > >    possibilities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
> 
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !
 
> > [...]
> > >    Should the old TranslateCharmap map to the new
> > >    TranslateCharmapEx and inherit the
> > >    "multicharacter replacement" feature,
> > >    or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement
> > to the old API. I think it is very useful and since the
> > old APIs work on raw buffers it would be a benefit to have
> > the functionality in the old implementation too.
> 
> OK! I will try to find the time to implement that in the
> next days.

Good.
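Python 3's str.translate() supports this multi-character replacement directly: mapping values may be strings of any length, or None to delete a character. A short illustration with a hypothetical transliteration table:

```python
# Hypothetical German transliteration table: one character may become two.
table = {ord("ä"): "ae", ord("ö"): "oe", ord("ü"): "ue", ord("ß"): "ss"}
print("Größe über".translate(table))
```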
 
> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> >     callback(encoding, input_data, input_position, state) ->
> >         (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the
> > position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same
> > interface... what do you think ?
> 
> This would make the callback feature hypergeneric and a
> little slower, because tuples have to be created, but it
> (almost) unifies the encoding and decoding API. ("almost"
> because, for the encoder output_to_be_appended will be
> reencoded, for the decoder it will simply be appended.),
> so I'm for it.

That's the point. 

Note that I don't think the tuple creation
will hurt much (see the make_tuple() API in codecs.c)
since small tuples are cached by Python internally.
 
> I implemented this and changed the encoders to only
> lookup the error handler on the first error. The UCS1
> encoder now no longer uses the two-item stack strategy.
> (This strategy only makes sense for those encoder where
> the encoding itself is much more complicated than the
> looping/callback etc.) So now memory overflow tests are
> only done, when an unencodable error occurs, so now the
> UCS1 encoder should be as fast as it was without
> error callbacks.
> 
> Do we want to enforce new_input_position>input_position,
> or should jumping back be allowed?

No; moving backwards should be allowed (this may be useful
in order to resynchronize with the input data).
 
> Here is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
> 
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
> 
> We could have an error handler registry, which maps names
> to error handlers; then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because it's always more convenient to type
>    u"...".encode("...", "strict")
> instead of
>    import codecs
>    u"...".encode("...", codecs.raise_encode_errors)
> 
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, unicode, pos, state):
>    return (u"&#%d;" % ord(unicode[pos]), pos+1)
> 
> import codecs
> 
> codecs.registerError("xmlreplace",xmlreplace)
> ---
> and then the following call can be made:
>         u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed, common error
> handling names could even be implemented in the encoder
> itself.
> 
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named
codecs.register_errorhandler() to be more in line with
the Python coding style guide.
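For comparison, present-day Python 3 does ship a name-based registry of this kind, as `codecs.register_error()` and `codecs.lookup_error()`. Its handler signature differs from the one proposed here: the handler receives the `UnicodeEncodeError` instance and returns `(replacement, new_position)`. A minimal sketch of the "xmlreplace" example against that API (an equivalent built-in handler, `"xmlcharrefreplace"`, already exists):

```python
import codecs

def xmlreplace(exc):
    """Error handler: replace unencodable characters with &#NNN; references."""
    # Only encoding errors can be repaired this way.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.object is the string being encoded; exc.start:exc.end is the
    # run of characters the codec could not encode.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Resume encoding right after the offending run (absolute position).
    return replacement, exc.end

codecs.register_error("xmlreplace", xmlreplace)

print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

Note that the handler returns an absolute resume position, which matches the "absolute values rather than offsets" preference expressed in this thread.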


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES 
user_id=89016

> >    [...]
> >    so I guess we could change the replace handler
> >    to always return u'?'. This would make the
> >    implementation a little bit simpler, but the 
> >    explanation of the callback feature *a lot* 
> >    simpler. 
> 
> Go for it.

OK, done!

> [...]
> >    > Could you add these docs to the Misc/unicode.txt
> >    > file ? I will eventually take that file and turn 
> >    > it into a PEP which will then serve as general 
> >    > documentation for these things.
> > 
> >    I could, but first we should work out how the 
> >    decoding callback API will work.
> 
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually 
> go there.

OK. I guess it would be best to do this when everything 
is finished.

> >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> >    > > could be reimplemented as PyUnicode_EncodeASCII 
> >    > > with \uxxxx replacement callback.
> >    >
> >    > Hmm, wouldn't that result in a slowdown ? If so,
> >    > I'd rather leave the special encoder in place, 
> >    > since it is being used a lot in Python and 
> >    > probably some applications too.
> > 
> >    It would be a slowdown. But callbacks open many 
> >    possibilities.
> 
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).

OK, done: there is now a
PyCodec_EscapeReplaceUnicodeEncodeErrors/
codecs.escapereplace_unicodeencode_errors
handler that uses \u (or \U if the code point is above
0xffff, with a wide build of Python).
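A minimal Python sketch of what such an escape-replace handler does, written against the `callback(encoding, input, position, state) -> (replacement, new_position)` tuple interface discussed in this thread. The function name and signature here are illustrative assumptions, not the patch's actual code:

```python
def escapereplace(encoding, uni, pos, state=None):
    r"""Replace the unencodable character uni[pos] with \uXXXX, or with
    \UXXXXXXXX if its code point is above 0xFFFF."""
    cp = ord(uni[pos])
    if cp > 0xFFFF:
        replacement = "\\U%08x" % cp
    else:
        replacement = "\\u%04x" % cp
    # Return the replacement text and the absolute position to resume at.
    return replacement, pos + 1

print(escapereplace("ascii", "x\u20acy", 1))  # ('\\u20ac', 2)
```

Present-day Python offers this behaviour as the built-in `"backslashreplace"` error handler, which additionally uses `\xXX` for code points below 0x100.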

> >    For example:
> > 
> >       Why can't I print u"gürk"?
> > 
> >    is probably one of the most frequently asked
> >    questions in comp.lang.python. For printing 
> >    Unicode stuff, print could be extended to use an 
> >    error handling callback for Unicode strings (or 
> >    objects where __str__ or tp_str returns a Unicode 
> >    object) instead of using str(), which always 
> >    returns an 8bit string and uses strict encoding. 
> >    There might even be a
> >    sys.setprintencodehandler()/sys.getprintencodehandler()
> 
> There already is a print callback in Python (forgot the
> name of the hook though), so this should be possible by 
> providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> >    Should the old TranslateCharmap map to the new 
> >    TranslateCharmapEx and inherit the 
> >    "multicharacter replacement" feature,
> >    or should I leave it as it is?
> 
> If possible, please also add the multichar replacement
> to the old API. I think it is very useful and since the
> old APIs work on raw buffers it would be a benefit to have
> the functionality in the old implementation too.

OK! I will try to find the time to implement that in the 
next days.

> [Decoding error callbacks]
>
> About the return value:
> 
> I'd suggest always using the same tuple interface, e.g.
> 
>     callback(encoding, input_data, input_position, state) ->
>         (output_to_be_appended, new_input_position)
> 
> (I think it's better to use absolute values for the 
> position rather than offsets.)
> 
> Perhaps the encoding callbacks should use the same 
> interface... what do you think ?

This would make the callback feature hypergeneric and a
little slower, because tuples have to be created, but it
(almost) unifies the encoding and decoding API ("almost"
because for the encoder output_to_be_appended will be
re-encoded, while for the decoder it will simply be appended),
so I'm for it.

I implemented this and changed the encoders to only 
look up the error handler on the first error. The UCS1 
encoder now no longer uses the two-item stack strategy. 
(This strategy only makes sense for those encoders where 
the encoding itself is much more complicated than the 
looping/callback etc.) So memory overflow tests are now 
only done when an unencodable character occurs, and the 
UCS1 encoder should be as fast as it was without 
error callbacks.

Do we want to enforce new_input_position>input_position,
or should jumping back be allowed?

> >    > > One additional note: It is vital that errors
> >    > > is an assignable attribute of the StreamWriter.
> >    >
> >    > It is already !
> > 
> >    I know, but IMHO it should be documented that an
> >    assignable errors attribute must be supported 
> >    as part of the official codec API.
> > 
> >    Misc/unicode.txt is not clear on that:
> >    """
> >    It is not required by the Unicode implementation
> >    to use these base classes, only the interfaces must 
> >    match; this allows writing Codecs as extension types.
> >    """
> 
> Good point. I'll add that to PEP 100.

OK.

Here is the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing 
callbacks
(see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names 
to error handlers; then it would be possible to keep the 
errors argument as "const char *" instead of "PyObject *". 
Currently PyCodec_UnicodeEncodeHandlerForObject is a 
backwards compatibility hack that will never go away, 
because it's always more convenient to type
   u"...".encode("...", "strict")
instead of
   import codecs
   u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would 
become the official lookup method for error handlers. 
(PyCodec_LookupUnicodeEncodeErrorHandler?)
Python code would look like this:
---
def xmlreplace(encoding, uni, pos, state):
   return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace",xmlreplace)
---
and then the following call can be made:
	u"äöü".encode("ascii", "xmlreplace")
As soon as the first error is encountered, the encoder uses
its builtin error handling method if it recognizes the name 
("strict", "replace" or "ignore") or looks up the error 
handling function in the registry if it doesn't. In this way
the speed for the backwards compatible features is the same 
as before and "const char *error" can be kept as the 
parameter to all encoding functions. For speed, common error 
handling names could even be implemented in the encoder 
itself.

But for special one-shot error handlers, it might still be 
useful to pass the error handler directly, so maybe we 
should leave error as PyObject *, but implement the 
registry anyway?
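A minimal Python sketch of the proposed scheme, assuming hypothetical `register_errorhandler()`/`lookup_errorhandler()` functions (the naming suggested in the reply to this comment) and a toy ASCII encoder. The encoder handles the built-in names inline and only consults the registry, once, when the first unencodable character shows up; a callable passed directly acts as a one-shot handler:

```python
_registry = {}

def register_errorhandler(name, handler):
    """Hypothetical: map an error handling name to a callback."""
    if not callable(handler):
        raise TypeError("handler must be callable")
    _registry[name] = handler

def lookup_errorhandler(errors):
    """Resolve the errors argument. Built-in names are returned unchanged
    so the encoder can treat them inline; callables are one-shot handlers."""
    if callable(errors) or errors in ("strict", "ignore", "replace"):
        return errors
    try:
        return _registry[errors]
    except KeyError:
        raise LookupError("unknown error handler name %r" % (errors,)) from None

def encode_ascii(uni, errors="strict"):
    out = bytearray()
    handler = None                  # looked up lazily, on the first error only
    pos = 0
    while pos < len(uni):
        cp = ord(uni[pos])
        if cp < 128:
            out.append(cp)
            pos += 1
            continue
        if handler is None:
            handler = lookup_errorhandler(errors)
        if handler == "strict":
            raise UnicodeError("ascii can't encode %r at position %d" % (uni[pos], pos))
        elif handler == "ignore":
            pos += 1
        elif handler == "replace":
            out += b"?"
            pos += 1
        else:
            # Callback: (replacement, absolute new position); the
            # replacement is encoded again, strictly.
            replacement, pos = handler("ascii", uni, pos, None)
            out += replacement.encode("ascii")
    return bytes(out)

def xmlreplace(encoding, uni, pos, state):
    return "&#%d;" % ord(uni[pos]), pos + 1

register_errorhandler("xmlreplace", xmlreplace)
print(encode_ascii("äöü", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

Because the lookup is deferred, an unknown handler name only raises `LookupError` once an unencodable character is actually hit; pure-ASCII input never touches the registry.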


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES 
user_id=38388

Ok, here we go...

>    > > raise an exception). U+FFFD characters in the replacement
>    > > string will be replaced with a character that the encoder
>    > > chooses ('?' in all cases).
>    >
>    > Nice.
> 
>    But the special casing of U+FFFD makes the interface somewhat
>    less clean than it could be. It was only done to be 100%
>    backwards compatible. With the original "replace" error
>    handling the codec chose the replacement character. But as
>    far as I can tell none of the codecs uses anything other
>    than '?', 

True.

>    so I guess we could change the replace handler
>    to always return u'?'. This would make the implementation a
>    little bit simpler, but the explanation of the callback
>    feature *a lot* simpler. 

Go for it.

>    And if you still want to handle
>    an unencodable U+FFFD, you can write a special callback for
>    that, e.g.
> 
>    def FFFDreplace(enc, uni, pos):
>        if uni[pos] == u"\ufffd":
>            return u"?"
>        else:
>            raise UnicodeError(...)
>
>    > ...docs...
>    >
>    > Could you add these docs to the Misc/unicode.txt file ? I
>    > will eventually take that file and turn it into a PEP which
>    > will then serve as general documentation for these things.
> 
>    I could, but first we should work out how the decoding
>    callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the
unicode.txt to PEP 100, so the docs should eventually go there.
 
>    > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
>    > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
>    > > replacement callback.
>    >
>    > Hmm, wouldn't that result in a slowdown ? If so, I'd rather
>    > leave the special encoder in place, since it is being used a
>    > lot in Python and probably some applications too.
> 
>    It would be a slowdown. But callbacks open many 
>    possibilities.

True, but in this case I believe that we should stick with
the native implementation for "unicode-escape". Having
a standard callback error handler which does the \uXXXX
replacement would be nice to have though, since this would
also be usable with lots of other codecs (e.g. all the code page
ones).
 
>    For example:
> 
>       Why can't I print u"gürk"?
> 
>    is probably one of the most frequently asked questions in
>    comp.lang.python. For printing Unicode stuff, print could be
>    extended to use an error handling callback for Unicode 
>    strings (or objects where __str__ or tp_str returns a 
>    Unicode object) instead of using str(), which always returns 
>    an 8bit string and uses strict encoding. There might even 
>    be a
>    sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the
hook though), so this should be possible by providing the
encoding logic in the hook.
 
>    > > I have not touched PyUnicode_TranslateCharmap yet,
>    > > should this function also support error callbacks? Why
>    > > would one want to insert None into the mapping to call
>    > > the callback?
>    >
>    > 1. Yes.
>    > 2. The user may want to e.g. restrict usage of certain
>    > character ranges. In this case the codec would be used to
>    > verify the input and an exception would indeed be useful
>    > (e.g. say you want to restrict input to Hangul + ASCII).
> 
>    OK, do we want TranslateCharmap to work exactly like encoding,
>    i.e. in case of an error should the returned replacement
>    string again be mapped through the translation mapping or
>    should it be copied to the output directly? The former would
>    be more in line with encoding, but IMHO the latter would
>    be much more useful.

It's better to take the second approach (copy the callback
output directly to the output string) to avoid endless
recursion and other pitfalls.

I suppose this will also simplify the implementation somewhat.
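A minimal sketch of the "copy directly" behaviour, assuming (as in this discussion) that a None value in the mapping triggers the callback and that mapping values may be multi-character strings (the feature of patch #403100). The function name and callback signature are illustrative, following the `(replacement, new_position)` tuple interface proposed in this thread:

```python
def translate_charmap(uni, mapping, callback):
    """Translate uni through mapping (code point -> str or None).

    Code points absent from the mapping are copied unchanged; a None value
    marks an error, which is handed to callback(encoding, uni, pos, state)."""
    out = []
    pos = 0
    while pos < len(uni):
        cp = ord(uni[pos])
        if cp not in mapping:
            out.append(uni[pos])
            pos += 1
        elif mapping[cp] is not None:
            out.append(mapping[cp])      # may be several characters
            pos += 1
        else:
            # "translate" is a placeholder encoding name for the callback.
            replacement, pos = callback("translate", uni, pos, None)
            out.append(replacement)      # copied directly, NOT re-mapped
    return "".join(out)

def euro(encoding, uni, pos, state):
    return "EUR", pos + 1

table = {ord("ß"): "ss", ord("€"): None, ord("E"): "e"}
print(translate_charmap("Straße 5€", table, euro))  # Strasse 5EUR
```

The `"E"` entry in the table shows the difference: if the callback's output were fed back through the mapping, the result would end in `eUR`, and a callback whose output maps to None again could recurse without end.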
 
>    BTW, when I implement it I can implement patch #403100
>    ("Multicharacter replacements in PyUnicode_TranslateCharmap")
>    along the way.

I've seen it; will comment on it later.
 
>    Should the old TranslateCharmap map to the new TranslateCharmapEx
>    and inherit the "multicharacter replacement" feature, or
>    should I leave it as it is?

If possible, please also add the multichar replacement
to the old API. I think it is very useful and since the
old APIs work on raw buffers it would be a benefit to have
the functionality in the old implementation too.
 
[Decoding error callbacks]

>    > > A remaining problem is how to implement decoding error
>    > > callbacks. In Python 2.1 encoding and decoding errors are
>    > > handled in the same way with a string value. But with
>    > > callbacks it doesn't make sense to use the same callback
>    > > for encoding and decoding (like codecs.StreamReaderWriter
>    > > and codecs.StreamRecoder do). Decoding callbacks have a
>    > > different API. Which arguments should be passed to the
>    > > decoding callback, and what is the decoding callback
>    > > supposed to do?
>    >
>    > I'd suggest adding another set of PyCodec_UnicodeDecode...()
>    > APIs for this. We'd then have to augment the base classes of
>    > the StreamCodecs to provide two attributes for .errors with
>    > a fallback solution for the string case (i.e. "strict" can
>    > still be used for both directions).
> 
>    Sounds good. Now what is the decoding callback supposed to do?
>    I guess it will be called in the same way as the encoding
>    callback, i.e. with encoding name, original string and
>    position of the error. It might return a Unicode string
>    (i.e. an object of the decoding target type) that will be
>    emitted from the codec instead of the one offending byte. Or
>    it might return a tuple with replacement Unicode object and
>    a resynchronisation offset, i.e. returning (u"?", 1) means
>    emit a '?' and skip the offending character. But to make
>    the offset really useful the callback has to know something
>    about the encoding; perhaps the codec should be allowed to
>    pass an additional state object to the callback?
> 
>    Maybe the same should be added to the encoding callbacks too?
>    Maybe the encoding callback should be able to tell the
>    encoder if the returned replacement should be re-encoded
>    (in which case it's a Unicode object), or directly emitted
>    (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically
this should be a codec-defined arbitrary Python object)
which would then allow the callback to apply additional tricks.
The object should be documented to be modifiable in place
(simplifies the interface).

About the return value:

I'd suggest always using the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) -> 
        (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position 
rather than offsets.)

Perhaps the encoding callbacks should use the same 
interface... what do you think ?
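A minimal sketch of a decoder driven by this tuple interface, assuming a toy ASCII decoder and a hypothetical callback that resynchronises by skipping a whole run of non-ASCII bytes. The state is a plain dict that the callback modifies in place, as suggested above; all names are illustrative:

```python
def decode_ascii(data, callback, state=None):
    """Decode bytes as ASCII; for each invalid byte call
    callback(encoding, input_data, input_position, state), which returns
    (output_to_be_appended, new_input_position)."""
    out = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if byte < 0x80:
            out.append(chr(byte))
            pos += 1
            continue
        replacement, newpos = callback("ascii", data, pos, state)
        # Absolute positions: only a range check is done here.
        if not 0 <= newpos <= len(data):
            raise IndexError("new position %d out of range" % newpos)
        out.append(replacement)   # decoder output is appended as-is
        pos = newpos
    return "".join(out)

def skip_run(encoding, data, pos, state):
    """Emit one '?' for a whole run of non-ASCII bytes; count the runs."""
    end = pos
    while end < len(data) and data[end] >= 0x80:
        end += 1
    if state is not None:
        state["errors"] = state.get("errors", 0) + 1   # modified in place
    return "?", end

state = {}
print(decode_ascii(b"a\xc3\xa4b\xff", skip_run, state), state)  # a?b? {'errors': 2}
```

Since only the range is checked, moving backwards is permitted; the flip side is that a callback returning its own input position would loop forever.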

>    > > One additional note: It is vital that errors is an
>    > > assignable attribute of the StreamWriter.
>    >
>    > It is already !
> 
>    I know, but IMHO it should be documented that an assignable
>    errors attribute must be supported as part of the official
>    codec API.
> 
>    Misc/unicode.txt is not clear on that:
>    """
>    It is not required by the Unicode implementation to use 
>    these base classes, only the interfaces must match; this 
>    allows writing Codecs as extension types.
>    """

Good point. I'll add that to PEP 100.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES 
user_id=38388

Sorry to keep you waiting, Walter. I will look into this
again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES 
user_id=38388

On your comment about the non-Unicode codecs: let's keep
this separated from the current patch.

Don't have much time today. I'll comment on the other things
tomorrow.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES 
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other 
> encodings than Unicode.  But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I 
> noticed that all the non-Unicode codecs ignore the 
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would 
mean that the encoder is not able to deal in any other way 
with unencodable stuff than by raising an error. But that 
is not the problem here, because for zlib, base64, quopri, 
hex and uu encoding there can be no unencodable characters. 
The encoders can simply ignore the errors parameter. Should 
I remove the asserts from those codecs and change the 
docstrings accordingly, or will this be done separately?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES 
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat
less clean than it could be. It was only done to be 100%
backwards compatible. With the original "replace" error
handling the codec chose the replacement character. But as
far as I can tell none of the codecs uses anything other
than '?', so I guess we could change the replace handler
to always return u'?'. This would make the implementation a
little bit simpler, but the explanation of the callback
feature *a lot* simpler. And if you still want to handle
an unencodable U+FFFD, you can write a special callback for
that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == u"\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and
> > implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding
callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.
>
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many 
possibilities.

For example:

   Why can't I print u"gürk"?

is probably one of the most frequently asked questions in
comp.lang.python. For printing Unicode stuff, print could be
extended to use an error handling callback for Unicode 
strings (or objects where __str__ or tp_str returns a 
Unicode object) instead of using str(), which always returns 
an 8bit string and uses strict encoding. There might even 
be a
sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors
really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet,
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding,
i.e. in case of an error should the returned replacement
string again be mapped through the translation mapping or
should it be copied to the output directly? The former would
be more in line with encoding, but IMHO the latter would
be much more useful.

BTW, when I implement it I can implement patch #403100
("Multicharacter replacements in PyUnicode_TranslateCharmap")
along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx
and inherit the "multicharacter replacement" feature, or
should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks. In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do?
I guess it will be called in the same way as the encoding
callback, i.e. with encoding name, original string and
position of the error. It might return a Unicode string
(i.e. an object of the decoding target type) that will be
emitted from the codec instead of the one offending byte. Or
it might return a tuple with replacement Unicode object and
a resynchronisation offset, i.e. returning (u"?", 1) means
emit a '?' and skip the offending character. But to make
the offset really useful the callback has to know something
about the encoding; perhaps the codec should be allowed to
pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the
encoder if the returned replacement should be re-encoded
(in which case it's a Unicode object), or directly emitted
(in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable
errors attribute must be supported as part of the official
codec API.

Misc/unicode.txt is not clear on that:
"""
It is not required by the Unicode implementation to use 
these base classes, only the interfaces must match; this 
allows writing Codecs as extension types.
"""

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES 
user_id=38388

> How the callbacks work:
> 
> A PyObject * named errors is passed in. This may be NULL,
> Py_None, 'strict', u'strict', 'ignore', u'ignore',
> 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to
> one of the three builtin error callbacks
> PyCodec_RaiseEncodeErrors (raises an exception),
> PyCodec_IgnoreEncodeErrors (returns an empty replacement
> string, in effect ignoring the error),
> PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
> replacement character to signify to the encoder that it
> should choose a suitable replacement character) or directly
> returns errors if it is a callable object. When an
> unencodable character is encountered the error handling
> callback will be called with the encoding name, the original
> unicode object and the error position and must return a
> unicode object that will be encoded instead of the offending
> character (or the callback may of course raise an
> exception). U+FFFD characters in the replacement string will
> be replaced with a character that the encoder chooses ('?'
> in all cases).

Nice.
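The dispatch described above can be mirrored in a few lines of Python. This is a minimal sketch: the lowercase names are hypothetical stand-ins for the C functions, the callbacks use the `(encoding, unicode, position) -> replacement` signature described in the quoted text, and Python 3's single `str` type covers both the 8-bit and Unicode spellings of the names:

```python
def raise_encode_errors(encoding, uni, pos):
    raise UnicodeError("%s codec can't encode %r in position %d"
                       % (encoding, uni[pos], pos))

def ignore_encode_errors(encoding, uni, pos):
    return ""             # empty replacement: the character is dropped

def replace_encode_errors(encoding, uni, pos):
    return "\ufffd"       # U+FFFD: "encoder, pick your own replacement"

_BUILTIN_HANDLERS = {
    "strict": raise_encode_errors,
    "ignore": ignore_encode_errors,
    "replace": replace_encode_errors,
}

def encode_handler_for_object(errors):
    """Map the errors argument to a callback (cf. PyCodec_EncodeHandlerForObject)."""
    if errors is None:            # stands in for both NULL and Py_None
        return raise_encode_errors
    if isinstance(errors, str):
        try:
            return _BUILTIN_HANDLERS[errors]
        except KeyError:
            raise ValueError("unknown error handling %r" % errors) from None
    if callable(errors):
        return errors
    raise TypeError("errors must be None, a string or a callable")
```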
 
> The implementation of the loop through the string is done in
> the following way. A stack with two strings is kept and the
> loop always encodes a character from the string at the
> stacktop. If an error is encountered and the stack has only
> one entry (during encoding of the original string) the
> callback is called and the unicode object returned is pushed
> on the stack, so the encoding continues with the replacement
> string. If the stack has two entries when an error is
> encountered, the replacement string itself has an
> unencodable character and a normal exception is raised. When
> the encoder has reached the end of its current string there
> are two possibilities: when the stack contains two entries,
> this was the replacement string, so the replacement string
> will be popped from the stack and encoding continues with
> the next character from the original string. If the stack
> had only one entry, encoding is finished.

Very elegant solution !
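The two-entry stack loop described above can be sketched in Python as follows. This is an illustration, not the patch's C code; the function name, the `limit` parameter (the first code point the one-byte target charset cannot encode) and the callback signature `(encoding, uni, pos) -> replacement` are assumptions:

```python
def encode_with_stack(uni, limit, encoding, callback):
    """Encode uni, accepting only code points below limit (at most 256).

    callback(encoding, uni, pos) returns a replacement string for the
    unencodable uni[pos]; U+FFFD inside a replacement means "encoder's
    choice" and becomes '?'."""
    if not 0 < limit <= 256:
        raise ValueError("limit must be in 1..256 for a one-byte charset")
    out = bytearray()
    stack = [[uni, 0]]        # entries are [string, next position]; at most two
    while stack:
        s, pos = stack[-1]
        if pos == len(s):
            # End of the string on top: popping the replacement resumes the
            # original string; popping the original string ends encoding.
            stack.pop()
            continue
        ch = s[pos]
        stack[-1][1] = pos + 1
        if ord(ch) < limit:
            out.append(ord(ch))
        elif ch == "\ufffd" and len(stack) == 2:
            out += b"?"
        elif len(stack) == 1:
            # Error in the original string: encode the callback's result next.
            stack.append([callback(encoding, s, pos), 0])
        else:
            # Error inside the replacement itself: a normal exception.
            raise UnicodeError("%s: replacement %r is not encodable" % (encoding, s))
    return bytes(out)

print(encode_with_stack("a€b", 128, "ascii", lambda enc, uni, pos: "<%d>" % ord(uni[pos])))
# b'a<8364>b'
```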
 
> (I hope that's enough explanation of the API and
> implementation)

Could you add these docs to the Misc/unicode.txt file ? I
will eventually take that file and turn it into a PEP which
will then serve as general documentation for these things.
 
> I have renamed the static ...121 function to all lowercase
> names.

Ok.
 
> BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather
leave the special encoder in place, since it is being used a
lot in Python and probably some applications too.
 
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
> PyCodec_ReplaceEncodeErrors are globally visible because
> they have to be available in _codecsmodule.c to wrap them as
> Python function objects, but they can't be implemented in
> _codecsmodule, because they need to be available to the
> encoders in unicodeobject.c (through
> PyCodec_EncodeHandlerForObject), but importing the codecs
> module might result in an endless recursion, because
> importing a module requires unpickling of the bytecode,
> which might require decoding utf8, which ... (but this will
> only happen if we implement the same mechanism for the
> decoding API)

I think that codecs.c is the right place for these APIs.
_codecsmodule.c is only meant as Python access wrapper for
the internal codecs and nothing more. 

One thing I noted about the callbacks: they assume that they
will always get Unicode objects as input. This is certainly
not true in the general case (it is for the codecs you touch
in the patch). 

I think it would be worthwhile to rename the callbacks to
include "Unicode" somewhere, e.g.
PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
then it points out the application field of the callback
rather well. Same for the callbacks exposed through the
_codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet,
> should this function also support error callbacks? Why would
> one want to insert None into the mapping to call the
> callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain
character ranges. In this case the codec would be used to
verify the input and an exception would indeed be useful
(e.g. say you want to restrict input to Hangul + ASCII).
 
> A remaining problem is how to implement decoding error
> callbacks. In Python 2.1 encoding and decoding errors are
> handled in the same way with a string value. But with
> callbacks it doesn't make sense to use the same callback for
> encoding and decoding (like codecs.StreamReaderWriter and
> codecs.StreamRecoder do). Decoding callbacks have a
> different API. Which arguments should be passed to the
> decoding callback, and what is the decoding callback
> supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...()
APIs for this. We'd then have to augment the base classes of
the StreamCodecs to provide two attributes for .errors with
a fallback solution for the string case (i.e. "strict" can
still be used for both directions).

> One additional note: It is vital that errors is an
> assignable attribute of the StreamWriter.

It is already !
 
> Consider the XML example: For writing an XML DOM tree one
> StreamWriter object is used. When a text node is written,
> the error handling has to be set to
> codecs.xmlreplace_encode_errors, but inside a comment or
> processing instruction replacing unencodable characters with
> charrefs is not possible, so here codecs.raise_encode_errors
> should be used (or better a custom error handler that raises
> an error that says "sorry, you can't have unencodable
> characters inside a comment")

Sure.
 
> BTW, should we continue the discussion in the i18n SIG
> mailing list? An email program is much more comfortable than
> an HTML textarea! ;)

I'd rather keep the discussions on this patch here --
forking it off to the i18n sig will make it very hard to
follow up on it. (This HTML area is indeed damn small ;-)
 

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES 
user_id=89016

One additional note: It is vital that errors is an
assignable attribute of the StreamWriter. 

Consider the XML example: For writing an XML DOM tree one
StreamWriter object is used. When a text node is written,
the error handling has to be set to
codecs.xmlreplace_encode_errors, but inside a comment or
processing instruction replacing unencodable characters with
charrefs is not possible, so here codecs.raise_encode_errors
should be used (or better a custom error handler that raises
an error that says "sorry, you can't have unencodable
characters inside a comment")

BTW, should we continue the discussion in the i18n SIG
mailing list? An email program is much more comfortable than
a HTML textarea! ;)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES 
user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL,
Py_None, 'strict', u'strict', 'ignore', u'ignore',
'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to
one of the three builtin error callbacks
PyCodec_RaiseEncodeErrors (raises an exception),
PyCodec_IgnoreEncodeErrors (returns an empty replacement
string, in effect ignoring the error),
PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
replacement character to signify to the encoder that it
should choose a suitable replacement character) or directly
returns errors if it is a callable object. When an
unencodable character is encountered the error handling
callback will be called with the encoding name, the original
unicode object and the error position and must return a
unicode object that will be encoded instead of the offending
character (or the callback may of course raise an
exception). U+FFFD characters in the replacement string will 
be replaced with a character that the encoder chooses ('?'
in all cases).

The implementation of the loop through the string is done in
the following way. A stack with two strings is kept and the
loop always encodes a character from the string at the
stacktop. If an error is encountered and the stack has only
one entry (during encoding of the original string) the
callback is called and the unicode object returned is pushed
on the stack, so the encoding continues with the replacement
string. If the stack has two entries when an error is
encountered, the replacement string itself has an
unencodable character and a normal exception is raised. When
the encoder has reached the end of its current string there
are two possibilities: when the stack contains two entries,
this was the replacement string, so the replacement string
will be popped from the stack and encoding continues with
the next character from the original string. If the stack
had only one entry, encoding is finished.
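The two-entry stack loop described above can be modelled in pure Python for an ASCII encoder. This is a hypothetical sketch, not the C code in the patch: the function name `encode_ascii_with_callback` and the `callback(encoding, original, pos)` signature are assumptions based on the description, and a replacement that contains U+FFFD is mapped to '?' as stated above.

```python
def encode_ascii_with_callback(text, callback, encoding="ascii"):
    """Hypothetical pure-Python model of the two-entry stack loop.

    callback(encoding, original, pos) must return a replacement str.
    """
    out = bytearray()
    # Each stack entry is [string, next position]; entry 0 is the original.
    stack = [[text, 0]]
    while True:
        string, pos = stack[-1]
        if pos == len(string):
            if len(stack) == 2:
                # End of the replacement string: pop it and resume the original
                stack.pop()
                continue
            break  # End of the original string: encoding is finished
        ch = string[pos]
        stack[-1][1] = pos + 1  # advance past this character
        if ord(ch) < 128:
            out.append(ord(ch))
        elif ch == "\ufffd" and len(stack) == 2:
            # U+FFFD in a replacement: the encoder picks its own substitute
            out.append(ord("?"))
        elif len(stack) == 1:
            # Unencodable character in the original: push the replacement
            stack.append([callback(encoding, string, pos), 0])
        else:
            # The replacement itself is unencodable: plain exception
            raise UnicodeError("unencodable character in replacement string")
    return bytes(out)


# Replace unencodable characters with decimal XML character references
print(encode_ascii_with_callback("aäb", lambda enc, uni, pos: "&#%d;" % ord(uni[pos])))
```

Because the original string's position is advanced before the replacement is pushed, popping the replacement resumes encoding at the character after the offending one.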

(I hope that's enough explanation of the API and implementation)

I have renamed the static ...121 function to all lowercase
names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be
reimplemented as PyUnicode_EncodeASCII with a \uxxxx
replacement callback.
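As a rough illustration of that idea in modern Python 3 (where the handler interface from PEP 293 ended up as `codecs.register_error`), an ASCII encode with an escaping handler reproduces the `unicode_escape` output for text whose ASCII part is printable and backslash-free. The handler name `escape_sketch` is made up for this example; Python 3 also ships an equivalent built-in handler called `"backslashreplace"`.

```python
import codecs


def escape_replace(exc):
    """Replace unencodable characters with Python-style backslash escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if cp < 0x100:
            pieces.append("\\x%02x" % cp)
        elif cp < 0x10000:
            pieces.append("\\u%04x" % cp)
        else:
            pieces.append("\\U%08x" % cp)
    # Resume encoding right after the offending run of characters
    return ("".join(pieces), exc.end)


codecs.register_error("escape_sketch", escape_replace)

print("aä€\U0001f600".encode("ascii", "escape_sketch"))
```

The match with `unicode_escape` is only partial: that codec also escapes backslashes and non-printable ASCII characters, which the ASCII encoder passes through unchanged.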

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
PyCodec_ReplaceEncodeErrors are globally visible because
they have to be available in _codecsmodule.c to wrap them as
Python function objects, but they can't be implemented in
_codecsmodule, because they need to be available to the
encoders in unicodeobject.c (through
PyCodec_EncodeHandlerForObject), but importing the codecs
module might result in an endless recursion, because
importing a module requires unpickling of the bytecode,
which might require decoding utf8, which ... (but this will
only happen, if we implement the same mechanism for the
decoding API)

I have not touched PyUnicode_TranslateCharmap yet, 
should this function also support error callbacks? Why would
one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error
callbacks. In Python 2.1 encoding and decoding errors are
handled in the same way with a string value. But with
callbacks it doesn't make sense to use the same callback for
encoding and decoding (like codecs.StreamReaderWriter and
codecs.StreamRecoder do). Decoding callbacks have a
different API. Which arguments should be passed to the
decoding callback, and what is the decoding callback
supposed to do?


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 20:00

Message:
Logged In: YES 
user_id=38388

About the Py_UNICODE*data, int size APIs:
Ok, point taken.

In general, I think we ought to keep the callback feature as
open as possible, so passing in pointers and sizes would not
be very useful.

BTW, could you summarize how the callback works in a few
lines ?

About _Encode121: I'd name this _EncodeUCS1 since that's
what it is ;-)

About the new functions: I was referring to the new static
functions which you gave PyUnicode_... names. If these are
not supposed to turn into non-static functions, I'd rather
have them use lower case names (since that's how the Python
internals work too -- most of the time).


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:56

Message:
Logged In: YES 
user_id=89016

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments
> --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

Another problem is that the callback requires a Python
object, so in the PyObject *version, the refcount is
incref'd and the object is passed to the callback. The
Py_UNICODE*/int version would have to create a new Unicode
object from the data.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:32

Message:
Logged In: YES 
user_id=89016

> * please don't place more than one C statement on one line
> like in:
> """
> +               unicode = unicode2; unicodepos = unicode2pos;
> +               unicode2 = NULL; unicode2pos = 0;
> """

OK, done!

> * Comments should start with a capital letter and be
> prepended to the section they apply to

Fixed!

> * There should be spaces between arguments in compares
> (a == b) not (a==b)

Fixed!

> * Where does the name "...Encode121" originate ?

Encode one-to-one; it implements both ASCII and Latin-1
encoding.

> * module internal APIs should use lower case names (you
> converted some of these to  PyUnicode_...() -- this is
> normally reserved for APIs which are either marked as
> potential candidates for the public API or are very
> prominent in the code)

Which ones? I introduced a new function for every old one
that had a "const char *errors" argument, and a few new ones
in codecs.h. Of those, PyCodec_EncodeHandlerForObject is
vital, because it is used to map the old string arguments to
the new function objects. PyCodec_RaiseEncodeErrors can be
used in the encoder implementation to raise an encode error,
but it could be made static in unicodeobject.c so only those
encoders implemented there have access to it.

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

I looked through the code and found no situation where the
Py_UNICODE*/int version is really used, and having two
(PyObject *)s (the original and the replacement string)
instead of Py_UNICODE*/int and PyObject * made the
implementation a little easier, but I can fix that.

> Please separate the errors.c patch from this patch -- it
> seems totally unrelated to Unicode.

PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with
four hex digits. I removed it.

I'll upload a revised patch as soon as it's done.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 16:29

Message:
Logged In: YES 
user_id=38388

Thanks for the patch -- it looks very impressive!

I'll give it a try later this week. 

Some first cosmetic tidbits:
* please don't place more than one C statement on one line
like in:
"""
+               unicode = unicode2; unicodepos = unicode2pos;
+               unicode2 = NULL; unicode2pos = 0;
"""

* Comments should start with a capital letter and be
prepended to the section they apply to

* There should be spaces between arguments in compares
(a == b) not (a==b)

* Where does the name "...Encode121" originate ?

* module internal APIs should use lower case names (you
converted some of these to  PyUnicode_...() -- this is
normally reserved for APIs which are either marked as
potential candidates for the public API or are very
prominent in the code)

One thing which I don't like about your API change is that
you removed the Py_UNICODE*data, int size style arguments --
this makes it impossible to use the new APIs on non-Python
data or data which is not available as Unicode object.

Please separate the errors.c patch from this patch -- it
seems totally unrelated to Unicode.

Thanks.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 20:04:45 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 12:04:45 -0700
Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks
Message-ID: <E17XRRF-00034T-00@usw-sf-web1.sourceforge.net>

Patches item #432401, was opened at 2001-06-12 15:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Postponed
Priority: 6
Submitted By: Walter Dörwald (doerwalter)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode encoding error callbacks

Initial Comment:
This patch adds unicode error handling callbacks to the
encode functionality. With this patch it's possible to
not only pass 'strict', 'ignore' or 'replace' as the
errors argument to encode, but also a callable
function that will be called with the encoding name,
the original unicode object and the position of the
unencodable character. The callback must return a
replacement unicode object that will be encoded instead
of the original character.

For example replacing unencodable characters with XML
character references can be done in the following way.

u"aäoöuüß".encode(
   "ascii",
   lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos])
)
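For comparison, roughly the same idea expressed with the handler interface that modern Python 3 ships (`codecs.register_error`, from PEP 293) might look like the sketch below. The handler name `xmlhexref` is made up for the example. Unlike the callback proposed above, a Python 3 handler receives an exception object and may be handed a whole run of unencodable characters (`exc.start` to `exc.end`), so it loops over that slice.

```python
import codecs


def xml_hex_charref(exc):
    # Only handle encoding errors; re-raise anything else unchanged
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # One hexadecimal character reference per unencodable character
    return ("".join("&#x%x;" % ord(ch) for ch in bad), exc.end)


codecs.register_error("xmlhexref", xml_hex_charref)

print("aäoöuüß".encode("ascii", "xmlhexref"))
```

Python 3 also has a built-in `"xmlcharrefreplace"` handler, which emits decimal references such as `&#228;` instead of hexadecimal ones.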


----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 21:04

Message:
Logged In: YES 
user_id=89016

Attached is a new version of the test script. But we need
more tests. UTF-7 is completely untested, and so are codecs
that pass wrong arguments to the handler and handlers that
return wrong or out-of-bounds results.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 20:55

Message:
Logged In: YES 
user_id=89016

diff12.txt finally implements the PEP 293 specification (i.e.
using exceptions for the communication between codec and
handler)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-30 18:30

Message:
Logged In: YES 
user_id=89016

diff11.txt fixes two refcounting bugs in codecs.c.
speedtest.py is a little test script that checks the speed
of various string/encoding/error combinations.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-29 22:50

Message:
Logged In: YES 
user_id=89016

This new version diff10.txt fixes a memory 
overwrite/reallocation bug in PyUnicode_EncodeCharmap and 
moves the error handling out of PyUnicode_EncodeCharmap. 
A new version of the test script is included too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-16 21:06

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_TranslateCharmap is finished too. As the 
errors argument is again not exposed to Python it can't 
really be tested. Should we add errors as an optional 
argument to unicode.translate?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-01 19:57

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the 
errors argument can't be accessed from Python code, there's 
not much testing for this.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-20 17:34

Message:
Logged In: YES 
user_id=89016

A new idea for the interface between the
codec and the callback:

Maybe we could have new exception classes
UnicodeEncodeError, UnicodeDecodeError
and UnicodeTranslateError derived from
UnicodeError. They have all the attributes
that are passed as an argument
tuple in the current version:
string: the original string
start: the start position of the
unencodable characters/undecodable bytes
end: the end position+1 of the unencodable
characters/undecodable bytes.
reason: a string that explains why
the encoding/decoding doesn't work.

There is no data object, because when a codec
wants to pass extended information to the
callback it can do this via a derived
class.

It might be better to move these attributes
to the base class UnicodeError, but this
might have backwards compatibility
problems.
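In the form that eventually landed in Python, these three classes exist as subclasses of UnicodeError, with the attributes `encoding`, `object` (rather than `string`), `start`, `end` and `reason`. The following Python 3 check shows the constructor argument order and that `start`/`end` behave like slice bounds; the sample strings and reasons are only illustrative.

```python
# Python 3: UnicodeEncodeError(encoding, object, start, end, reason)
enc_exc = UnicodeEncodeError("ascii", "aäb", 1, 2, "ordinal not in range(128)")
# UnicodeDecodeError takes a bytes-like object instead of a str
dec_exc = UnicodeDecodeError("utf-8", b"a\xffb", 1, 2, "invalid start byte")
# UnicodeTranslateError has no encoding argument: (object, start, end, reason)
tr_exc = UnicodeTranslateError("aäb", 1, 2, "no mapping")

for exc in (enc_exc, dec_exc, tr_exc):
    # All three derive from UnicodeError; end is exclusive, like a slice
    print(type(exc).__name__, isinstance(exc, UnicodeError),
          exc.object[exc.start:exc.end], exc.reason)
```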

With this method we really can have one global
registry for all callbacks, because for callback
names that must work with encoding *and* decoding
*and* translating (i.e. "strict", "replace" and 
"ignore"), the callback can check which type 
of exception was passed, so "replace" can
e.g. look like this:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      return ("?", exc.end)
   else:
      return (u"?"*(exc.end-exc.start), exc.end)
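A runnable Python 3 version of that dispatching handler might look like this. It is registered under the made-up name `replace_sketch` so that it does not clash with the built-in `"replace"` handler, which (unlike this sketch) substitutes U+FFFD when decoding.

```python
import codecs


def replace(exc):
    # Decoding: the offending bytes become a single "?" (as proposed above)
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    # Encoding and translating: one "?" per offending character
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


codecs.register_error("replace_sketch", replace)

print("aäüb".encode("ascii", "replace_sketch"))
print(b"a\xffb".decode("ascii", "replace_sketch"))
```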

Another possibility would be to do the communication
callback->codec by assigning to attributes
of the exception object. The resynchronisation
position could even be preassigned to end, so
the callback only needs to specify the 
replacement in most cases:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      exc.replacement = "?"
   else:
      exc.replacement = u"?"*(exc.end-exc.start)

As many of the assignments can now be done on
the C level without having to allocate Python
objects (except for the replacement string
and the reason), this version might even be 
faster, especially if we allow the codec to 
reuse the exception object for the next call 
to the callback.

Does this make sense, or is this too fancy?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:24

Message:
Logged In: YES 
user_id=89016

And here is the test script (test_codeccallbacks.py)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:22

Message:
Logged In: YES 
user_id=89016

OK, here is the current version of the patch (diff7.txt). 
PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are 
still missing.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 22:50

Message:
Logged In: YES 
user_id=89016

> About the difference between encoding 
> and decoding: you shouldn't just look 
> at the case where you work with Unicode 
> and strings, e.g. take the rot-13 codec
> which works on strings only or other
> codecs which translate objects into 
> strings and vice-versa.

unicode.encode encodes to str and 
str.decode decodes to unicode,
even for rot-13:

>>> u"gürk".encode("rot13")
't\xfcex'
>>> "gürk".decode("rot13")
u't\xfcex'
>>> u"gürk".decode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'unicode' object has no attribute 'decode'
>>> "gürk".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: ASCII decoding error: ordinal not in range(128)

Here the str is converted to unicode
first, before encode is called, but the
conversion to unicode fails.

Is there an example where something
else happens?

> Error handling has to be flexible enough 
> to handle all these situations. Since 
> the codecs know best how to handle the
> situations, I'd make this an implementation 
> detail of the codec and leave the
> behaviour undefined in the general case.

OK, but we should suggest that for encoding,
unencodable characters are collected,
and for decoding, separate byte sequences
that are considered broken by the codec
are passed to the callback, i.e. for
decoding the handler will never get
all broken data in one call. E.g.
for "\u30\Uffffffff".decode("unicode-escape")
the handler will be called twice (once for
"\u30" with "truncated \u escape" as the
reason and once for "\Uffffffff" with
"illegal character" as the reason).

> For the existing codecs, backward 
> compatibility should be maintained, 
> if at all possible. If the patch gets 
> overly complicated because of this, 
> we may have to provide a downgrade solution
> for this particular problem (I don't think 
> replace is used in any computational context, 
> though, since you can never be sure how 
> many replacement character do get 
> inserted, so the case may not be 
> that realistic).
> 
> Raising an exception for the charmap codec 
> is the right way to go, IMHO. I would 
> consider the current behaviour a bug.

OK, this is implemented in PyUnicode_EncodeCharmap now, 
and collecting unencodable characters works too.

I completely changed the implementation,
because the stack approach would have
gotten much more complicated when
unencodable characters are collected.

> For new codecs, I think we should 
> suggest that replace tries to collect 
> as much illegal data as possible before
> invoking the error handler. The handler 
> should be aware of the fact that it 
> won't necessarily get all the broken 
> data in one call.

OK for encoders, for decoders see
above.

> About the codec error handling 
> registry: You seem to be using a 
> Unicode specific approach here. 
> I'd rather like to see a generic 
> approach which uses the API 
> we discussed earlier. Would that be possible?

The handlers in the registry are all Unicode
specific, and they are different for encoding
and for decoding.

I renamed the function because of your
comment from 2001-06-13 10:05 (which 
becomes exceedingly difficult to find on
this long page! ;)).

> In that case, the codec API should 
> probably be called 
> codecs.register_error('myhandler', myhandler).
> 
> Does that make sense ?

We could require that unique names
are used for custom handlers, but
for the standard handlers we do have
name collisions. To prevent them, we
could either remove them from the registry
and require that the codec implements
the error handling for those itself,
or we could do some fiddling so that
u"üöä".encode("ascii", "replace")
becomes 
u"üöä".encode("ascii", "unicodeencodereplace")
behind the scenes.

But I think two unicode specific 
registries are much simpler to handle.

> BTW, the patch which uses the callback 
> registry does not seem to be available 
> on this SF page (the last patch still 
> converts the errors argument to a 
> PyObject, which shouldn't be needed
> anymore with the new approach). 
> Can you please upload your 
> latest version?

OK, I'll upload a preliminary version
tomorrow. PyUnicode_EncodeDecimal and
PyUnicode_TranslateCharmap are still
missing, but otherwise the patch seems
to be finished. All decoders work and
the encoders collect unencodable characters
and implement the handling of known
callback handler names themselves.

As PyUnicode_EncodeDecimal is only used
by the int, long, float, and complex constructors,
I'd love to get rid of the errors argument,
but for completeness sake, I'll implement
the callback functionality.

> Note that the highlighting codec 
> would make a nice example
> for the new feature.

This could be part of the codec callback test
script, which I've started to write. We could
kill two birds with one stone here:
1. Test the implementation.
2. Document and advocate what is 
   possible with the patch.

Another idea: we could have as an example
a decoding handler that relaxes the
UTF-8 minimal encoding restriction, e.g.

def relaxedutf8(enc, uni, startpos, endpos, reason, data):
   if uni[startpos:startpos+2] == "\xc0\x80":
      return (u"\x00", startpos+2)
   else:
      raise UnicodeError(...)
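A Python 3 sketch of such a relaxed decoder, using the `codecs.register_error` interface rather than the argument list proposed above, could be written as follows. The handler name `relaxedutf8` is made up. It relies on the UTF-8 decoder reporting the overlong lead byte C0 as an invalid start byte and on a handler being allowed to resume decoding past `exc.end`. The two-byte form C0 80 for U+0000 is the one used by Java's "modified UTF-8".

```python
import codecs


def relaxed_utf8(exc):
    # Accept the overlong two-byte sequence C0 80 as U+0000
    if (isinstance(exc, UnicodeDecodeError)
            and exc.object[exc.start:exc.start + 2] == b"\xc0\x80"):
        # Resume decoding after both bytes of the overlong sequence
        return ("\x00", exc.start + 2)
    # Any other malformed input is still an error
    raise exc


codecs.register_error("relaxedutf8", relaxed_utf8)

print(repr(b"a\xc0\x80b".decode("utf-8", "relaxedutf8")))
```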


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-17 21:40

Message:
Logged In: YES 
user_id=38388

Sorry for the late response.

About the difference between encoding and decoding: you shouldn't
just look at the case where you work with Unicode and strings, e.g.
take the rot-13 codec which works on strings only or other codecs
which translate objects into strings and vice-versa.

Error handling has to be flexible enough to handle all these 
situations. Since the codecs know best how to handle the situations,
I'd make this an implementation detail of the codec and leave the
behaviour undefined in the general case.

For the existing codecs, backward compatibility should be 
maintained, if at all possible. If the patch gets overly complicated
because of this, we may have to provide a downgrade solution
for this particular problem (I don't think replace is used in any
computational context, though, since you can never be sure
how many replacement characters get inserted, so the case
may not be that realistic).

Raising an exception for the charmap codec is the right
way to go, IMHO. I would consider the current behaviour
a bug.

For new codecs, I think we should suggest that replace
tries to collect as much illegal data as possible before
invoking the error handler. The handler should be aware
of the fact that it won't necessarily get all the broken data
in one call.

About the codec error handling registry:
You seem to be using a Unicode specific approach
here. I'd rather like to see a generic approach which uses
the API we discussed earlier. Would that be possible ?
In that case, the codec API should probably be called
codecs.register_error('myhandler', myhandler).

Does that make sense ?

BTW, the patch which uses the callback registry does not seem
to be available on this SF page (the last patch still converts
the errors argument to a PyObject, which shouldn't be needed
anymore with the new approach). Can you please upload your 
latest version ?

Note that the highlighting codec would make a nice example
for the new feature.

Thanks.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 12:21

Message:
Logged In: YES 
user_id=89016

Another note: the patch will change the meaning of charmap
encoding slightly: currently "replace" will put a ? into
the output, even if ? is not in the mapping, i.e.
codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})
will return ('?', 1).

With the patch the above example will raise an exception.

Of course, with the patch many more replacement characters can
appear, so it is vital that the mapping is also applied to the
replacement string.

Is this semantic change OK? (I guess all of the existing 
codecs have a mapping ord("?")->ord("?"))


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:19

Message:
Logged In: YES 
user_id=89016

So this means that the encoder can collect illegal
characters and pass them to the callback. "replace" will
replace this with (end-start)*u"?".

Decoders don't collect all illegal byte sequences, but call 
the callback once for every byte sequence that has been 
found illegal and "replace" will replace it with u"?".

Does this make sense?
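Modern CPython 3 behaves the same way for its ASCII codec, as a recording handler can show. This is a sketch: the handler name `record_sketch` is made up, and how much illegal data a codec collects per call is a CPython implementation detail, not a guarantee. The ASCII encoder hands the whole run "ää" to the handler in one call, while the ASCII decoder calls it once per bad byte.

```python
import codecs

calls = []


def record(exc):
    # Remember which slice the codec reported, then replace it with "?"
    calls.append((type(exc).__name__, exc.start, exc.end))
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    return ("?" * (exc.end - exc.start), exc.end)


codecs.register_error("record_sketch", record)

encoded = "aääb".encode("ascii", "record_sketch")
decoded = b"a\xff\xffb".decode("ascii", "record_sketch")
print(encoded, decoded, calls)
```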

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:06

Message:
Logged In: YES 
user_id=89016

For encoding it's always (end-start)*u"?":
>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither of the two:
>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two
replacement characters. This might mean that decoders can't 
collect all the illegal characters and call the callback 
once. They might have to call the callback for every single 
illegal byte sequence to get the old behaviour.

(It seems that this patch would be much, much simpler if
we only changed the encoders.)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 19:36

Message:
Logged In: YES 
user_id=38388

Hmm, whatever it takes to maintain backwards 
compatibility. Do you have an example ?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 18:31

Message:
Logged In: YES 
user_id=89016

What should replace do: return u"?" or (end-start)*u"?"?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 16:15

Message:
Logged In: YES 
user_id=38388

Sounds like a good idea. Please keep the encoder and 
decoder APIs symmetric, though, i.e. add the slice
information to both APIs. The slice should use the
same format as Python's standard slices, that is
left inclusive, right exclusive.

I like the highlighting feature !


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 00:09

Message:
Logged In: YES 
user_id=89016

I'm thinking about extending the API a little bit:

Consider the following example:
>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape

The error message is a lie: it is not the '1'
in position 2 that is the problem, but the
complete truncated sequence '\u1'.
For this the decoder should pass a start 
and an end position to the handler.

For encoding this would be useful too: 
Suppose I want to have an encoder that 
colors the unencodable characters via
ANSI escape sequences. Then I could do
the following:
>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...    return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
... 
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed.

To fix this problem the encoder could collect as many
unencodable characters as possible and pass those to 
the error callback in one go (passing a start and 
end+1 position).

This fixes the above problem and reduces the number of 
calls to the callback, so it should speed up the 
algorithms in case of custom encoding names. 
(And it makes the implementation very interesting ;))

What do you think?
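For illustration, the collecting behaviour proposed here is what a Python 3 handler registered with `codecs.register_error` receives from CPython's ASCII encoder: one call covering the whole run of unencodable characters. A sketch of the color handler in that interface (the handler name `color` is reused from the example above; the exception-based signature is Python 3's, not the one in this patch) wraps the run in a single bold/reset pair:

```python
import codecs


def color(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # The codec hands over the whole run of unencodable characters at once,
    # so one pair of ANSI bold/reset sequences wraps the entire run.
    codes = "".join("<%d>" % ord(ch) for ch in exc.object[exc.start:exc.end])
    return ("\033[1m" + codes + "\033[0m", exc.end)


codecs.register_error("color", color)

print(repr("aäüöo".encode("ascii", "color")))
```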


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-07 02:29

Message:
Logged In: YES 
user_id=89016

I started from scratch, and the current state is this:

Encoding mostly works (except that I haven't changed 
TranslateCharmap and EncodeDecimal yet) and most of the 
decoding stuff works (DecodeASCII and DecodeCharmap are 
still unchanged) and the decoding callback helper isn't 
optimized for the "builtin" names yet (i.e. it still calls 
the handler).

For encoding the callback helper knows how to 
handle "strict", "replace", "ignore" 
and "xmlcharrefreplace" itself and won't call the callback. 
This should make the encoder fast enough. As callback name 
string comparison results are cached it might even be 
faster than the original.

The patch so far didn't require any changes to 
unicodeobject.h, stringobject.h or stringobject.c


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-05 17:49

Message:
Logged In: YES 
user_id=38388

Walter, are you making any progress on the new scheme
we discussed on the mailing list (adding an error handler
registry much like the codec registry itself instead of trying 
to redo the complete codec API) ?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-09-20 12:38

Message:
Logged In: YES 
user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. 

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 12:53

Message:
Logged In: YES 
user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as 
well.

I'll look into this after I'm back from vacation on the 10.09.

Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge 
and probably needs a lot of testing first.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 05:55

Message:
Logged In: YES 
user_id=89016

Changing the decoding API is done now. There 
are new functions
codecs.register_unicodedecodeerrorhandler and
codecs.lookup_unicodedecodeerrorhandler.
Only the standard handlers for 'strict', 
'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors 
in the byte string, so I added an additional
argument to the decoding API: reason, which 
gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:
>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are:
encoding, unicode, position, reason, state.

The encoding and decoding API for strings has been 
adapted too, so now the new API should be usable 
everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'

I had a problem with the decoding API: all the 
functions in _codecsmodule.c used the t# format 
specifier. I changed that to O! with 
&PyString_Type, because otherwise the decoding 
API would have to pass buffer objects around 
instead of strings, and the callback would have 
to call str() on the buffer anyway to access a 
specific character, so this wouldn't be any 
faster than calling str() on the buffer before 
decoding. It seems that buffers aren't used 
anyway. 

I changed all the old functions to call the new 
ones so bugfixes don't have to be done in two 
places. There are two exceptions: I didn't 
change PyString_AsEncodedString and 
PyString_AsDecodedString because they are 
documented as deprecated anyway (although they 
are called in a few spots). This means that I 
duplicated part of their functionality in 
PyString_AsEncodedObjectEx and 
PyString_AsDecodedObjectEx.

There are still a few spots that call the old API,
e.g. PyString_Format still calls PyUnicode_Decode 
(but with strict decoding) because it passes the 
rest of the format string to PyUnicode_Format 
when it encounters a Unicode object.

Should we switch to the new API everywhere even 
if strict encoding/decoding is used?

The size of this patch begins to scare me. I 
guess we need an extensive test script and 
documentation for all the new features. I hope you have time 
to do that, as I'll be busy with other projects in
the next weeks. (BTW, I haven't touched 
PyUnicode_TranslateCharmap yet.)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 19:03

Message:
Logged In: YES 
user_id=89016

New version of the patch with the error handling callback 
registry. 

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
> 
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x
in addition to \u and \U where appropriate.
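
The choice between \x, \u and \U described above can be sketched in a few lines. escape_replacement is a hypothetical name; the thresholds follow the text (\x for code points up to 0xFF, \u up to 0xFFFF, \U beyond). In later Python versions the same behaviour is available as the built-in "backslashreplace" error handler.

```python
def escape_replacement(ch):
    """Hypothetical sketch: Python-style escape for a single character."""
    cp = ord(ch)
    if cp <= 0xFF:
        return "\\x%02x" % cp
    if cp <= 0xFFFF:
        return "\\u%04x" % cp
    return "\\U%08x" % cp


print(escape_replacement("\u00fc"))      # -> \xfc
print(escape_replacement("\u20ac"))      # -> \u20ac
print(escape_replacement("\U0001F600"))  # -> \U0001f600
```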

> > [...] 
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
> 
> Good idea !
> 
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more in line with
> the Python coding style guide.

OK, but these functions are specific to Unicode encoding,
so now they are called:
   codecs.register_unicodeencodeerrorhandler
   codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new 
ones: "xmlcharrefreplace" 
and "escapereplace") are registered in 
codecs.c/_PyCodecRegistry_Init, so using them is really 
simple: u"gürk".encode("ascii", "xmlcharrefreplace")
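
The name-based registry described here is close to what later shipped with PEP 293 (Python 2.3) as codecs.register_error() and codecs.lookup_error(), although the shipped handlers receive a single exception object and return (replacement, resume_position) rather than taking separate arguments. A sketch in current Python; the handler name "hypothetical_qmark" is made up for this example.

```python
import codecs


def qmark_replace(exc):
    """Replace each unencodable character with '?' and resume after them."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return ("?" * (exc.end - exc.start), exc.end)


# "hypothetical_qmark" is an illustrative name, not a built-in handler.
codecs.register_error("hypothetical_qmark", qmark_replace)

text = "g\u00fcrk"
print(text.encode("ascii", "xmlcharrefreplace"))   # -> b'g&#252;rk'
print(text.encode("ascii", "hypothetical_qmark"))  # -> b'g?rk'
print(codecs.lookup_error("hypothetical_qmark") is qmark_replace)  # -> True
```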


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 13:26

Message:
Logged In: YES 
user_id=38388

> > >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > >    > > could be reimplemented as PyUnicode_EncodeASCII
> > >    > > with \uxxxx replacement callback.
> > >    >
> > >    > Hmm, wouldn't that result in a slowdown ? If so,
> > >    > I'd rather leave the special encoder in place,
> > >    > since it is being used a lot in Python and
> > >    > probably some applications too.
> > >
> > >    It would be a slowdown. But callbacks open many
> > >    possibilities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
> 
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !
 
> > [...]
> > >    Should the old TranslateCharmap map to the new
> > >    TranslateCharmapEx and inherit the
> > >    "multicharacter replacement" feature,
> > >    or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement
> > to the old API. I think it is very useful and since the
> > old APIs work on raw buffers it would be a benefit to have
> > the functionality in the old implementation too.
> 
> OK! I will try to find the time to implement that in the
> next days.

Good.
 
> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> >     callback(encoding, input_data, input_position, state) ->
> >         (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the
> > position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same
> > interface... what do you think ?
> 
> This would make the callback feature hypergeneric and a
> little slower, because tuples have to be created, but it
> (almost) unifies the encoding and decoding API. ("almost"
> because, for the encoder output_to_be_appended will be
> reencoded, for the decoder it will simply be appended.),
> so I'm for it.

That's the point. 

Note that I don't think the tuple creation
will hurt much (see the make_tuple() API in codecs.c)
since small tuples are cached by Python internally.
 
> I implemented this and changed the encoders to only
> look up the error handler on the first error. The UCS1
> encoder now no longer uses the two-item stack strategy.
> (This strategy only makes sense for those encoders where
> the encoding itself is much more complicated than the
> looping/callback etc.) So now memory overflow tests are
> only done when an unencodable character occurs, and the
> UCS1 encoder should be as fast as it was without
> error callbacks.
> 
> Do we want to enforce new_input_position>input_position,
> or should jumping back be allowed?

No; moving backwards should be allowed (this may be useful
in order to resynchronize with the input data).
 
> Here is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
> 
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
> 
> We could have an error handler registry, which maps names
> to error handlers, then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because it's always more convenient to type
>    u"...".encode("...", "strict")
> instead of
>    import codecs
>    u"...".encode("...", codecs.raise_encode_errors)
> 
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>    return (u"&#%d;" % ord(uni[pos]), pos+1)
> 
> import codecs
> 
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
>         u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed common error
> handling names could even be implemented in the encoder
> itself.
> 
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named
codecs.register_errorhandler() to be more in line with
the Python coding style guide.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES 
user_id=89016

> >    [...]
> >    so I guess we could change the replace handler
> >    to always return u'?'. This would make the
> >    implementation a little bit simpler, but the 
> >    explanation of the callback feature *a lot* 
> >    simpler. 
> 
> Go for it.

OK, done!

> [...]
> >    > Could you add these docs to the Misc/unicode.txt
> >    > file ? I will eventually take that file and turn 
> >    > it into a PEP which will then serve as general 
> >    > documentation for these things.
> > 
> >    I could, but first we should work out how the 
> >    decoding callback API will work.
> 
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually 
> go there.

OK. I guess it would be best to do this when everything 
is finished.

> >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> >    > > could be reimplemented as PyUnicode_EncodeASCII 
> >    > > with \uxxxx replacement callback.
> >    >
> >    > Hmm, wouldn't that result in a slowdown ? If so,
> >    > I'd rather leave the special encoder in place, 
> >    > since it is being used a lot in Python and 
> >    > probably some applications too.
> > 
> >    It would be a slowdown. But callbacks open many 
> >    possibilities.
> 
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).

OK, done, now there's a 
PyCodec_EscapeReplaceUnicodeEncodeErrors/
codecs.escapereplace_unicodeencode_errors
that uses \u (or \U if x>0xffff (with a wide build
of Python)).

> >    For example:
> > 
> >       Why can't I print u"gürk"?
> > 
> >    is probably one of the most frequently asked
> >    questions in comp.lang.python. For printing 
> >    Unicode stuff, print could be extended to use an 
> >    error handling callback for Unicode strings (or 
> >    objects where __str__ or tp_str returns a Unicode 
> >    object) instead of using str() which always 
> >    returns an 8bit string and uses strict encoding. 
> >    There might even be a
> >    sys.setprintencodehandler()/sys.getprintencodehandler()
> 
> There already is a print callback in Python (forgot the
> name of the hook though), so this should be possible by 
> providing the encoding logic in the hook.

True: sys.displayhook
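
As a rough illustration of the hook-based approach, the following replacement for sys.displayhook escapes non-ASCII characters instead of failing on a strict ASCII terminal. It is a hypothetical sketch for current Python 3 (ascii_displayhook is a made-up name), not something proposed in the patch, and it relies on the built-in "backslashreplace" handler.

```python
import builtins
import sys


def ascii_displayhook(value):
    """Hypothetical hook: display repr(value) using only ASCII."""
    if value is None:
        return
    text = repr(value).encode("ascii", "backslashreplace").decode("ascii")
    sys.stdout.write(text + "\n")
    # Like the default hook, remember the last result in builtins._
    builtins._ = value


# To install it in an interactive session:
# sys.displayhook = ascii_displayhook
```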

> [...]
> >    Should the old TranslateCharmap map to the new 
> >    TranslateCharmapEx and inherit the 
> >    "multicharacter replacement" feature,
> >    or should I leave it as it is?
> 
> If possible, please also add the multichar replacement
> to the old API. I think it is very useful and since the
> old APIs work on raw buffers it would be a benefit to have
> the functionality in the old implementation too.

OK! I will try to find the time to implement that in the 
next days.

> [Decoding error callbacks]
>
> About the return value:
> 
> I'd suggest to always use the same tuple interface, e.g.
> 
>     callback(encoding, input_data, input_position, state) -> 
>         (output_to_be_appended, new_input_position)
> 
> (I think it's better to use absolute values for the 
> position rather than offsets.)
> 
> Perhaps the encoding callbacks should use the same 
> interface... what do you think ?

This would make the callback feature hypergeneric and a
little slower, because tuples have to be created, but it
(almost) unifies the encoding and decoding API. ("almost" 
because, for the encoder output_to_be_appended will be 
reencoded, for the decoder it will simply be appended.), 
so I'm for it.

I implemented this and changed the encoders to only 
look up the error handler on the first error. The UCS1 
encoder now no longer uses the two-item stack strategy. 
(This strategy only makes sense for those encoders where 
the encoding itself is much more complicated than the 
looping/callback etc.) So now memory overflow tests are 
only done when an unencodable character occurs, and the 
UCS1 encoder should be as fast as it was without 
error callbacks.
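
A pure-Python sketch of the strategy just described: the fast path only compares each code point against the limit, and the error handler is looked up the first time an unencodable character appears. The replacement is re-encoded, matching the "almost unified" tuple interface. encode_latin1, the lookup_handler parameter and the (encoding, unicode, pos, state) signature are assumptions modeled on the discussion, not the patch's C code.

```python
def encode_latin1(text, errors, lookup_handler):
    """Hypothetical sketch: encode as Latin-1, resolving errors lazily.

    lookup_handler(errors) returns a callable
    handler(encoding, unicode, pos, state) -> (replacement, new_pos).
    """
    out = bytearray()
    handler = None          # looked up on the first error only
    pos = 0
    while pos < len(text):
        cp = ord(text[pos])
        if cp < 0x100:      # fast path: no handler, no extra checks
            out.append(cp)
            pos += 1
            continue
        if handler is None:
            handler = lookup_handler(errors)
        replacement, pos = handler("latin-1", text, pos, None)
        # Unlike a decoder, the encoder re-encodes the replacement; in this
        # sketch it must itself be Latin-1 encodable (no nested callback).
        out += replacement.encode("latin-1")
    return bytes(out)
```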

Do we want to enforce new_input_position>input_position,
or should jumping back be allowed?

> >    > > One additional note: It is vital that errors
> >    > > is an assignable attribute of the StreamWriter.
> >    >
> >    > It is already !
> > 
> >    I know, but IMHO it should be documented that an
> >    assignable errors attribute must be supported 
> >    as part of the official codec API.
> > 
> >    Misc/unicode.txt is not clear on that:
> >    """
> >    It is not required by the Unicode implementation
> >    to use these base classes, only the interfaces must 
> >    match; this allows writing Codecs as extension types.
> >    """
> 
> Good point. I'll add that to the PEP 100.

OK.

Here is the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing 
callbacks
(see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names 
to error handlers, then it would be possible to keep the 
errors argument as "const char *" instead of "PyObject *". 
Currently PyCodec_UnicodeEncodeHandlerForObject is a 
backwards compatibility hack that will never go away, 
because it's always more convenient to type
   u"...".encode("...", "strict")
instead of
   import codecs
   u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would 
become the official lookup method for error handlers. 
(PyCodec_LookupUnicodeEncodeErrorHandler?)
Python code would look like this:
---
def xmlreplace(encoding, uni, pos, state):
   return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---
and then the following call can be made:
	u"äöü".encode("ascii", "xmlreplace")
As soon as the first error is encountered, the encoder uses
its builtin error handling method if it recognizes the name 
("strict", "replace" or "ignore") or looks up the error 
handling function in the registry if it doesn't. In this way
the speed for the backwards compatible features is the same 
as before and "const char *error" can be kept as the 
parameter to all encoding functions. For speed common error 
handling names could even be implemented in the encoder 
itself.

But for special one-shot error handlers, it might still be 
useful to pass the error handler directly, so maybe we 
should leave error as PyObject *, but implement the 
registry anyway?
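
The registry idea can be sketched in pure Python. Built-in names are resolved without touching the registry, other names are looked up in a dict, and a callable can still be passed directly as a one-shot handler. All names here (register_errorhandler, lookup_errorhandler, BUILTIN_HANDLERS) are hypothetical, and the callback signature follows the (encoding, unicode, pos, state) proposal above.

```python
_registry = {}


def _strict(encoding, uni, pos, state):
    raise UnicodeError("%r codec can't encode character at position %d"
                       % (encoding, pos))


def _ignore(encoding, uni, pos, state):
    return ("", pos + 1)


def _replace(encoding, uni, pos, state):
    return ("?", pos + 1)


BUILTIN_HANDLERS = {"strict": _strict, "ignore": _ignore, "replace": _replace}


def register_errorhandler(name, handler):
    """Make handler available under name (built-in names are reserved)."""
    if name in BUILTIN_HANDLERS:
        raise ValueError("cannot override built-in handler %r" % name)
    _registry[name] = handler


def lookup_errorhandler(errors):
    """Resolve errors: a callable (one-shot handler), a built-in name,
    or a registered name."""
    if callable(errors):
        return errors
    if errors in BUILTIN_HANDLERS:      # fast path, no registry access
        return BUILTIN_HANDLERS[errors]
    try:
        return _registry[errors]
    except KeyError:
        raise LookupError("unknown error handler name %r" % errors) from None


def xmlreplace(encoding, uni, pos, state):
    return ("&#%d;" % ord(uni[pos]), pos + 1)


register_errorhandler("xmlreplace", xmlreplace)
```

Reserving the built-in names is a design choice of this sketch: since the fast path checks them first, a registered "strict" would otherwise be silently ignored.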


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES 
user_id=38388

Ok, here we go...

>    > > raise an exception). U+FFFD characters in the replacement
>    > > string will be replaced with a character that the encoder
>    > > chooses ('?' in all cases).
>    >
>    > Nice.
> 
>    But the special casing of U+FFFD makes the interface somewhat
>    less clean than it could be. It was only done to be 100%
>    backwards compatible. With the original "replace" error
>    handling the codec chose the replacement character. But as
>    far as I can tell none of the codecs uses anything other
>    than '?', 

True.

>    so I guess we could change the replace handler
>    to always return u'?'. This would make the implementation a
>    little bit simpler, but the explanation of the callback
>    feature *a lot* simpler. 

Go for it.

>    And if you still want to handle
>    an unencodable U+FFFD, you can write a special callback for
>    that, e.g.
> 
>    def FFFDreplace(enc, uni, pos):
>        if uni[pos] == u"\ufffd":
>            return u"?"
>        else:
>            raise UnicodeError(...)
>
>    > ...docs...
>    >
>    > Could you add these docs to the Misc/unicode.txt file ? I
>    > will eventually take that file and turn it into a PEP which
>    > will then serve as general documentation for these things.
> 
>    I could, but first we should work out how the decoding
>    callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the
unicode.txt to PEP 100, so the docs should eventually go there.
 
>    > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
>    > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
>    > > replacement callback.
>    >
>    > Hmm, wouldn't that result in a slowdown ? If so, I'd rather
>    > leave the special encoder in place, since it is being used a
>    > lot in Python and probably some applications too.
> 
>    It would be a slowdown. But callbacks open many 
>    possibilities.

True, but in this case I believe that we should stick with
the native implementation for "unicode-escape". Having
a standard callback error handler which does the \uXXXX
replacement would be nice to have though, since this would
also be usable with lots of other codecs (e.g. all the code page
ones).
 
>    For example:
> 
>       Why can't I print u"gürk"?
> 
>    is probably one of the most frequently asked questions in
>    comp.lang.python. For printing Unicode stuff, print could be
>    extended to use an error handling callback for Unicode 
>    strings (or objects where __str__ or tp_str returns a 
>    Unicode object) instead of using str() which always returns 
>    an 8bit string and uses strict encoding. There might even 
>    be a
>    sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the
hook though), so this should be possible by providing the
encoding logic in the hook.
 
>    > > I have not touched PyUnicode_TranslateCharmap yet,
>    > > should this function also support error callbacks? Why
>    > > would one want to insert None into the mapping to call
>    > > the callback?
>    >
>    > 1. Yes.
>    > 2. The user may want to e.g. restrict usage of certain
>    > character ranges. In this case the codec would be used to
>    > verify the input and an exception would indeed be useful
>    > (e.g. say you want to restrict input to Hangul + ASCII).
> 
>    OK, do we want TranslateCharmap to work exactly like encoding,
>    i.e. in case of an error should the returned replacement
>    string again be mapped through the translation mapping or
>    should it be copied to the output directly? The former would
>    be more in line with encoding, but IMHO the latter would
>    be much more useful.

It's better to take the second approach (copy the callback
output directly to the output string) to avoid endless
recursion and other pitfalls.

I suppose this will also simplify the implementation somewhat.
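
A pure-Python sketch of the agreed behaviour: a mapping value may be a multicharacter string (the feature from patch #403100), None marks a character that triggers the callback, and whatever the callback returns is copied to the output directly rather than being translated again. translate_charmap, the handler(text, pos) signature, HangulAsciiOnly and reject are hypothetical names for illustration; the Hangul + ASCII mapping models the input-restriction use case mentioned earlier in the thread.

```python
def translate_charmap(text, mapping, handler):
    """Hypothetical sketch of TranslateCharmap with an error callback.

    mapping[code_point] may be an int (one code point), a str (possibly
    several characters) or None. Unmapped characters are copied unchanged.
    None calls handler(text, pos) -> (replacement, new_pos); the replacement
    is copied to the output verbatim and is NOT translated again.
    """
    out = []
    pos = 0
    while pos < len(text):
        try:
            target = mapping[ord(text[pos])]
        except LookupError:
            out.append(text[pos])       # unmapped: copy unchanged
            pos += 1
            continue
        if target is None:
            replacement, pos = handler(text, pos)
            out.append(replacement)     # copied directly, no re-translation
        elif isinstance(target, int):
            out.append(chr(target))
            pos += 1
        else:
            out.append(target)          # multicharacter replacement
            pos += 1
    return "".join(out)


class HangulAsciiOnly:
    """Mapping that sends everything except ASCII and Hangul syllables
    (U+AC00..U+D7A3) to the error handler."""

    def __getitem__(self, code_point):
        if code_point < 0x80 or 0xAC00 <= code_point <= 0xD7A3:
            raise LookupError(code_point)   # allowed: leave unchanged
        return None                         # disallowed: call the handler


def reject(text, pos):
    raise ValueError("disallowed character at position %d" % pos)


mapping = {ord("a"): ord("A"), ord("\u00df"): "ss", ord("\u20ac"): None}
print(translate_charmap("a\u00df\u20ac", mapping, lambda s, p: ("<EUR>", p + 1)))
# -> Ass<EUR>
```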
 
>    BTW, when I implement it I can implement patch #403100
>    ("Multicharacter replacements in 
>    PyUnicode_TranslateCharmap")
>    along the way.

I've seen it; will comment on it later.
 
>    Should the old TranslateCharmap map to the new TranslateCharmapEx
>    and inherit the "multicharacter replacement" feature,
>    or should I leave it as it is?

If possible, please also add the multichar replacement
to the old API. I think it is very useful and since the
old APIs work on raw buffers it would be a benefit to have
the functionality in the old implementation too.
 
[Decoding error callbacks]

>    > > A remaining problem is how to implement decoding error
>    > > callbacks. In Python 2.1 encoding and decoding errors are
>    > > handled in the same way with a string value. But with
>    > > callbacks it doesn't make sense to use the same callback
>    > > for encoding and decoding (like codecs.StreamReaderWriter
>    > > and codecs.StreamRecoder do). Decoding callbacks have a
>    > > different API. Which arguments should be passed to the
>    > > decoding callback, and what is the decoding callback
>    > > supposed to do?
>    >
>    > I'd suggest adding another set of PyCodec_UnicodeDecode...()
>    > APIs for this. We'd then have to augment the base classes of
>    > the StreamCodecs to provide two attributes for .errors with
>    > a fallback solution for the string case (i.e. "strict" can
>    > still be used for both directions).
> 
>    Sounds good. Now what is the decoding callback supposed to do?
>    I guess it will be called in the same way as the encoding
>    callback, i.e. with encoding name, original string and
>    position of the error. It might return a Unicode string
>    (i.e. an object of the decoding target type) that will be
>    emitted from the codec instead of the one offending byte. Or
>    it might return a tuple with a replacement Unicode object and
>    a resynchronisation offset, i.e. returning (u"?", 1) means
>    emit a '?' and skip the offending character. But to make
>    the offset really useful the callback has to know something
>    about the encoding, so perhaps the codec should be allowed to
>    pass an additional state object to the callback?
> 
>    Maybe the same should be added to the encoding callbacks too?
>    Maybe the encoding callback should be able to tell the
>    encoder if the replacement returned should be reencoded
>    (in which case it's a Unicode object), or directly emitted
>    (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically
this should be a codec-defined arbitrary Python object)
which would then allow the callback to apply additional tricks.
The object should be documented to be modifiable in place
(simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) -> 
        (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position 
rather than offsets.)

Perhaps the encoding callbacks should use the same 
interface... what do you think ?

>    > > One additional note: It is vital that errors is an
>    > > assignable attribute of the StreamWriter.
>    >
>    > It is already !
> 
>    I know, but IMHO it should be documented that an assignable
>    errors attribute must be supported as part of the official
>    codec API.
> 
>    Misc/unicode.txt is not clear on that:
>    """
>    It is not required by the Unicode implementation to use 
>    these base classes, only the interfaces must match; this 
>    allows writing Codecs as extension types.
>    """

Good point. I'll add that to the PEP 100.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES 
user_id=38388

Sorry to keep you waiting, Walter. I will look into this
again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES 
user_id=38388

On your comment about the non-Unicode codecs: let's keep
this separated from the current patch.

Don't have much time today. I'll comment on the other things
tomorrow.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES 
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other 
> encodings than Unicode.  But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I 
> noticed that all the non-Unicode codecs ignore the 
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would 
mean that the encoder is not able to deal in any other way 
with unencodable stuff than by raising an error. But that 
is not the problem here, because for zlib, base64, quopri, 
hex and uu encoding there can be no unencodable characters. 
The encoders can simply ignore the errors parameter. Should 
I remove the asserts from those codecs and change the 
docstrings accordingly, or will this be done separately?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES 
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat
less clean than it could be. It was only done to be 100%
backwards compatible. With the original "replace" error
handling the codec chose the replacement character. But as
far as I can tell none of the codecs uses anything other
than '?', so I guess we could change the replace handler
to always return u'?'. This would make the implementation a
little bit simpler, but the explanation of the callback
feature *a lot* simpler. And if you still want to handle
an unencodable U+FFFD, you can write a special callback for
that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == u"\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.
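
The two-entry stack described above can be modelled in Python as follows. This is an illustrative sketch of the loop, not the patch's C code: encode_with_stack is a hypothetical name, the callback returns a plain replacement string (the interface at this stage of the discussion), and an unencodable character inside the replacement raises immediately.

```python
def encode_with_stack(text, limit, callback):
    """Hypothetical sketch: encode text, allowing code points below limit.

    limit must be <= 256 (e.g. 128 for ASCII, 256 for Latin-1).
    callback(text, index) returns a replacement string.
    """
    stack = [[text, 0]]            # entries: [string, next index]
    out = bytearray()
    while stack:
        current = stack[-1]
        s, i = current
        if i == len(s):
            stack.pop()            # end of replacement (or of the original)
            continue
        cp = ord(s[i])
        current[1] = i + 1         # the character is consumed either way
        if cp < limit:
            out.append(cp)
        elif len(stack) == 1:
            # Error in the original string: encode the replacement next.
            stack.append([callback(text, i), 0])
        else:
            raise UnicodeError("unencodable character in replacement string")
    return bytes(out)


print(encode_with_stack("g\u00fcrk", 128, lambda t, i: "&#%d;" % ord(t[i])))
# -> b'g&#252;rk'
```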

> > (I hope that's enough explanation of the API and
> > implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding
callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.
>
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many 
possibilities.

For example:

   Why can't I print u"gürk"?

is probably one of the most frequently asked questions in
comp.lang.python. For printing Unicode stuff, print could be
extended to use an error handling callback for Unicode 
strings (or objects where __str__ or tp_str returns a 
Unicode object) instead of using str() which always returns 
an 8bit string and uses strict encoding. There might even 
be a
sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors
really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet,
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding,
i.e. in case of an error should the returned replacement
string again be mapped through the translation mapping or
should it be copied to the output directly? The former would
be more in line with encoding, but IMHO the latter would
be much more useful.

BTW, when I implement it I can implement patch #403100
("Multicharacter replacements in 
PyUnicode_TranslateCharmap")
along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx
and inherit the "multicharacter replacement" feature, or
should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks. In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do?
I guess it will be called in the same way as the encoding
callback, i.e. with encoding name, original string and
position of the error. It might return a Unicode string
(i.e. an object of the decoding target type), that will be
emitted from the codec instead of the one offending byte. Or
it might return a tuple with replacement Unicode object and
a resynchronisation offset, i.e. returning (u"?", 1) means
emit a '?' and skip the offending character. But to make
the offset really useful the callback has to know something
about the encoding, perhaps the codec should be allowed to
pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the
encoder if the replacement returned should be reencoded
(in which case it's a Unicode object), or directly emitted
(in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable
errors attribute must be supported as part of the official
codec API.

Misc/unicode.txt is not clear on that:
"""
It is not required by the Unicode implementation to use 
these base classes, only the interfaces must match; this 
allows writing Codecs as extension types.
"""

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES 
user_id=38388

> How the callbacks work:
>
> A PyObject * named errors is passed in. This may be NULL,
> Py_None, 'strict', u'strict', 'ignore', u'ignore',
> 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to
> one of the three builtin error callbacks
> PyCodec_RaiseEncodeErrors (raises an exception),
> PyCodec_IgnoreEncodeErrors (returns an empty replacement
> string, in effect ignoring the error),
> PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
> replacement character to signify to the encoder that it
> should choose a suitable replacement character) or directly
> returns errors if it is a callable object. When an
> unencodable character is encountered the error handling
> callback will be called with the encoding name, the original
> unicode object and the error position and must return a
> unicode object that will be encoded instead of the offending
> character (or the callback may of course raise an
> exception). U+FFFD characters in the replacement string will
> be replaced with a character that the encoder chooses ('?'
> in all cases).

Nice.
 
> The implementation of the loop through the string is done in
> the following way. A stack with two strings is kept and the
> loop always encodes a character from the string at the
> stacktop. If an error is encountered and the stack has only
> one entry (during encoding of the original string) the
> callback is called and the unicode object returned is pushed
> on the stack, so the encoding continues with the replacement
> string. If the stack has two entries when an error is
> encountered, the replacement string itself has an
> unencodable character and a normal exception is raised. When
> the encoder has reached the end of its current string there
> are two possibilities: when the stack contains two entries,
> this was the replacement string, so the replacement string
> will be popped from the stack and encoding continues with
> the next character from the original string. If the stack
> had only one entry, encoding is finished.

Very elegant solution !
 
> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I
will eventually take that file and turn it into a PEP which
will then serve as general documentation for these things.
 
> I have renamed the static ...121 function to all lowercase
> names.

Ok.
 
> BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather
leave the special encoder in place, since it is being used a
lot in Python and probably some applications too.
 
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
> PyCodec_ReplaceEncodeErrors are globally visible because
> they have to be available in _codecsmodule.c to wrap them as
> Python function objects, but they can't be implemented in
> _codecsmodule, because they need to be available to the
> encoders in unicodeobject.c (through
> PyCodec_EncodeHandlerForObject), but importing the codecs
> module might result in an endless recursion, because
> importing a module requires unpickling of the bytecode,
> which might require decoding utf8, which ... (but this will
> only happen, if we implement the same mechanism for the
> decoding API)

I think that codecs.c is the right place for these APIs.
_codecsmodule.c is only meant as Python access wrapper for
the internal codecs and nothing more. 

One thing I noted about the callbacks: they assume that they
will always get Unicode objects as input. This is certainly
not true in the general case (it is for the codecs you touch
in the patch). 

I think it would be worthwhile to rename the callbacks to
include "Unicode" somewhere, e.g.
PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
then it points out the application field of the callback
rather well. Same for the callbacks exposed through the
_codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet,
> should this function also support error callbacks? Why would
> one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain
character ranges. In this case the codec would be used to
verify the input and an exception would indeed be useful
(e.g. say you want to restrict input to Hangul + ASCII).
 
> A remaining problem is how to implement decoding error
> callbacks. In Python 2.1 encoding and decoding errors are
> handled in the same way with a string value. But with
> callbacks it doesn't make sense to use the same callback for
> encoding and decoding (like codecs.StreamReaderWriter and
> codecs.StreamRecoder do). Decoding callbacks have a
> different API. Which arguments should be passed to the
> decoding callback, and what is the decoding callback
> supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...()
APIs for this. We'd then have to augment the base classes of
the StreamCodecs to provide two attributes for .errors with
a fallback solution for the string case (i.e. "strict" can
still be used for both directions).

> One additional note: It is vital that errors is an
> assignable attribute of the StreamWriter.

It is already !
 
> Consider the XML example: For writing an XML DOM tree one
> StreamWriter object is used. When a text node is written,
> the error handling has to be set to
> codecs.xmlreplace_encode_errors, but inside a comment or
> processing instruction replacing unencodable characters with
> charrefs is not possible, so here codecs.raise_encode_errors
> should be used (or better a custom error handler that raises
> an error that says "sorry, you can't have unencodable
> characters inside a comment")

Sure.
 
> BTW, should we continue the discussion in the i18n SIG
> mailing list? An email program is much more comfortable than
> a HTML textarea! ;)

I'd rather keep the discussions on this patch here --
forking it off to the i18n sig will make it very hard to
follow up on it. (This HTML area is indeed damn small ;-)
 

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES 
user_id=89016

One additional note: It is vital that errors is an
assignable attribute of the StreamWriter. 

Consider the XML example: For writing an XML DOM tree one
StreamWriter object is used. When a text node is written,
the error handling has to be set to
codecs.xmlreplace_encode_errors, but inside a comment or
processing instruction replacing unencodable characters with
charrefs is not possible, so here codecs.raise_encode_errors
should be used (or better a custom error handler that raises
an error that says "sorry, you can't have unencodable
characters inside a comment")
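The same switching pattern works with the stream writers of present-day Python, where the handler names are plain strings ("xmlcharrefreplace" is the built-in charref handler; the `codecs.xmlreplace_encode_errors` name above belongs to this patch). A minimal sketch, assuming an in-memory stream stands in for the XML output file:

```python
import codecs
import io

buf = io.BytesIO()
# StreamWriter keeps the error handler name in its assignable .errors attribute.
writer = codecs.getwriter("ascii")(buf, errors="xmlcharrefreplace")

# Text node: unencodable characters become decimal character references.
writer.write("caf\u00e9 ")

# Inside a comment character references are not allowed, so switch the
# same writer to strict error handling.
writer.errors = "strict"
comment_failed = False
try:
    writer.write("<!-- caf\u00e9 -->")
except UnicodeEncodeError:
    comment_failed = True

print(buf.getvalue())  # b'caf&#233; '
```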

BTW, should we continue the discussion in the i18n SIG
mailing list? An email program is much more comfortable than
a HTML textarea! ;)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES 
user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL,
Py_None, 'strict', u'strict', 'ignore', u'ignore',
'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to
one of the three builtin error callbacks
PyCodec_RaiseEncodeErrors (raises an exception),
PyCodec_IgnoreEncodeErrors (returns an empty replacement
string, in effect ignoring the error),
PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
replacement character to signify to the encoder that it
should choose a suitable replacement character) or directly
returns errors if it is a callable object. When an
unencodable character is encountered the error handling
callback will be called with the encoding name, the original
unicode object and the error position and must return a
unicode object that will be encoded instead of the offending
character (or the callback may of course raise an
exception). U+FFFD characters in the replacement string will 
be replaced with a character that the encoder chooses ('?'
in all cases).
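To make the mapping step concrete, here is a rough Python analogue of what PyCodec_EncodeHandlerForObject is described as doing. The function name `handler_for_object` is hypothetical, and it looks handlers up with `codecs.lookup_error`, the registry Python later gained, rather than the three C functions named above. Note that in present-day Python the built-in "replace" handler returns '?' directly for encoding instead of a U+FFFD placeholder.

```python
import codecs

def handler_for_object(errors):
    """Hypothetical analogue of PyCodec_EncodeHandlerForObject.

    None (NULL in C) means "strict", a string names a registered
    handler, and any other callable is returned unchanged.
    """
    if errors is None:
        errors = "strict"
    if isinstance(errors, str):
        return codecs.lookup_error(errors)
    if callable(errors):
        return errors
    raise TypeError("errors must be None, a string or a callable")

# Example: the error a codec would report for the 'ä' in "aäb".
exc = UnicodeEncodeError("ascii", "a\u00e4b", 1, 2, "ordinal not in range(128)")
print(handler_for_object("replace")(exc))  # ('?', 2)
print(handler_for_object("ignore")(exc))   # ('', 2)
```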

The implementation of the loop through the string is done in
the following way. A stack with two strings is kept and the
loop always encodes a character from the string at the
stacktop. If an error is encountered and the stack has only
one entry (during encoding of the original string) the
callback is called and the unicode object returned is pushed
on the stack, so the encoding continues with the replacement
string. If the stack has two entries when an error is
encountered, the replacement string itself has an
unencodable character and a normal exception is raised. When
the encoder has reached the end of its current string there
are two possibilities: when the stack contains two entries,
this was the replacement string, so the replacement string
will be popped from the stack and encoding continues with
the next character from the original string. If the stack
had only one entry, encoding is finished.
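The two-entry stack loop described above can be sketched in Python for the ASCII case. This is an illustration under stated assumptions, not the patch's C code: `encode_ascii_with_callback` is a hypothetical name, and the callback uses the signature proposed in this patch, `callback(encoding, text, pos)`, returning a replacement string for `text[pos]`.

```python
def encode_ascii_with_callback(text, callback):
    """Sketch of the two-entry stack loop (ASCII only, hypothetical API)."""
    out = bytearray()
    stack = [(text, 0)]  # each entry is (string, next position)
    while stack:
        s, pos = stack[-1]
        if pos == len(s):
            # End of the string on top of the stack: pop it. If it was the
            # replacement, encoding resumes with the original string;
            # if it was the original, encoding is finished.
            stack.pop()
            continue
        ch = s[pos]
        stack[-1] = (s, pos + 1)  # advance past this character
        if ord(ch) < 128:
            out.append(ord(ch))
        elif len(stack) == 1:
            # Error in the original string: encode the replacement next.
            stack.append((callback("ascii", s, pos), 0))
        else:
            # Error inside the replacement itself: a normal exception.
            raise UnicodeError("replacement contains unencodable %r" % ch)
    return bytes(out)

charref = lambda enc, s, pos: "&#%d;" % ord(s[pos])
print(encode_ascii_with_callback("a\u00e4b", charref))  # b'a&#228;b'
```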

(I hope that's enough explanation of the API and implementation)

I have renamed the static ...121 function to all lowercase
names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be
reimplemented as PyUnicode_EncodeASCII with a \uxxxx
replacement callback.
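A rough sketch of the idea, written against the handler interface that present-day Python uses rather than this patch's API: an ASCII encode plus a handler that emits `\uXXXX` escapes. The handler name is made up, and the sketch ignores that the real unicode-escape codec also escapes backslashes and control characters and uses `\xXX` for Latin-1 characters.

```python
import codecs

def unicode_escape_replace(exc):
    # Hypothetical handler: replace each unencodable character with a
    # \uXXXX escape, or \UXXXXXXXX outside the Basic Multilingual Plane.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        parts.append("\\u%04x" % cp if cp < 0x10000 else "\\U%08x" % cp)
    return ("".join(parts), exc.end)

codecs.register_error("unicode-escape-replace", unicode_escape_replace)

print("caf\u00e9 \u20ac".encode("ascii", "unicode-escape-replace"))
```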

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
PyCodec_ReplaceEncodeErrors are globally visible because
they have to be available in _codecsmodule.c to wrap them as
Python function objects, but they can't be implemented in
_codecsmodule, because they need to be available to the
encoders in unicodeobject.c (through
PyCodec_EncodeHandlerForObject), but importing the codecs
module might result in an endless recursion, because
importing a module requires unpickling of the bytecode,
which might require decoding utf8, which ... (but this will
only happen, if we implement the same mechanism for the
decoding API)

I have not touched PyUnicode_TranslateCharmap yet, 
should this function also support error callbacks? Why would
one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error
callbacks. In Python 2.1 encoding and decoding errors are
handled in the same way with a string value. But with
callbacks it doesn't make sense to use the same callback for
encoding and decoding (like codecs.StreamReaderWriter and
codecs.StreamRecoder do). Decoding callbacks have a
different API. Which arguments should be passed to the
decoding callback, and what is the decoding callback
supposed to do?


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 20:00

Message:
Logged In: YES 
user_id=38388

About the Py_UNICODE*data, int size APIs:
Ok, point taken.

In general, I think we ought to keep the callback feature as
open as possible, so passing in pointers and sizes would not
be very useful.

BTW, could you summarize how the callback works in a few
lines ?

About _Encode121: I'd name this _EncodeUCS1 since that's
what it is ;-)

About the new functions: I was referring to the new static
functions which you gave PyUnicode_... names. If these are
not supposed to turn into non-static functions, I'd rather
have them use lower case names (since that's how the Python
internals work too -- most of the times).


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:56

Message:
Logged In: YES 
user_id=89016

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

Another problem is that the callback requires a Python
object, so in the PyObject * version the refcount is
incref'd and the object is passed to the callback. The
Py_UNICODE*/int version would have to create a new Unicode
object from the data.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:32

Message:
Logged In: YES 
user_id=89016

> * please don't place more than one C statement on one line
> like in:
> """
> +               unicode = unicode2; unicodepos = unicode2pos;
> +               unicode2 = NULL; unicode2pos = 0;
> """

OK, done!

> * Comments should start with a capital letter and be prepended
> to the section they apply to

Fixed!

> * There should be spaces between arguments in compares
> (a == b) not (a==b)

Fixed!

> * Where does the name "...Encode121" originate ?

encode one-to-one, it implements both ASCII and latin-1
encoding.
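A "one-to-one" encoder in this sense maps each code point below a limit to the byte with the same value. A minimal Python sketch (the function name is hypothetical; the patch's version is C code in unicodeobject.c):

```python
def encode_one_to_one(text, limit):
    """Hypothetical sketch of an "Encode121" style encoder.

    Every code point below `limit` becomes the byte with the same value,
    so limit=128 gives ASCII and limit=256 gives Latin-1.
    """
    out = bytearray()
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if cp >= limit:
            raise UnicodeEncodeError(
                "ascii" if limit == 128 else "latin-1",
                text, pos, pos + 1,
                "ordinal not in range(%d)" % limit)
        out.append(cp)
    return bytes(out)

print(encode_one_to_one("caf\u00e9", 256))  # b'caf\xe9'
```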

> * module internal APIs should use lower case names (you
> converted some of these to  PyUnicode_...() -- this is
> normally reserved for APIs which are either marked as
> potential candidates for the public API or are very
> prominent in the code)

Which ones? I introduced a new function for every old one
that had a "const char *errors" argument, and a few new ones
in codecs.h. Of those, PyCodec_EncodeHandlerForObject is
vital, because it is used to map the old string arguments to
the new function objects. PyCodec_RaiseEncodeErrors can be
used in the encoder implementation to raise an encode error,
but it could be made static in unicodeobject.h so only those
encoders implemented there have access to it.

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

I looked through the code and found no situation where the
Py_UNICODE*/int version is really used, and having two
(PyObject *)s (the original and the replacement string)
instead of Py_UNICODE*/int and PyObject * made the
implementation a little easier, but I can fix that.

> Please separate the errors.c patch from this patch -- it
> seems totally unrelated to Unicode.

PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with
four hex digits. I removed it.

I'll upload a revised patch as soon as it's done.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 16:29

Message:
Logged In: YES 
user_id=38388

Thanks for the patch -- it looks very impressive!

I'll give it a try later this week. 

Some first cosmetic tidbits:
* please don't place more than one C statement on one line
like in:
"""
+               unicode = unicode2; unicodepos = unicode2pos;
+               unicode2 = NULL; unicode2pos = 0;
"""

* Comments should start with a capital letter and be prepended
to the section they apply to

* There should be spaces between arguments in compares
(a == b) not (a==b)

* Where does the name "...Encode121" originate ?

* module internal APIs should use lower case names (you
converted some of these to  PyUnicode_...() -- this is
normally reserved for APIs which are either marked as
potential candidates for the public API or are very
prominent in the code)

One thing which I don't like about your API change is that
you removed the Py_UNICODE*data, int size style arguments --
this makes it impossible to use the new APIs on non-Python
data or data which is not available as Unicode object.

Please separate the errors.c patch from this patch -- it
seems totally unrelated to Unicode.

Thanks.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470


From noreply@sourceforge.net  Wed Jul 24 21:36:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 24 Jul 2002 13:36:41 -0700
Subject: [Patches] [ python-Patches-552438 ] PyBufferObject fixes
Message-ID: <E17XSsD-00005I-00@usw-sf-web5.sourceforge.net>

Patches item #552438, was opened at 2002-05-05 00:26
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470

Category: Core (C code)
Group: None
Status: Open
>Resolution: Postponed
Priority: 5
Submitted By: Scott Gilbert (xscott)
>Assigned to: Nobody/Anonymous (nobody)
Summary: PyBufferObject fixes

Initial Comment:
This patch fixes these problems:

  1) Dangling pointer problem
  2) buffer allocated by PyBuffer_New not aligned

The PyBufferObject acts differently depending on 
whether it allocated the memory or if it's borrowing 
the memory from a PyBufferProcs supporting object.

In the case of allocating its own memory, I made a
slight addition that adds some padding so that the ptr 
is on a sizeof(double) boundary.
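The padding described here is the usual round-up-to-a-multiple arithmetic. A small Python sketch of that calculation, assuming the data follows a header of some size; the function name and header sizes are hypothetical, and the patch itself does this in C:

```python
import ctypes

def padded_header_size(header_size, alignment=ctypes.sizeof(ctypes.c_double)):
    # Round header_size up to the next multiple of `alignment`, so data
    # placed right after the header starts on an aligned boundary.
    return (header_size + alignment - 1) // alignment * alignment

print(padded_header_size(20, 8))  # 24
```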

In the case of borrowing another object's PyBufferProcs
memory, PyBufferObject no longer caches the pointer.  
This might slow things down (probably not by much), 
but it keeps PyBufferObject from working with a stale 
pointer.


Normally I wouldn't do this, but since this patch 
touches pretty much every function anyway, I fixed 
many deviations from the Python coding style.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-24 16:36

Message:
Logged In: YES 
user_id=31435

Since Scott is on to something else, marked this Postponed 
and unassigned it.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-07-23 04:03

Message:
Logged In: YES 
user_id=38318

On top of the current patch being out of date, in private email,
Guido indicated that Tim thinks the code needs more
refactoring to simplify it.

I'd like to hold off on resubmitting a current patch to see how
the bytes object fares (PEP 296).  If the bytes object makes it
into the Python core, then probably the best way to simplify
and fix the implementation of the buffer object is to reduce it
to nothing but a "Buffer Inspector" for other objects.  (Tearing out
the b_ptr field and a lot of if statements at least.)  The bytes
object could be used to implement the following calls:

    PyBuffer_FromMemory(...)
    PyBuffer_FromReadWriteMemory(...)
    PyBuffer_New(...)

In these cases, the bytes object would hold the actual 
memory, and the buffer object would just be inspecting the 
bytes object.  I'd still stick to the strategy of having the buffer 
object re-request the pointer before every use (since typically 
the pointer is only valid while the GIL is held).  I haven't 
figured out how to handle the case when the size specified for 
the buffer object gets out of whack when the inspected object 
resizes.  Raise an exception?

Even with these changes, there would still be some problems 
in here.  For instance, the hash value is easy to invalidate. 


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 15:41

Message:
Logged In: YES 
user_id=6380

Note, the patch is out of date since somebody fixed some
nits with slicing, so I'm marking this as Out Of Date.

You might as well upload the new version of the file. :-)

Why do you think you need to fix the allocation? Since
allocation is done via malloc(), and malloc() guarantees
allocation for a double ("for all types"), shouldn't that be
enough??? (If it's obmalloc that you're worried about, it's
easy to force this to use the real malloc() and free().)

I hope Tim will make some time to review this (the "not this
week" comment is several months old now). Superficially it
looks like a big improvement.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-07 14:51

Message:
Logged In: YES 
user_id=31435

Na, assigning a bug is fine by me -- it helps to have 
*someone* feel guilty <wink>.  Assigning it doesn't mean it 
goes to the top of the assignee's heap, though.  I can't 
make time to look at it this week, so it's just as well 
that it got unassigned.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-07 08:55

Message:
Logged In: YES 
user_id=38318

Apparently assigning a patch is poor form.  My bad.

----------------------------------------------------------------------

Comment By: Scott Gilbert (xscott)
Date: 2002-05-05 00:27

Message:
Logged In: YES 
user_id=38318

Can I assign this to you or does it take admin privs?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552438&group_id=5470


From noreply@sourceforge.net  Thu Jul 25 13:05:23 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jul 2002 05:05:23 -0700
Subject: [Patches] [ python-Patches-586437 ] galeon support in webbrowser
Message-ID: <E17XhMx-0000U2-00@usw-sf-web4.sourceforge.net>

Patches item #586437, was opened at 2002-07-25 17:35
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Supreet Sethi (supreet)
Assigned to: Nobody/Anonymous (nobody)
Summary: galeon support in webbrowser

Initial Comment:
adds galeon support to webbrowser.py 

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470


From noreply@sourceforge.net  Thu Jul 25 17:21:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 25 Jul 2002 09:21:33 -0700
Subject: [Patches] [ python-Patches-586561 ] Better token-related error messages
Message-ID: <E17XlMr-0005JR-00@usw-sf-web4.sourceforge.net>

Patches item #586561, was opened at 2002-07-25 11:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586561&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Jeremy Hylton (jhylton)
Summary: Better token-related error messages

Initial Comment:
There were some complaints recently on c.l.py about the rather 
non-informative error messages emitted as a result of the tokenizer 
detecting a problem.  In many situations it simply returns 
E_TOKEN which generates a fairly benign, but often unhelpful 
"invalid token" message.

This patch adds several new E_* macros to Include/errcode.h,
returns them from the appropriate places in Parser/tokenizer.c and 
generates more specific messages in Python/pythonrun.c.  I think the 
error messages are always better, though in some situations they may 
still not be strictly correct.

Assigning to Jeremy since he's the compiler wiz.

Skip


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586561&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 16:51:08 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 08:51:08 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Y7My-0005Ub-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.
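The key property of the new method is stability: elements that compare equal keep their original relative order. The adaptive algorithm in this patch is far more elaborate, so the following is only a plain top-down stable mergesort in Python, to show where stability comes from (the function is illustrative, not the patch's code):

```python
def merge_sort(items, key=lambda x: x):
    """Plain top-down stable mergesort (not the adaptive algorithm)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left run on ties, which keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

records = [("b", 1), ("a", 2), ("b", 0), ("a", 1)]
print(merge_sort(records, key=lambda r: r[0]))
```

Records with equal first fields come out in their input order, e.g. `("a", 2)` stays ahead of `("a", 1)`.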


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 16:41:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 08:41:07 -0700
Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks
Message-ID: <E17Y7DH-0005GH-00@usw-sf-web4.sourceforge.net>

Patches item #432401, was opened at 2001-06-12 15:43
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Postponed
Priority: 6
Submitted By: Walter Dörwald (doerwalter)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode encoding error callbacks

Initial Comment:
This patch adds unicode error handling callbacks to the
encode functionality. With this patch it's possible to
not only pass 'strict', 'ignore' or 'replace' as the
errors argument to encode, but also a callable
function, that will be called with the encoding name,
the original unicode object and the position of the
unencodable character. The callback must return a
replacement unicode object that will be encoded instead
of the original character.

For example replacing unencodable characters with XML
character references can be done in the following way.

u"aäoöuüß".encode(
   "ascii",
   lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos])
)
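In the form that eventually shipped (PEP 293), the errors argument stays a string and callbacks are registered by name with `codecs.register_error`. Handlers receive an exception object and return a (replacement, position) tuple. The example above could then be written as follows (the handler name "xml-hex-charref" is made up; the built-in "xmlcharrefreplace" handler produces decimal references instead):

```python
import codecs

def xml_hex_charref(exc):
    # Replace each unencodable character in exc.object[exc.start:exc.end]
    # with a hexadecimal XML character reference, then resume at exc.end.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    chunk = exc.object[exc.start:exc.end]
    return ("".join("&#x%x;" % ord(c) for c in chunk), exc.end)

codecs.register_error("xml-hex-charref", xml_hex_charref)

print("aäoöuüß".encode("ascii", "xml-hex-charref"))
print("aäoöuüß".encode("ascii", "xmlcharrefreplace"))
```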


----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-26 17:41

Message:
Logged In: YES 
user_id=89016

The attached new version of the test script adds tests for wrong
parameters passed to the callbacks or wrong results returned
from the callback. It also adds tests to the long string
tests for copies of the builtin error handlers, so the codec
does not recognize the name and goes through the general
callback machinery.

UTF-7 decoding still has a flaw inherited from the current
implementation:

>>> "+xxx".decode("utf-7")                    
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf7' codec can't decode bytes in
position 0-3: unterminated shift sequence
>>> "+xxx".decode("utf-7", "ignore")
u'\uc71c'

The decoder should consider the whole sequence "+xxx" as
undecodable, so "ignore" should return an empty string.
Currently the correct sequence will be passed to the
callback, but the faulty sequence has already been emitted
to the result string.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 21:04

Message:
Logged In: YES 
user_id=89016

Attached is a new version of the test script. But we need
more tests. UTF-7 is completely untested, and codecs
that pass wrong arguments to the handler and handlers that
return wrong or out-of-bounds results are untested too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-24 20:55

Message:
Logged In: YES 
user_id=89016

diff12.txt finally implements the PEP293 specification (i.e.
using exceptions for the communication between codec and
handler)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-30 18:30

Message:
Logged In: YES 
user_id=89016

diff11.txt fixes two refcounting bugs in codecs.c.
speedtest.py is a little test script that checks the speed
of various string/encoding/error combinations.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-29 22:50

Message:
Logged In: YES 
user_id=89016

This new version diff10.txt fixes a memory 
overwrite/reallocation bug in PyUnicode_EncodeCharmap and 
moves the error handling out of PyUnicode_EncodeCharmap. 
A new version of the test script is included too.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-16 21:06

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_TranslateCharmap is finished too. As the 
errors argument is again not exposed to Python it can't 
really be tested. Should we add errors as an optional 
argument to unicode.translate?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-01 19:57

Message:
Logged In: YES 
user_id=89016

OK, PyUnicode_EncodeDecimal is done (diff8.txt), but as the 
errors argument can't be accessed from Python code, there's 
not much testing for this.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-20 17:34

Message:
Logged In: YES 
user_id=89016

A new idea for the interface between the
codec and the callback:

Maybe we could have new exception classes
UnicodeEncodeError, UnicodeDecodeError
and UnicodeTranslateError derived from
UnicodeError. They have all the attributes
that are passed as an argument
tuple in the current version:
string: the original string
start: the start position of the
unencodable characters/undecodable bytes
end: the end position+1 of the unencodable
characters/undecodable bytes.
reason: a string that explains why
the encoding/decoding doesn't work.

There is no data object, because when a codec
wants to pass extended information to the
callback it can do this via a derived
class.

It might be better to move these attributes
to the base class UnicodeError, but this
might have backwards compatibility
problems.

With this method we really can have one global
registry for all callbacks, because for callback
names that must work with encoding *and* decoding
*and* translating (i.e. "strict", "replace" and 
"ignore"), the callback can check which type 
of exception was passed, so "replace" can
e.g. look like this:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      return ("?", exc.end)
   else:
      return (u"?"*(exc.end-exc.start), exc.end)
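For comparison, here is a minimal sketch of the same dispatching idea written against the exception-based API Python eventually adopted (PEP 293). It assumes Python 3, where the offending input is exposed as `exc.object` rather than the `exc.string` proposed above, and the handler name `"myreplace"` is purely hypothetical.

```python
import codecs

def myreplace(exc):
    # Decoding errors: one "?" for the broken byte sequence, resume after it.
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    # Encoding/translating errors: one "?" per unencodable character.
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        return ("?" * (exc.end - exc.start), exc.end)
    # Not a codec error this handler understands: propagate it.
    raise exc

# Hypothetical handler name; the registry is global to the process.
codecs.register_error("myreplace", myreplace)

print("a\xe4\xfcb".encode("ascii", "myreplace"))   # b'a??b'
print(b"a\xff\xfeb".decode("ascii", "myreplace"))  # a??b
```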

Another possibility would be to do the communication
callback->codec by assigning to attributes
of the exception object. The resynchronisation 
position could even be preassigned to end, so
the callback only needs to specify the 
replacement in most cases:

def replace(exc):
   if isinstance(exc, UnicodeDecodeError):
      exc.replacement = "?"
   else:
      exc.replacement = u"?"*(exc.end-exc.start)

As many of the assignments can now be done on
the C level without having to allocate Python
objects (except for the replacement string
and the reason), this version might even be 
faster, especially if we allow the codec to 
reuse the exception object for the next call 
to the callback.

Does this make sense, or is this too fancy?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:24

Message:
Logged In: YES 
user_id=89016

And here is the test script (test_codeccallbacks.py)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-18 21:22

Message:
Logged In: YES 
user_id=89016

OK, here is the current version of the patch (diff7.txt). 
PyUnicode_EncodeDecimal and PyUnicode_TranslateCharmap are 
still missing.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 22:50

Message:
Logged In: YES 
user_id=89016

> About the difference between encoding 
> and decoding: you shouldn't just look 
> at the case where you work with Unicode 
> and strings, e.g. take the rot-13 codec
> which works on strings only or other
> codecs which translate objects into 
> strings and vice-versa.

unicode.encode encodes to str and 
str.decode decodes to unicode,
even for rot-13:

>>> u"gürk".encode("rot13")
't\xfcex'
>>> "gürk".decode("rot13")
u't\xfcex'
>>> u"gürk".decode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'unicode' object has no attribute 'decode'
>>> "gürk".encode("rot13")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/walter/Python-current-readonly/dist/src/Lib/encodings/rot_13.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeError: ASCII decoding error: ordinal not in range(128)

Here the str is converted to unicode
first, before encode is called, but the
conversion to unicode fails.

Is there an example where something
else happens?

> Error handling has to be flexible enough 
> to handle all these situations. Since 
> the codecs know best how to handle the
> situations, I'd make this an implementation 
> detail of the codec and leave the
> behaviour undefined in the general case.

OK, but we should suggest that for encoding,
unencodable characters are collected, and that
for decoding, each separate byte sequence the
codec considers broken is passed to the callback
on its own. I.e. for decoding the handler will
never get all broken data in one call: e.g.
for "\u30\Uffffffff".decode("unicode-escape")
the handler will be called twice (once for
"\u30" with "truncated \u escape" as the
reason and once for "\Uffffffff" with
"illegal character" as the reason.)
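The asymmetry can be observed with a hypothetical `"record"` handler registered through `codecs.register_error` (Python 3, PEP 293 API). This is a sketch, not a specification: the exact `[start, end)` slices shown are what CPython's ASCII codec reports, and other codecs may split errors differently.

```python
import codecs

calls = []

def record(exc):
    # Log the exception class and the [start, end) slice the codec
    # reported, drop the offending input, and resume at exc.end.
    calls.append((type(exc).__name__, exc.start, exc.end))
    return ("", exc.end)

codecs.register_error("record", record)  # hypothetical handler name

# Encoding: the whole run of unencodable characters arrives in one call.
"a\xe4\xfc\xf6b".encode("ascii", "record")
# Decoding: each undecodable byte is reported separately.
b"a\xff\xfeb".decode("ascii", "record")
print(calls)
# [('UnicodeEncodeError', 1, 4), ('UnicodeDecodeError', 1, 2), ('UnicodeDecodeError', 2, 3)]
```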

> For the existing codecs, backward 
> compatibility should be maintained, 
> if at all possible. If the patch gets 
> overly complicated because of this, 
> we may have to provide a downgrade solution
> for this particular problem (I don't think 
> replace is used in any computational context, 
> though, since you can never be sure how 
> many replacement character do get 
> inserted, so the case may not be 
> that realistic).
> 
> Raising an exception for the charmap codec 
> is the right way to go, IMHO. I would 
> consider the current behaviour a bug.

OK, this is implemented in PyUnicode_EncodeCharmap now, 
and collecting unencodable characters works too.

I completely changed the implementation,
because the stack approach would have
gotten much more complicated when
unencodable characters are collected.

> For new codecs, I think we should 
> suggest that replace tries to collect 
> as much illegal data as possible before
> invoking the error handler. The handler 
> should be aware of the fact that it 
> won't necessarily get all the broken 
> data in one call.

OK for encoders, for decoders see
above.

> About the codec error handling 
> registry: You seem to be using a 
> Unicode specific approach here. 
> I'd rather like to see a generic 
> approach which uses the API 
> we discussed earlier. Would that be possible?

The handlers in the registry are all Unicode
specific, and they are different for encoding
and for decoding.

I renamed the function because of your
comment from 2001-06-13 10:05 (which 
becomes exceedingly difficult to find on
this long page! ;)).

> In that case, the codec API should 
> probably be called 
> codecs.register_error('myhandler', myhandler).
> 
> Does that make sense ?

We could require that unique names
are used for custom handlers, but
for the standard handlers we do have
name collisions. To prevent them, we
could either remove them from the registry
and require that the codec implements
the error handling for those itself,
or we could do some fiddling, so that
u"üöä".encode("ascii", "replace")
becomes 
u"üöä".encode("ascii", "unicodeencodereplace")
behind the scenes.

But I think two unicode specific 
registries are much simpler to handle.

> BTW, the patch which uses the callback 
> registry does not seem to be available 
> on this SF page (the last patch still 
> converts the errors argument to a 
> PyObject, which shouldn't be needed
> anymore with the new approach). 
> Can you please upload your 
> latest version?

OK, I'll upload a preliminary version
tomorrow. PyUnicode_EncodeDecimal and
PyUnicode_TranslateCharmap are still
missing, but otherwise the patch seems
to be finished. All decoders work and
the encoders collect unencodable characters
and implement the handling of known
callback handler names themselves.

As PyUnicode_EncodeDecimal is only used
by the int, long, float, and complex constructors,
I'd love to get rid of the errors argument,
but for completeness sake, I'll implement
the callback functionality.

> Note that the highlighting codec 
> would make a nice example
> for the new feature.

This could be part of the codec callback test
script, which I've started to write. We could
kill two birds with one stone here:
1. Test the implementation.
2. Document and advocate what is 
   possible with the patch.

Another idea: we could have as an example
a decoding handler that relaxes the
UTF-8 minimal encoding restriction, e.g.

def relaxedutf8(enc, input, startpos, endpos, reason, data):
   # "\xc0\x80" is the overlong two-byte form of U+0000
   if input[startpos:startpos+2] == "\xc0\x80":
      return (u"\x00", startpos+2)
   else:
      raise UnicodeError(...)
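A runnable variant of this idea, assuming Python 3 and the `codecs.register_error` API rather than the six-argument callback signature used above. It relies on CPython's UTF-8 decoder reporting `0xC0` as an invalid start byte at `exc.start`; the handler name `"relaxedutf8"` is illustrative only.

```python
import codecs

def relaxedutf8(exc):
    # Only decoding errors are meaningful for this handler.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # 0xC0 0x80 is the overlong two-byte form of U+0000 (as used by
    # "modified UTF-8"); strict UTF-8 decoders reject it.
    if exc.object[exc.start:exc.start + 2] == b"\xc0\x80":
        return ("\x00", exc.start + 2)
    # Any other malformed input is still an error.
    raise exc

codecs.register_error("relaxedutf8", relaxedutf8)

print(repr(b"a\xc0\x80b".decode("utf-8", "relaxedutf8")))  # 'a\x00b'
```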


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-17 21:40

Message:
Logged In: YES 
user_id=38388

Sorry for the late response.

About the difference between encoding and decoding: you shouldn't
just look at the case where you work with Unicode and strings, e.g.
take the rot-13 codec which works on strings only or other codecs
which translate objects into strings and vice-versa.

Error handling has to be flexible enough to handle all these 
situations. Since the codecs know best how to handle the situations,
I'd make this an implementation detail of the codec and leave the
behaviour undefined in the general case.

For the existing codecs, backward compatibility should be 
maintained, if at all possible. If the patch gets overly complicated
because of this, we may have to provide a downgrade solution
for this particular problem (I don't think replace is used in any
computational context, though, since you can never be sure
how many replacement characters get inserted, so the case
may not be that realistic).

Raising an exception for the charmap codec is the right
way to go, IMHO. I would consider the current behaviour
a bug.

For new codecs, I think we should suggest that replace
tries to collect as much illegal data as possible before
invoking the error handler. The handler should be aware
of the fact that it won't necessarily get all the broken data
in one call.

About the codec error handling registry:
You seem to be using a Unicode specific approach
here. I'd rather like to see a generic approach which uses
the API we discussed earlier. Would that be possible ?
In that case, the codec API should probably be called
codecs.register_error('myhandler', myhandler).

Does that make sense ?

BTW, the patch which uses the callback registry does not seem
to be available on this SF page (the last patch still converts
the errors argument to a PyObject, which shouldn't be needed
anymore with the new approach). Can you please upload your 
latest version ?

Note that the highlighting codec would make a nice example
for the new feature.

Thanks.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-04-17 12:21

Message:
Logged In: YES 
user_id=89016

Another note: the patch will change the meaning of charmap 
encoding slightly: currently "replace" will put a ? into 
the output, even if ? is not in the mapping, i.e. 
codecs.charmap_encode(u"c", "replace", {ord("a"): ord("b")})
will return ('?', 1).

With the patch the above example will raise an exception.

Of course, with the patch many more replacement characters
can appear, so it is vital that the replacement string is
passed through the mapping as well.

Is this semantic change OK? (I guess all of the existing 
codecs have a mapping ord("?")->ord("?"))
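To see the stricter charmap semantics in practice, here is a sketch against Python 3, where (as far as I can tell) this behaviour stuck. `codecs.charmap_encode` is an undocumented helper re-exported from the `_codecs` module, and the dictionary mapping is made up for illustration.

```python
import codecs

# Mapping that only knows "a" -> "b"; "?" is deliberately absent.
mapping = {ord("a"): ord("b")}

try:
    codecs.charmap_encode("c", "replace", mapping)
except UnicodeEncodeError as exc:
    # The replacement "?" cannot be mapped either, so the error surfaces.
    print("raised:", exc.reason)

# Once "?" maps to itself, "replace" produces a mapped "?".
mapping[ord("?")] = ord("?")
print(codecs.charmap_encode("c", "replace", mapping))  # (b'?', 1)
```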


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:19

Message:
Logged In: YES 
user_id=89016

So this means that the encoder can collect illegal 
characters and pass them to the callback. "replace" will 
replace them with (end-start)*u"?".

Decoders don't collect all illegal byte sequences, but call 
the callback once for every byte sequence that has been 
found illegal and "replace" will replace it with u"?".

Does this make sense?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-15 18:06

Message:
Logged In: YES 
user_id=89016

For encoding it's always (end-start)*u"?":
>>> u"ää".encode("ascii", "replace")
'??'

But for decoding, it is neither of the two:
>>> "\Ux\U".decode("unicode-escape", "replace")
u'\ufffd\ufffd'

i.e. a sequence of 5 illegal characters was replaced by two 
replacement characters. This might mean that decoders can't 
collect all the illegal characters and call the callback 
once. They might have to call the callback for every single 
illegal byte sequence to get the old behaviour.

(It seems that this patch would be much, much simpler, if 
we only change the encoders)

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 19:36

Message:
Logged In: YES 
user_id=38388

Hmm, whatever it takes to maintain backwards 
compatibility. Do you have an example ?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 18:31

Message:
Logged In: YES 
user_id=89016

What should replace do: Return u"?" or (end-start)*u"?"

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-08 16:15

Message:
Logged In: YES 
user_id=38388

Sounds like a good idea. Please keep the encoder and 
decoder APIs symmetric, though, i.e. add the slice
information to both APIs. The slice should use the
same format as Python's standard slices, that is
left inclusive, right exclusive.

I like the highlighting feature !


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-08 00:09

Message:
Logged In: YES 
user_id=89016

I'm thinking about extending the API a little bit:

Consider the following example:
>>> "\u1".decode("unicode-escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape

The error message is a lie: the problem is not 
the '1' in position 2, but the complete truncated
sequence '\u1'. For this the decoder should pass
a start and an end position to the handler.

For encoding this would be useful too: 
Suppose I want to have an encoder that 
colors the unencodable character via an 
ANSI escape sequence. Then I could do 
the following:
>>> import codecs
>>> def color(enc, uni, pos, why, sta):
...    return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1)
... 
>>> codecs.register_unicodeencodeerrorhandler("color", color)
>>> u"aäüöo".encode("ascii", "color")
'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo'

But here the sequences "\x1b[0m\x1b[1m" are not needed.

To fix this problem the encoder could collect as many
unencodable characters as possible and pass those to 
the error callback in one go (passing a start and 
end+1 position).

This fixes the above problem and reduces the number of 
calls to the callback, so it should speed up the 
algorithms in case of custom error handler names. 
(And it makes the implementation very interesting ;))
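With run collection in place, the highlighting handler only has to wrap the whole slice once. A sketch assuming Python 3's `codecs.register_error`, where the handler receives a `UnicodeEncodeError` instead of `(enc, uni, pos, why, sta)`; `"color"` is a hypothetical handler name.

```python
import codecs

def color(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.object[exc.start:exc.end] is the whole run of unencodable chars.
    bad = exc.object[exc.start:exc.end]
    codes = "".join("<%d>" % ord(ch) for ch in bad)
    # One bold-on/bold-off pair around the entire run.
    return ("\x1b[1m" + codes + "\x1b[0m", exc.end)

codecs.register_error("color", color)

print("a\xe4\xfc\xf6o".encode("ascii", "color"))
# b'a\x1b[1m<228><252><246>\x1b[0mo'
```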

What do you think?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-03-07 02:29

Message:
Logged In: YES 
user_id=89016

I started from scratch, and the current state is this:

Encoding mostly works (except that I haven't changed 
TranslateCharmap and EncodeDecimal yet) and most of the 
decoding stuff works (DecodeASCII and DecodeCharmap are 
still unchanged) and the decoding callback helper isn't 
optimized for the "builtin" names yet (i.e. it still calls 
the handler).

For encoding the callback helper knows how to 
handle "strict", "replace", "ignore" 
and "xmlcharrefreplace" itself and won't call the callback. 
This should make the encoder fast enough. As callback name 
string comparison results are cached it might even be 
faster than the original.

The patch so far didn't require any changes to 
unicodeobject.h, stringobject.h or stringobject.c


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-03-05 17:49

Message:
Logged In: YES 
user_id=38388

Walter, are you making any progress on the new scheme
we discussed on the mailing list (adding an error handler
registry much like the codec registry itself instead of trying 
to redo the complete codec API) ?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-09-20 12:38

Message:
Logged In: YES 
user_id=38388

I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. 

Walter, you may want to reference this patch in the PEP.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 12:53

Message:
Logged In: YES 
user_id=38388

I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well.

I'll look into this after I'm back from vacation on the 10.09.

Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 05:55

Message:
Logged In: YES 
user_id=89016

Changing the decoding API is done now. There 
are new functions
codecs.register_unicodedecodeerrorhandler and
codecs.lookup_unicodedecodeerrorhandler. 
Only the standard handlers for 'strict', 
'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors 
in the byte string, so I added an additional
argument to the decoding API: reason, which 
gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:
>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are:
encoding, unicode, position, reason, state.
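In the API that was eventually adopted (PEP 293), these values travel as attributes of a single exception object (`encoding`, `object`, `start`, `end`, `reason`) rather than as separate callback arguments. A small sketch, assuming Python 3, that inspects them directly:

```python
# Decoding error: every detail lives on the exception object.
try:
    b"a\xffb".decode("ascii")
except UnicodeDecodeError as exc:
    details = (exc.encoding, exc.object, exc.start, exc.end, exc.reason)
print(details)
# ('ascii', b'a\xffb', 1, 2, 'ordinal not in range(128)')

# Different failures in the same codec carry different reasons.
reasons = []
for data in (b"\\U1111111", b"\\U11111111"):
    try:
        data.decode("unicode_escape")
    except UnicodeDecodeError as exc:
        reasons.append(exc.reason)
print(reasons)
```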

The encoding and decoding API for strings has been 
adapted too, so now the new API should be usable 
everywhere:

>>> unicode("a\xffb\xffc", "ascii", 
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'
>>> "a\xffb\xffc".decode("ascii",
...    lambda enc, uni, pos, rea, sta: (u"<?>", pos+1))
u'a<?>b<?>c'

I had a problem with the decoding API: all the 
functions in _codecsmodule.c used the t# format 
specifier. I changed that to O! with 
&PyString_Type, because otherwise the decoding
API would have to pass buffer objects around
instead of strings, and the callback would have
to call str() on the buffer anyway to access a
specific character, so this wouldn't be any
faster than calling str() on the buffer before
decoding. It seems that buffers aren't used anyway. 

I changed all the old functions to call the new 
ones, so bugfixes don't have to be done in two 
places. There are two exceptions: I didn't 
change PyString_AsEncodedString and 
PyString_AsDecodedString because they are 
documented as deprecated anyway (although they 
are called in a few spots). This means that I 
duplicated part of their functionality in 
PyString_AsEncodedObjectEx and 
PyString_AsDecodedObjectEx.

There are still a few spots that call the old API:
E.g. PyString_Format still calls PyUnicode_Decode 
(but with strict decoding) because it passes the 
rest of the format string to PyUnicode_Format 
when it encounters a Unicode object.

Should we switch to the new API everywhere even 
if strict encoding/decoding is used?

The size of this patch begins to scare me. I 
guess we need an extensive test script for all the 
new features and documentation. I hope you have time 
to do that, as I'll be busy with other projects in
the next few weeks. (BTW, I haven't touched 
PyUnicode_TranslateCharmap yet.)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 19:03

Message:
Logged In: YES 
user_id=89016

New version of the patch with the error handling callback 
registry. 

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
> 
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x
in addition to \u and \U where appropriate.

> > [...] 
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
> 
> Good idea !
> 
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more inline with
> the Python coding style guide.

OK, but these functions are specific to unicode encoding,
so now the functions are called:
   codecs.register_unicodeencodeerrorhandler
   codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new 
ones: "xmlcharrefreplace" 
and "escapereplace") are registered in the 
codecs.c/_PyCodecRegistry_Init so using them is really 
simple: u"gürk".encode("ascii", "xmlcharrefreplace")
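For reference, a quick check against Python 3: `"xmlcharrefreplace"` is still a built-in handler name, and the escaping handler described here appears to correspond to what shipped as `"backslashreplace"`, which picks `\x`, `\u` or `\U` depending on the code point. The sample string is arbitrary.

```python
text = "g\xfcrk \u20ac \U0001f600"

# Decimal XML character references for every unencodable character.
print(text.encode("ascii", "xmlcharrefreplace"))
# b'g&#252;rk &#8364; &#128512;'

# Shortest fitting escape: \xNN, \uNNNN or \UNNNNNNNN.
print(text.encode("ascii", "backslashreplace"))
# b'g\\xfcrk \\u20ac \\U0001f600'
```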


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 13:26

Message:
Logged In: YES 
user_id=38388

> > >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > >    > > could be reimplemented as PyUnicode_EncodeASCII
> > >    > > with \uxxxx replacement callback.
> > >    >
> > >    > Hmm, wouldn't that result in a slowdown ? If so,
> > >    > I'd rather leave the special encoder in place,
> > >    > since it is being used a lot in Python and
> > >    > probably some applications too.
> > >
> > >    It would be a slowdown. But callbacks open many
> > >    possiblities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
> 
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !
 
> > [...]
> > >    Should the old TranslateCharmap map to the new
> > >    TranslateCharmapEx and inherit the
> > >    "multicharacter replacement" feature,
> > >    or should I leave it as it is?
> >
> > If possible, please also add the multichar replacement
> > to the old API. I think it is very useful and since the
> > old APIs work on raw buffers it would be a benefit to have
> > the functionality in the old implementation too.
> 
> OK! I will try to find the time to implement that in the
> next days.

Good.
 
> > [Decoding error callbacks]
> >
> > About the return value:
> >
> > I'd suggest to always use the same tuple interface, e.g.
> >
> >     callback(encoding, input_data, input_position, state) ->
> >         (output_to_be_appended, new_input_position)
> >
> > (I think it's better to use absolute values for the
> > position rather than offsets.)
> >
> > Perhaps the encoding callbacks should use the same
> > interface... what do you think ?
> 
> This would make the callback feature hypergeneric and a
> little slower, because tuples have to be created, but it
> (almost) unifies the encoding and decoding API. ("almost"
> because, for the encoder output_to_be_appended will be
> reencoded, for the decoder it will simply be appended.),
> so I'm for it.

That's the point. 

Note that I don't think the tuple creation
will hurt much (see the make_tuple() API in codecs.c)
since small tuples are cached by Python internally.
 
> I implemented this and changed the encoders to only
> lookup the error handler on the first error. The UCS1
> encoder now no longer uses the two-item stack strategy.
> (This strategy only makes sense for those encoder where
> the encoding itself is much more complicated than the
> looping/callback etc.) So now memory overflow tests are
> only done, when an unencodable error occurs, so now the
> UCS1 encoder should be as fast as it was without
> error callbacks.
> 
> Do we want to enforce new_input_position>input_position,
> or should jumping back be allowed?

No; moving backwards should be allowed (this may be useful
in order to resynchronize with the input data).
 
> Here's is the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
> 
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
> 
> We coould have a error handler registry, which maps names
> to error handlers, then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because
> it's always more convenient to type
>    u"...".encode("...", "strict")
> instead of
>    import codecs
>    u"...".encode("...", codecs.raise_encode_errors)
> 
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, unicode, pos, state):
>    return (u"&#%d;" % ord(uni[pos]), pos+1)
> 
> import codec
> 
> codec.registerError("xmlreplace",xmlreplace)
> ---
> and then the following call can be made:
>         u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed common error
> handling names could even be implemented in the encoder
> itself.
> 
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named
codecs.register_errorhandler() to be more in line with
the Python coding style guide.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES 
user_id=89016

> >    [...]
> >    so I guess we could change the replace handler
> >    to always return u'?'. This would make the
> >    implementation a little bit simpler, but the 
> >    explanation of the callback feature *a lot* 
> >    simpler. 
> 
> Go for it.

OK, done!

> [...]
> >    > Could you add these docs to the Misc/unicode.txt
> >    > file ? I will eventually take that file and turn 
> >    > it into a PEP which will then serve as general 
> >    > documentation for these things.
> > 
> >    I could, but first we should work out how the 
> >    decoding callback API will work.
> 
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually 
> go there.

OK. I guess it would be best to do this when everything 
is finished.

> >    > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> >    > > could be reimplemented as PyUnicode_EncodeASCII 
> >    > > with \uxxxx replacement callback.
> >    >
> >    > Hmm, wouldn't that result in a slowdown ? If so,
> >    > I'd rather leave the special encoder in place, 
> >    > since it is being used a lot in Python and 
> >    > probably some applications too.
> > 
> >    It would be a slowdown. But callbacks open many 
> >    possiblities.
> 
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).

OK, done, now there's a 
PyCodec_EscapeReplaceUnicodeEncodeErrors/
codecs.escapereplace_unicodeencode_errors
that uses \u (or \U if x>0xffff (with a wide build
of Python)).

> >    For example:
> > 
> >       Why can't I print u"gürk"?
> > 
> >    is probably one of the most frequently asked
> >    questions in comp.lang.python. For printing 
> >    Unicode stuff, print could be extended the use an 
> >    error handling callback for Unicode strings (or 
> >    objects where __str__ or tp_str returns a Unicode 
> >    object) instead of using str() which always 
> >    returns an 8bit string and uses strict encoding. 
> >    There might even be a
> >    sys.setprintencodehandler()/sys.getprintencodehandler()
> 
> There already is a print callback in Python (forgot the
> name of the hook though), so this should be possible by 
> providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> >    Should the old TranslateCharmap map to the new 
> >    TranslateCharmapEx and inherit the 
> >    "multicharacter replacement" feature,
> >    or should I leave it as it is?
> 
> If possible, please also add the multichar replacement
> to the old API. I think it is very useful and since the
> old APIs work on raw buffers it would be a benefit to have
> the functionality in the old implementation too.

OK! I will try to find the time to implement that in the 
next days.

> [Decoding error callbacks]
>
> About the return value:
> 
> I'd suggest to always use the same tuple interface, e.g.
> 
>     callback(encoding, input_data, input_position, state) -> 
>         (output_to_be_appended, new_input_position)
> 
> (I think it's better to use absolute values for the 
> position rather than offsets.)
> 
> Perhaps the encoding callbacks should use the same 
> interface... what do you think ?

This would make the callback feature hypergeneric and a
little slower, because tuples have to be created, but it
(almost) unifies the encoding and decoding API. ("almost" 
because, for the encoder output_to_be_appended will be 
reencoded, for the decoder it will simply be appended.), 
so I'm for it.

I implemented this and changed the encoders to only 
look up the error handler on the first error. The UCS1 
encoder now no longer uses the two-item stack strategy. 
(This strategy only makes sense for those encoders where 
the encoding itself is much more complicated than the 
looping/callback etc.) So memory overflow tests are now
only done when an unencodable character occurs, and the
UCS1 encoder should be as fast as it was without 
error callbacks.
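The lazy-lookup strategy can be sketched in pure Python. This is an illustration under stated assumptions, not the C implementation from the patch: `encode_ascii` is a hypothetical helper, and it uses Python 3's `codecs.lookup_error` and `UnicodeEncodeError` in place of the registry functions proposed here. The fast path never touches the registry; the handler is resolved once, at the first unencodable run.

```python
import codecs

def encode_ascii(text, errors="strict"):
    """Hypothetical pure-Python ASCII encoder with lazy handler lookup."""
    out = bytearray()
    handler = None                      # looked up only on the first error
    pos, size = 0, len(text)
    while pos < size:
        code = ord(text[pos])
        if code < 128:                  # fast path: no error machinery at all
            out.append(code)
            pos += 1
            continue
        # Collect the whole run of unencodable characters [pos, end).
        end = pos + 1
        while end < size and ord(text[end]) >= 128:
            end += 1
        if handler is None:
            handler = codecs.lookup_error(errors)
        exc = UnicodeEncodeError("ascii", text, pos, end,
                                 "ordinal not in range(128)")
        replacement, newpos = handler(exc)      # "strict" raises here
        if isinstance(replacement, str):
            replacement = replacement.encode("ascii")  # re-encode strictly
        out += replacement
        # A negative resume position counts from the end of the input.
        pos = newpos + size if newpos < 0 else newpos
    return bytes(out)
```

Because the handler only controls the resume position, it may also move backwards, which is the resynchronisation case discussed in this thread.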

Do we want to enforce new_input_position>input_position,
or should jumping back be allowed?

> >    > > One additional note: It is vital that errors
> >    > > is an assignable attribute of the StreamWriter.
> >    >
> >    > It is already !
> > 
> >    I know, but IMHO it should be documented that an
> >    assignable errors attribute must be supported 
> >    as part of the official codec API.
> > 
> >    Misc/unicode.txt is not clear on that:
> >    """
> >    It is not required by the Unicode implementation
> >    to use these base classes, only the interfaces must 
> >    match; this allows writing Codecs as extension types.
> >    """
> 
> Good point. I'll add that to the PEP 100.

OK.

Here is the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing 
callbacks
(see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names 
to error handlers; then it would be possible to keep the 
errors argument as "const char *" instead of "PyObject *". 
Currently PyCodec_UnicodeEncodeHandlerForObject is a 
backwards compatibility hack that will never go away, 
because it's always more convenient to type
   u"...".encode("...", "strict")
instead of
   import codecs
   u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would 
become the official lookup method for error handlers. 
(PyCodec_LookupUnicodeEncodeErrorHandler?)
Python code would look like this:
---
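def xmlreplace(encoding, uni, pos, state):
    # Replace the unencodable character with a decimal XML
    # character reference and resume right after it.
    return (u"&#%d;" % ord(uni[pos]), pos + 1)

import codecs

# registerError is the registry function proposed here.
codecs.registerError("xmlreplace", xmlreplace)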
---
and then the following call can be made:
	u"äöü".encode("ascii", "xmlreplace")
As soon as the first error is encountered, the encoder uses
its builtin error handling method if it recognizes the name 
("strict", "replace" or "ignore") or looks up the error 
handling function in the registry if it doesn't. In this way
the speed for the backwards compatible features is the same 
as before and "const char *error" can be kept as the 
parameter to all encoding functions. For speed, common error
handling names could even be implemented in the encoder
itself.

But for special one-shot error handlers, it might still be 
useful to pass the error handler directly, so maybe we 
should leave errors as PyObject *, but implement the
registry anyway?
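A minimal Python sketch of the registry idea, assuming the four-argument tuple interface used in the xmlreplace example. `register_error_handler`, `lookup_error_handler` and `encode_ascii` are hypothetical names chosen for illustration (the proposal itself calls the registration function `registerError`). The built-in names are recognised inline, and the registry is only consulted on the first error with an unknown name, so `errors` can stay a plain string.

```python
# Hypothetical sketch of a name -> error handler registry.

_error_handlers = {}

def register_error_handler(name, handler):
    if not callable(handler):
        raise TypeError("error handler must be callable")
    _error_handlers[name] = handler

def lookup_error_handler(name):
    try:
        return _error_handlers[name]
    except KeyError:
        raise LookupError("unknown error handler name %r" % name) from None

def xmlreplace(encoding, uni, pos, state):
    # Replace the offending character with a decimal character reference.
    return ("&#%d;" % ord(uni[pos]), pos + 1)

register_error_handler("xmlreplace", xmlreplace)

def encode_ascii(uni, errors="strict"):
    # errors stays a plain string, like "const char *errors" in C.
    out = bytearray()
    handler = None
    pos = 0
    while pos < len(uni):
        code = ord(uni[pos])
        if code < 128:
            out.append(code)
            pos += 1
        elif errors == "strict":          # built-in names: no lookup
            raise UnicodeError("ascii can't encode position %d" % pos)
        elif errors == "ignore":
            pos += 1
        elif errors == "replace":
            out += b"?"
            pos += 1
        else:
            if handler is None:           # registry hit on first error only
                handler = lookup_error_handler(errors)
            replacement, pos = handler("ascii", uni, pos, None)
            out += replacement.encode("ascii")  # must itself be ASCII
    return bytes(out)
```

With this in place, `encode_ascii(u"\u00e4\u00f6\u00fc", "xmlreplace")` returns `b"&#228;&#246;&#252;"`.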


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES 
user_id=38388

Ok, here we go...

>    > > raise an exception). U+FFFD characters in the replacement
>    > > string will be replaced with a character that the encoder
>    > > chooses ('?' in all cases).
>    >
>    > Nice.
> 
>    But the special casing of U+FFFD makes the interface somewhat
>    less clean than it could be. It was only done to be 100%
>    backwards compatible. With the original "replace" error
>    handling the codec chose the replacement character. But as
>    far as I can tell none of the codecs uses anything other
>    than '?',

True.

>    so I guess we could change the replace handler
>    to always return u'?'. This would make the implementation a
>    little bit simpler, but the explanation of the callback
>    feature *a lot* simpler. 

Go for it.

>    And if you still want to handle
>    an unencodable U+FFFD, you can write a special callback for
>    that, e.g.
> 
>    def FFFDreplace(enc, uni, pos):
>        if uni[pos] == u"\ufffd":
>            return u"?"
>        else:
>            raise UnicodeError(...)
>
>    > ...docs...
>    >
>    > Could you add these docs to the Misc/unicode.txt file ? I
>    > will eventually take that file and turn it into a PEP which
>    > will then serve as general documentation for these things.
> 
>    I could, but first we should work out how the decoding
>    callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the
unicode.txt to PEP 100, so the docs should eventually go there.
 
>    > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
>    > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
>    > > replacement callback.
>    >
>    > Hmm, wouldn't that result in a slowdown ? If so, I'd rather
>    > leave the special encoder in place, since it is being used a
>    > lot in Python and probably some applications too.
> 
>    It would be a slowdown. But callbacks open many possibilities.

True, but in this case I believe that we should stick with
the native implementation for "unicode-escape". Having
a standard callback error handler which does the \uXXXX
replacement would be nice to have though, since this would
also be usable with lots of other codecs (e.g. all the code page
ones).
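For comparison, here is roughly what such a \uXXXX replacement handler looks like with the `codecs.register_error` interface that Python's codecs module provides today. That interface has a different signature from the one being designed here: the handler receives a `UnicodeEncodeError` and returns the replacement plus the position to resume at. The name `"uescape-replace"` is made up for this example (modern Python also ships a built-in `"backslashreplace"` handler that does something similar). Because the handler is looked up by name, the same callback works with any codec, including the code page ones.

```python
import codecs

def uescape_replace(exc):
    """Replace unencodable characters with \\uXXXX / \\UXXXXXXXX escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if code < 0x10000:
            pieces.append("\\u%04x" % code)
        else:
            pieces.append("\\U%08x" % code)
    # The replacement is plain ASCII, so every target codec can encode it.
    return ("".join(pieces), exc.end)

codecs.register_error("uescape-replace", uescape_replace)
```

For example, `u"a\u00e4b".encode("ascii", "uescape-replace")` yields `b"a\\u00e4b"`, while `u"\u00e4".encode("latin-1", "uescape-replace")` never calls the handler.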
 
>    For example:
> 
>       Why can't I print u"gürk"?
> 
>    is probably one of the most frequently asked questions in
>    comp.lang.python. For printing Unicode stuff, print could be
>    extended to use an error handling callback for Unicode
>    strings (or objects where __str__ or tp_str returns a
>    Unicode object) instead of using str() which always returns
>    an 8bit string and uses strict encoding. There might even
>    be a sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the
hook though), so this should be possible by providing the
encoding logic in the hook.
 
>    > > I have not touched PyUnicode_TranslateCharmap yet,
>    > > should this function also support error callbacks? Why
>    > > would one want to insert None into the mapping to call
>    > > the callback?
>    >
>    > 1. Yes.
>    > 2. The user may want to e.g. restrict usage of certain
>    > character ranges. In this case the codec would be used to
>    > verify the input and an exception would indeed be useful
>    > (e.g. say you want to restrict input to Hangul + ASCII).
> 
>    OK, do we want TranslateCharmap to work exactly like encoding,
>    i.e. in case of an error should the returned replacement
>    string again be mapped through the translation mapping or
>    should it be copied to the output directly? The former would
>    be more in line with encoding, but IMHO the latter would
>    be much more useful.

It's better to take the second approach (copy the callback
output directly to the output string) to avoid endless
recursion and other pitfalls.

I suppose this will also simplify the implementation somewhat.
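A short Python sketch of the "copy the callback output directly" approach, assuming a mapping from code points to replacement strings (possibly several characters long, as in patch #403100) in which `None` marks a forbidden character. `translate_charmap`, the callback signature, and copying unmapped characters unchanged are assumptions made for illustration. Because the callback's output is appended verbatim, a callback can even return the forbidden character itself without triggering another lookup, so no recursion is possible.

```python
# Hypothetical sketch of TranslateCharmap with error callbacks.
# mapping: code point -> replacement string (may be multi-character),
# or None to mark the character as forbidden.

def translate_charmap(uni, mapping, callback):
    out = []
    pos = 0
    while pos < len(uni):
        code = ord(uni[pos])
        if code not in mapping:
            out.append(uni[pos])          # unmapped: copied as-is
            pos += 1
        elif mapping[code] is not None:
            out.append(mapping[code])     # possibly several characters
            pos += 1
        else:
            replacement, pos = callback("charmap", uni, pos, None)
            out.append(replacement)       # copied directly, never re-mapped
    return "".join(out)

def mark(encoding, uni, pos, state):
    # Example callback: flag the forbidden character visibly.
    return ("<%s>" % uni[pos], pos + 1)
```

To restrict input to, say, Hangul plus ASCII, the callback would simply raise instead of returning a replacement.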
 
>    BTW, when I implement it I can implement patch #403100
>    ("Multicharacter replacements in PyUnicode_TranslateCharmap")
>    along the way.

I've seen it; will comment on it later.
 
>    Should the old TranslateCharmap map to the new TranslateCharmapEx
>    and inherit the "multicharacter replacement" feature, or
>    should I leave it as it is?

If possible, please also add the multichar replacement
to the old API. I think it is very useful and since the
old APIs work on raw buffers it would be a benefit to have
the functionality in the old implementation too.
 
[Decoding error callbacks]

>    > > A remaining problem is how to implement decoding error
>    > > callbacks. In Python 2.1 encoding and decoding errors are
>    > > handled in the same way with a string value. But with
>    > > callbacks it doesn't make sense to use the same callback
>    > > for encoding and decoding (like codecs.StreamReaderWriter
>    > > and codecs.StreamRecoder do). Decoding callbacks have a
>    > > different API. Which arguments should be passed to the
>    > > decoding callback, and what is the decoding callback
>    > > supposed to do?
>    >
>    > I'd suggest adding another set of PyCodec_UnicodeDecode...()
>    > APIs for this. We'd then have to augment the base classes of
>    > the StreamCodecs to provide two attributes for .errors with
>    > a fallback solution for the string case (i.e. "strict" can
>    > still be used for both directions).
> 
>    Sounds good. Now what is the decoding callback supposed to do?
>    I guess it will be called in the same way as the encoding
>    callback, i.e. with encoding name, original string and
>    position of the error. It might return a Unicode string
>    (i.e. an object of the decoding target type), that will be
>    emitted from the codec instead of the one offending byte. Or
>    it might return a tuple with replacement Unicode object and
>    a resynchronisation offset, i.e. returning (u"?", 1) means
>    emit a '?' and skip the offending character. But to make
>    the offset really useful the callback has to know something
>    about the encoding; perhaps the codec should be allowed to
>    pass an additional state object to the callback?
> 
>    Maybe the same should be added to the encoding callbacks too?
>    Maybe the encoding callback should be able to tell the
>    encoder if the replacement returned should be reencoded
>    (in which case it's a Unicode object), or directly emitted
>    (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically
this should be a codec-defined arbitrary Python object)
which then allows the callback to apply additional tricks.
The object should be documented to be modifiable in place
(simplifies the interface).

About the return value:

I'd suggest always using the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) -> 
        (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position 
rather than offsets.)

Perhaps the encoding callbacks should use the same 
interface... what do you think ?
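A minimal Python sketch of a decoder driving that tuple interface, assuming absolute positions and a codec-defined `state` object that the callback may modify in place. `decode_ascii`, `replace_and_count` and `skip_run` are hypothetical names. The second callback shows why absolute positions help: it resynchronises by jumping over a whole run of bad bytes in one step.

```python
# Hypothetical sketch of the proposed decoding callback interface:
#   callback(encoding, input_data, input_position, state)
#       -> (output_to_be_appended, new_input_position)

def decode_ascii(data, callback, state=None):
    out = []
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if byte < 128:
            out.append(chr(byte))
            pos += 1
            continue
        replacement, newpos = callback("ascii", data, pos, state)
        if not 0 <= newpos <= len(data):
            raise IndexError("position %d out of range" % newpos)
        out.append(replacement)   # decoder output is appended as-is
        pos = newpos
    return "".join(out)

def replace_and_count(encoding, data, pos, state):
    state["errors"] += 1          # state is modified in place
    return ("\ufffd", pos + 1)

def skip_run(encoding, data, pos, state):
    # Resynchronise: skip all consecutive non-ASCII bytes, emit one "?".
    end = pos
    while end < len(data) and data[end] >= 128:
        end += 1
    return ("?", end)
```

The range check is deliberately loose: it permits jumping back (the question raised in the follow-up reply), but a callback that returns its own position would loop forever.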

>    > > One additional note: It is vital that errors is an
>    > > assignable attribute of the StreamWriter.
>    >
>    > It is already !
> 
>    I know, but IMHO it should be documented that an assignable
>    errors attribute must be supported as part of the official
>    codec API.
> 
>    Misc/unicode.txt is not clear on that:
>    """
>    It is not required by the Unicode implementation to use 
>    these base classes, only the interfaces must match; this 
>    allows writing Codecs as extension types.
>    """

Good point. I'll add that to the PEP 100.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES 
user_id=38388

Sorry to keep you waiting, Walter. I will look into this
again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES 
user_id=38388

On your comment about the non-Unicode codecs: let's keep
this separated from the current patch.

Don't have much time today. I'll comment on the other things
tomorrow.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES 
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other 
> encodings than Unicode.  But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I 
> noticed that all the non-Unicode codecs ignore the 
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors == 'strict' would
mean that the encoder is not able to deal in any other way 
with unencodable stuff than by raising an error. But that 
is not the problem here, because for zlib, base64, quopri, 
hex and uu encoding there can be no unencodable characters. 
The encoders can simply ignore the errors parameter. Should 
I remove the asserts from those codecs and change the 
docstrings accordingly, or will this be done separately?


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES 
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat
less clean than it could be. It was only done to be 100%
backwards compatible. With the original "replace" error
handling the codec chose the replacement character. But as
far as I can tell none of the codecs uses anything other
than '?', so I guess we could change the replace handler
to always return u'?'. This would make the implementation a
little bit simpler, but the explanation of the callback
feature *a lot* simpler. And if you still want to handle
an unencodable U+FFFD, you can write a special callback for
that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == u"\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and
> > implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding
callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.
>
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities.

For example:

   Why can't I print u"gürk"?

is probably one of the most frequently asked questions in
comp.lang.python. For printing Unicode stuff, print could be
extended to use an error handling callback for Unicode
strings (or objects where __str__ or tp_str returns a
Unicode object) instead of using str() which always returns
an 8bit string and uses strict encoding. There might even
be a sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors
really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet,
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding,
i.e. in case of an error should the returned replacement
string again be mapped through the translation mapping or
should it be copied to the output directly? The former would
be more in line with encoding, but IMHO the latter would
be much more useful.

BTW, when I implement it I can implement patch #403100
("Multicharacter replacements in PyUnicode_TranslateCharmap")
along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx
and inherit the "multicharacter replacement" feature, or
should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks. In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do?
I guess it will be called in the same way as the encoding
callback, i.e. with encoding name, original string and
position of the error. It might return a Unicode string
(i.e. an object of the decoding target type), that will be
emitted from the codec instead of the one offending byte. Or
it might return a tuple with replacement Unicode object and
a resynchronisation offset, i.e. returning (u"?", 1) means
emit a '?' and skip the offending character. But to make
the offset really useful the callback has to know something
about the encoding; perhaps the codec should be allowed to
pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the
encoder if the replacement returned should be reencoded
(in which case it's a Unicode object), or directly emitted
(in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable
errors attribute must be supported as part of the official
codec API.

Misc/unicode.txt is not clear on that:
"""
It is not required by the Unicode implementation to use 
these base classes, only the interfaces must match; this 
allows writing Codecs as extension types.
"""

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES 
user_id=38388

> How the callbacks work:
> 
> A PyObject * named errors is passed in. This may be NULL,
> Py_None, 'strict', u'strict', 'ignore', u'ignore',
> 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to
> one of the three builtin error callbacks
> PyCodec_RaiseEncodeErrors (raises an exception),
> PyCodec_IgnoreEncodeErrors (returns an empty replacement
> string, in effect ignoring the error),
> PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
> replacement character to signify to the encoder that it
> should choose a suitable replacement character) or directly
> returns errors if it is a callable object. When an
> unencodable character is encountered the error handling
> callback will be called with the encoding name, the original
> unicode object and the error position and must return a
> unicode object that will be encoded instead of the offending
> character (or the callback may of course raise an
> exception). U+FFFD characters in the replacement string will
> be replaced with a character that the encoder chooses ('?'
> in all cases).

Nice.
 
> The implementation of the loop through the string is done in
> the following way. A stack with two strings is kept and the
> loop always encodes a character from the string at the
> stacktop. If an error is encountered and the stack has only
> one entry (during encoding of the original string) the
> callback is called and the unicode object returned is pushed
> on the stack, so the encoding continues with the replacement
> string. If the stack has two entries when an error is
> encountered, the replacement string itself has an
> unencodable character and a normal exception is raised. When
> the encoder has reached the end of its current string there
> are two possibilities: when the stack contains two entries,
> this was the replacement string, so the replacement string
> will be popped from the stack and encoding continues with
> the next character from the original string. If the stack
> had only one entry, encoding is finished.

Very elegant solution !
 
> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I
will eventually take that file and turn it into a PEP which
will then serve as general documentation for these things.
 
> I have renamed the static ...121 function to all lowercase
> names.

Ok.
 
> BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather
leave the special encoder in place, since it is being used a
lot in Python and probably some applications too.
 
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
> PyCodec_ReplaceEncodeErrors are globally visible because
> they have to be available in _codecsmodule.c to wrap them as
> Python function objects, but they can't be implemented in
> _codecsmodule, because they need to be available to the
> encoders in unicodeobject.c (through
> PyCodec_EncodeHandlerForObject), but importing the codecs
> module might result in an endless recursion, because
> importing a module requires unpickling of the bytecode,
> which might require decoding utf8, which ... (but this will
> only happen, if we implement the same mechanism for the
> decoding API)

I think that codecs.c is the right place for these APIs.
_codecsmodule.c is only meant as Python access wrapper for
the internal codecs and nothing more. 

One thing I noted about the callbacks: they assume that they
will always get Unicode objects as input. This is certainly
not true in the general case (it is for the codecs you touch
in the patch). 

I think it would be worthwhile to rename the callbacks to
include "Unicode" somewhere, e.g.
PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
then it points out the application field of the callback
rather well. Same for the callbacks exposed through the
_codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet,
> should this function also support error callbacks? Why would
> one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain
character ranges. In this case the codec would be used to
verify the input and an exception would indeed be useful
(e.g. say you want to restrict input to Hangul + ASCII).
 
> A remaining problem is how to implement decoding error
> callbacks. In Python 2.1 encoding and decoding errors are
> handled in the same way with a string value. But with
> callbacks it doesn't make sense to use the same callback for
> encoding and decoding (like codecs.StreamReaderWriter and
> codecs.StreamRecoder do). Decoding callbacks have a
> different API. Which arguments should be passed to the
> decoding callback, and what is the decoding callback
> supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...()
APIs for this. We'd then have to augment the base classes of
the StreamCodecs to provide two attributes for .errors with
a fallback solution for the string case (i.e. "strict" can
still be used for both directions).

> One additional note: It is vital that errors is an
> assignable attribute of the StreamWriter.

It is already !
 
> Consider the XML example: For writing an XML DOM tree one
> StreamWriter object is used. When a text node is written,
> the error handling has to be set to
> codecs.xmlreplace_encode_errors, but inside a comment or
> processing instruction replacing unencodable characters with
> charrefs is not possible, so here codecs.raise_encode_errors
> should be used (or better a custom error handler that raises
> an error that says "sorry, you can't have unencodable
> characters inside a comment")

Sure.
 
> BTW, should we continue the discussion in the i18n SIG
> mailing list? An email program is much more comfortable than
> a HTML textarea! ;)

I'd rather keep the discussions on this patch here --
forking it off to the i18n sig will make it very hard to
follow up on it. (This HTML area is indeed damn small ;-)
 

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES 
user_id=89016

One additional note: It is vital that errors is an
assignable attribute of the StreamWriter. 

Consider the XML example: For writing an XML DOM tree one
StreamWriter object is used. When a text node is written,
the error handling has to be set to
codecs.xmlreplace_encode_errors, but inside a comment or
processing instruction replacing unencodable characters with
charrefs is not possible, so here codecs.raise_encode_errors
should be used (or better a custom error handler that raises
an error that says "sorry, you can't have unencodable
characters inside a comment")
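The scenario can be sketched with the API that Python's codecs module provides today: `StreamWriter.errors` is simply reassigned between writes, `"xmlcharrefreplace"` is a built-in handler, and `codecs.register_error` installs a custom one. The handler name `"xml-comment-strict"` and the `write_node` helper are made up for this example; the names mentioned above (`codecs.xmlreplace_encode_errors`, `codecs.raise_encode_errors`) belong to the patch under discussion.

```python
import codecs
import io

def _reject_in_comment(exc):
    # Custom handler: refuse unencodable characters with a clear message.
    raise ValueError("sorry, you can't have unencodable characters "
                     "inside a comment")

codecs.register_error("xml-comment-strict", _reject_in_comment)

def write_node(writer, kind, text):
    """Pick the error handling per node type, then write (sketch)."""
    if kind == "text":
        writer.errors = "xmlcharrefreplace"   # text nodes: char refs OK
    else:
        writer.errors = "xml-comment-strict"  # comments/PIs: no char refs
    writer.write(text)

raw = io.BytesIO()
writer = codecs.getwriter("ascii")(raw)
write_node(writer, "text", "g\u00fcrk")
try:
    write_node(writer, "comment", "g\u00fcrk")
    comment_error = None
except ValueError as exc:
    comment_error = str(exc)
```

After this runs, `raw.getvalue()` is `b"g&#252;rk"`: the text node was written with a character reference, while the comment was rejected before anything reached the stream.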

BTW, should we continue the discussion in the i18n SIG
mailing list? An email program is much more comfortable than
a HTML textarea! ;)


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES 
user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL,
Py_None, 'strict', u'strict', 'ignore', u'ignore',
'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to
one of the three builtin error callbacks
PyCodec_RaiseEncodeErrors (raises an exception),
PyCodec_IgnoreEncodeErrors (returns an empty replacement
string, in effect ignoring the error),
PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
replacement character to signify to the encoder that it
should choose a suitable replacement character) or directly
returns errors if it is a callable object. When an
unencodable character is encountered the error handling
callback will be called with the encoding name, the original
unicode object and the error position and must return a
unicode object that will be encoded instead of the offending
character (or the callback may of course raise an
exception). U+FFFD characters in the replacement string will 
be replaced with a character that the encoder chooses ('?'
in all cases).
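A Python rendering of the mapping just described might look like this. It is a sketch under stated assumptions, not the patch's C code: the function names mirror the C ones in lower case and are hypothetical, and the callbacks use the three-argument form `(encoding, original, pos) -> replacement`. (In Python 2, `'strict' == u'strict'` holds, so the string comparisons cover both spellings.)

```python
# Hypothetical sketch of PyCodec_EncodeHandlerForObject and the three
# builtin callbacks, using the (encoding, original, pos) signature.

def raise_encode_errors(encoding, uni, pos):
    raise UnicodeError("%s can't encode character %r at position %d"
                       % (encoding, uni[pos], pos))

def ignore_encode_errors(encoding, uni, pos):
    return ""                 # empty replacement: the error is skipped

def replace_encode_errors(encoding, uni, pos):
    return "\ufffd"           # asks the encoder to choose its own char

def encode_handler_for_object(errors):
    if errors is None or errors == "strict":
        return raise_encode_errors
    if errors == "ignore":
        return ignore_encode_errors
    if errors == "replace":
        return replace_encode_errors
    if callable(errors):
        return errors         # user-supplied callback is used directly
    raise ValueError("unknown error handling %r" % (errors,))
```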

The implementation of the loop through the string is done in
the following way. A stack with two strings is kept and the
loop always encodes a character from the string at the
stacktop. If an error is encountered and the stack has only
one entry (during encoding of the original string) the
callback is called and the unicode object returned is pushed
on the stack, so the encoding continues with the replacement
string. If the stack has two entries when an error is
encountered, the replacement string itself has an
unencodable character and a normal exception is raised. When
the encoder has reached the end of its current string there
are two possibilities: when the stack contains two entries,
this was the replacement string, so the replacement string
will be popped from the stack and encoding continues with
the next character from the original string. If the stack
had only one entry, encoding is finished.
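The two-entry stack loop can be sketched in Python as follows. This is an illustrative reconstruction from the description above, not the C implementation: `encode_with_stack` is a hypothetical name, `limit` selects ASCII (128) or Latin-1 (256), and U+FFFD in a replacement string is turned into `'?'`, as described in the previous paragraph.

```python
# Hypothetical sketch of the two-entry stack encoding loop.
# handler(encoding, original, pos) returns a replacement string.

def encode_with_stack(uni, limit, handler, encoding="ascii"):
    out = bytearray()
    stack = [[uni, 0]]                    # entries: [string, position]
    while True:
        top = stack[-1]
        string, pos = top
        if pos == len(string):
            if len(stack) == 2:
                stack.pop()               # replacement done: resume original
                continue
            return bytes(out)             # original string done
        ch = string[pos]
        top[1] = pos + 1
        if ch == "\ufffd" and len(stack) == 2:
            out += b"?"                   # encoder chooses the replacement
        elif ord(ch) < limit:
            out.append(ord(ch))
        elif len(stack) == 1:
            # Error in the original string: encode the replacement next.
            stack.append([handler(encoding, uni, pos), 0])
        else:
            raise UnicodeError("unencodable %r in replacement" % ch)
```

For example, with `handler = lambda e, u, p: "&#%d;" % ord(u[p])`, `encode_with_stack(u"a\u00e4b", 128, handler)` returns `b"a&#228;b"`.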

(I hope that's enough explanation of the API and implementation)

I have renamed the static ...121 function to all lowercase
names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be
reimplemented as PyUnicode_EncodeASCII with a \uxxxx
replacement callback.

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors,
PyCodec_ReplaceEncodeErrors are globally visible because
they have to be available in _codecsmodule.c to wrap them as
Python function objects, but they can't be implemented in
_codecsmodule, because they need to be available to the
encoders in unicodeobject.c (through
PyCodec_EncodeHandlerForObject), but importing the codecs
module might result in an endless recursion, because
importing a module requires unpickling of the bytecode,
which might require decoding utf8, which ... (but this will
only happen, if we implement the same mechanism for the
decoding API)

I have not touched PyUnicode_TranslateCharmap yet, 
should this function also support error callbacks? Why would
one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error
callbacks. In Python 2.1 encoding and decoding errors are
handled in the same way with a string value. But with
callbacks it doesn't make sense to use the same callback for
encoding and decoding (like codecs.StreamReaderWriter and
codecs.StreamRecoder do). Decoding callbacks have a
different API. Which arguments should be passed to the
decoding callback, and what is the decoding callback
supposed to do?


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 20:00

Message:
Logged In: YES 
user_id=38388

About the Py_UNICODE*data, int size APIs:
Ok, point taken.

In general, I think we ought to keep the callback feature as
open as possible, so passing in pointers and sizes would not
be very useful.

BTW, could you summarize how the callback works in a few
lines ?

About _Encode121: I'd name this _EncodeUCS1 since that's
what it is ;-)

About the new functions: I was referring to the new static
functions which you gave PyUnicode_... names. If these are
not supposed to turn into non-static functions, I'd rather
have them use lower case names (since that's how the Python
internals work too -- most of the times).


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:56

Message:
Logged In: YES 
user_id=89016

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

Another problem is that the callback requires a Python
object, so in the PyObject * version, the refcount is
incref'd and the object is passed to the callback. The
Py_UNICODE*/int version would have to create a new Unicode
object from the data.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 18:32

Message:
Logged In: YES 
user_id=89016

> * please don't place more than one C statement on one line
> like in:
> """
> +               unicode = unicode2; unicodepos = unicode2pos;
> +               unicode2 = NULL; unicode2pos = 0;
> """

OK, done!

> * Comments should start with a capital letter and be prepended
> to the section they apply to

Fixed!

> * There should be spaces between arguments in compares
> (a == b) not (a==b)

Fixed!

> * Where does the name "...Encode121" originate ?

Encode one-to-one; it implements both ASCII and Latin-1
encoding.

> * module internal APIs should use lower case names (you
> converted some of these to  PyUnicode_...() -- this is
> normally reserved for APIs which are either marked as
> potential candidates for the public API or are very
> prominent in the code)

Which ones? I introduced a new function for every old one
that had a "const char *errors" argument, and a few new ones
in codecs.h. Of those, PyCodec_EncodeHandlerForObject is
vital, because it is used to map the old string arguments to
the new function objects. PyCodec_RaiseEncodeErrors can be
used in the encoder implementation to raise an encode error,
but it could be made static in unicodeobject.h so only those
encoders implemented there have access to it.

> One thing which I don't like about your API change is that
> you removed the Py_UNICODE*data, int size style arguments --
> this makes it impossible to use the new APIs on non-Python
> data or data which is not available as Unicode object.

I looked through the code and found no situation where the
Py_UNICODE*/int version is really used. Having two
(PyObject *)s (the original and the replacement string)
instead of Py_UNICODE*/int and PyObject * made the
implementation a little easier, but I can fix that.

> Please separate the errors.c patch from this patch -- it
> seems totally unrelated to Unicode.

PyCodec_RaiseEncodeErrors uses it to produce a \Uxxxx escape
with four hex digits. I removed it.

I'll upload a revised patch as soon as it's done.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-12 16:29

Message:
Logged In: YES 
user_id=38388

Thanks for the patch -- it looks very impressive!

I'll give it a try later this week. 

Some first cosmetic tidbits:
* please don't place more than one C statement on one line
like in:
"""
+               unicode = unicode2; unicodepos = unicode2pos;
+               unicode2 = NULL; unicode2pos = 0;
"""

* Comments should start with a capital letter and be
prepended to the section they apply to

* There should be spaces between arguments in compares
(a == b) not (a==b)

* Where does the name "...Encode121" originate ?

* module internal APIs should use lower case names (you
converted some of these to  PyUnicode_...() -- this is
normally reserved for APIs which are either marked as
potential candidates for the public API or are very
prominent in the code)

One thing which I don't like about your API change is that
you removed the Py_UNICODE*data, int size style arguments --
this makes it impossible to use the new APIs on non-Python
data or data which is not available as Unicode object.

Please separate the errors.c patch from this patch -- it
seems totally unrelated to Unicode.

Thanks.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 14:21:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 06:21:29 -0700
Subject: [Patches] [ python-Patches-586999 ] error in example in smtplib.py
Message-ID: <E17Y529-0002aV-00@usw-sf-web4.sourceforge.net>

Patches item #586999, was opened at 2002-07-26 17:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: error in example in smtplib.py

Initial Comment:
I found this while looking for errors that could appear
if PEP 295 were approved ;-)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 17:23:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:23:15 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Y7s3-0001Zm-00@usw-sf-web1.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 15:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).
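The idea behind an "adaptive stable mergesort" can be shown with a deliberately simplified natural mergesort in Python. It first splits the input into runs that are already in order (reversing strictly descending ones), then merges neighbouring runs, always preferring the left run on ties so equal elements keep their original order. This is a hypothetical illustration of those two properties only, not the algorithm in the patch; it omits refinements such as a minimum run length and galloping.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def _merge(left: List[T], right: List[T], key: Callable[[T], object]) -> List[T]:
    out: List[T] = []
    i = 0
    j = 0
    while i < len(left) and j < len(right):
        # Take from the right run only if strictly smaller: ties keep the
        # left element first, which is what makes the merge stable.
        if key(right[j]) < key(left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_mergesort(items: List[T],
                      key: Optional[Callable[[T], object]] = None) -> List[T]:
    key = key or (lambda x: x)
    n = len(items)
    if n < 2:
        return list(items)
    # Pass 1: split the input into maximal runs (the "adaptive" part).
    runs: List[List[T]] = []
    i = 0
    while i < n:
        j = i + 1
        if j < n and key(items[j]) < key(items[i]):
            # Strictly descending run: reversing it cannot reorder equal keys.
            while j < n and key(items[j]) < key(items[j - 1]):
                j += 1
            runs.append(items[i:j][::-1])
        else:
            # Non-descending run: kept as is.
            while j < n and not key(items[j]) < key(items[j - 1]):
                j += 1
            runs.append(items[i:j])
        i = j
    # Pass 2: merge neighbouring runs pairwise until one run remains.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs) - 1, 2):
            merged.append(_merge(runs[k], runs[k + 1], key))
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0]
```

On input that is already sorted, or strictly descending, pass 1 finds a single run and pass 2 does no merging; nearly sorted input yields only a handful of runs.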

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.
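The benchmark tables in this thread have one column per input shape. Below is a hedged Python sketch of how such inputs might be built and timed. The shapes follow our reading of the column labels (random, descending, ascending, ascending plus 3 random swaps, ascending with the last 10 values replaced by random ones, many duplicates, all equal, and a "descending then ascending" pattern for !sort) and should be checked against Lib/test/sortperf.py itself; `make_inputs` and `time_sort` are hypothetical names.

```python
import random
import time
from typing import Callable, Dict, List


def make_inputs(n: int, seed: int = 1) -> Dict[str, List[float]]:
    """Build one list per benchmark column (assumed meanings, n >= 2)."""
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]
    ascending = sorted(base)
    inputs = {
        "*sort": list(base),        # random order
        "\\sort": ascending[::-1],  # descending
        "/sort": list(ascending),   # already ascending
    }
    three = list(ascending)
    for _ in range(3):              # ascending, then 3 random swaps
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]
    inputs["3sort"] = three
    plus = list(ascending)          # ascending, last 10 replaced at random
    plus[-10:] = [rng.random() for _ in range(min(10, n))]
    inputs["+sort"] = plus
    inputs["~sort"] = (base[:4] * (n // 4 + 1))[:n]  # many duplicates
    inputs["=sort"] = [0.5] * n                      # all equal
    half = n // 2                   # [half-1, ..., 0, 0, ..., half-1]
    inputs["!sort"] = [float(x) for x in
                       list(range(half - 1, -1, -1)) + list(range(half))]
    return inputs


def time_sort(data: List[float],
              sort: Callable[[List[float]], None] = list.sort) -> float:
    work = list(data)  # sort a copy so every run starts from the same input
    start = time.perf_counter()
    sort(work)
    return time.perf_counter() - start
```

Note that the !sort list has exactly n elements only when n is even, which holds for the powers of two used in the tables.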

CAUTION:  To save time across many runs, sortperf
saves the random floats it generates into temp files.
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
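The pitfall described here, where cached data silently overrides whatever seed is passed on later runs, can be illustrated with a small Python sketch. The function name, cache-file name and binary format are hypothetical; sortperf's actual caching code may differ.

```python
import os
import random
import struct
import tempfile
from typing import List, Optional


def load_or_make_floats(n: int, seed: Optional[int] = None,
                        directory: Optional[str] = None) -> List[float]:
    """Return n floats, reusing a cache file if one already exists."""
    directory = directory or tempfile.gettempdir()
    # Hypothetical cache-file naming scheme: one file per list size.
    path = os.path.join(directory, "sortperf_sketch_%d.bin" % n)
    if os.path.exists(path):
        # Cached data wins: the seed argument is ignored on this path.
        with open(path, "rb") as f:
            return list(struct.unpack("<%dd" % n, f.read()))
    rng = random.Random(seed)
    floats = [rng.random() for _ in range(n)]
    with open(path, "wb") as f:
        f.write(struct.pack("<%dd" % n, *floats))
    return floats
```

Only the first call for a given size consults the seed; every later call returns the cached values, which is why the seed has to be supplied on the very first run.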

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 16:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70
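Reading the 2**20 rows of the two tables above as per-column speedups (samplesort time divided by timsort time) makes the comparison easier to see. This is plain arithmetic on Neil's numbers; rounding to one decimal is our choice.

```python
# Seconds for the 2**20-element row of the samplesort and timsort tables.
samplesort = {"*sort": 4.20, "\\sort": 0.33, "/sort": 0.34, "3sort": 4.55,
              "+sort": 0.41, "~sort": 1.45, "=sort": 0.34, "!sort": 4.61}
timsort = {"*sort": 3.96, "\\sort": 0.35, "/sort": 0.34, "3sort": 0.34,
           "+sort": 0.36, "~sort": 1.12, "=sort": 0.34, "!sort": 0.70}


def speedups(old, new):
    """Ratio old/new per column: values above 1 mean the new sort was faster."""
    return {col: round(old[col] / new[col], 1) for col in old}


print(speedups(samplesort, timsort))
```

For these numbers it reports about 13.4 for 3sort, 6.6 for !sort and 1.1 for *sort, while \sort comes out slightly below 1 (0.9).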


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 17:30:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:30:32 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Y7z6-0001n6-00@usw-sf-web1.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 17:54:13 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 09:54:13 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Y8M1-0001NH-00@usw-sf-web3.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 10:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 11:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 18:52:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 10:52:52 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Y9Gm-00047x-00@usw-sf-web2.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 19:54:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:54:48 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YAEi-0003kq-00@usw-sf-web3.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 19:54:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:54:59 -0700
Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py
Message-ID: <E17YAEt-0000dP-00@usw-sf-web4.sourceforge.net>

Patches item #585913, was opened at 2002-07-24 15:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Copeland (oracle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Galeon support to webbrowser.py

Initial Comment:
Simple context diff against current CVS tree to add
support for Galeon to webbrowser.py
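Neither patch's diff appears in this thread. For comparison only, here is a hedged sketch of registering a Galeon launcher at runtime through webbrowser's public `register()` function and `BackgroundBrowser` class, as found in current Python 3 releases, without editing webbrowser.py. It assumes a `galeon` executable on the PATH, and the registry name "galeon-sketch" is hypothetical.

```python
import webbrowser

# Describe how to launch the (assumed) "galeon" executable in the background.
# Registering it only records the browser; nothing starts until .open().
galeon = webbrowser.BackgroundBrowser("galeon")
webbrowser.register("galeon-sketch", None, galeon)

# Look the browser up by its registered name.
browser = webbrowser.get("galeon-sketch")
# browser.open("http://www.python.org/")  # would launch galeon, if installed
```

Patching webbrowser.py itself, as both submissions do, instead makes the browser available to every program without this per-program registration.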


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-26 20:54

Message:
Logged In: YES 
user_id=21627

How does this relate to

https://sourceforge.net/tracker/index.php?func=detail&aid=586437&group_id=5470&atid=305470

?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 19:55:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 11:55:07 -0700
Subject: [Patches] [ python-Patches-586437 ] galeon support in webbrowser
Message-ID: <E17YAF1-0000dp-00@usw-sf-web4.sourceforge.net>

Patches item #586437, was opened at 2002-07-25 14:05
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Supreet Sethi (supreet)
Assigned to: Nobody/Anonymous (nobody)
Summary: galeon support in webbrowser

Initial Comment:
adds galeon support to webbrowser.py 

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-26 20:55

Message:
Logged In: YES 
user_id=21627

How does this relate to

https://sourceforge.net/tracker/index.php?func=detail&aid=585913&group_id=5470&atid=305470

?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586437&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 20:14:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 12:14:53 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID: <E17YAY9-00046E-00@usw-sf-web3.sourceforge.net>

Patches item #581705, was opened at 2002-07-15 16:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
>Assigned to: Martin v. Löwis (loewis)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, id 581698 called 'pty.spawn -
wrong error caught'.

System: RedHat Linux 7.3, using Python2.

About a year ago, the final 'except' statement was
changed to catch IOError rather than just error. 
However, at least on my system, the os.read call raises
an OSError, not an IOError.  Therefore, the wrong error
type is now caught.

Patch attached.

Rasjid.

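A minimal sketch of the pattern the report argues for, written in present-day Python 3 (where `IOError` is merely an alias of `OSError`, so the distinction only matters on Python 2). `os.read()` signals failure with `OSError`; on Linux, reading a pty master after the child has exited typically fails with `EIO` instead of returning an empty string, so the loop has to treat that error as end of input. `drain()` is a hypothetical helper, not the code in `pty.py`.

```python
import os


def drain(fd, bufsize=1024):
    """Read from fd until EOF, treating an OSError from os.read as EOF.

    Hypothetical helper illustrating the except clause discussed above;
    it is not the code in pty.py.
    """
    chunks = []
    while True:
        try:
            data = os.read(fd, bufsize)
        except OSError:
            # e.g. EIO from a pty master once the slave side has closed
            break
        if not data:
            # ordinary EOF (pipes, regular files)
            break
        chunks.append(data)
    return b"".join(chunks)
```

Catching every `OSError` is deliberately broad here; real code might check `errno` and re-raise anything other than `EIO`.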

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 20:50:20 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 12:50:20 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YB6S-0004qQ-00@usw-sf-web3.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 10:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.

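To make "adaptive stable mergesort" concrete, here is a deliberately simplified natural mergesort. It is not the patch's algorithm: timsort as it later shipped (documented in CPython's Objects/listsort.txt) adds a minimum run length, a stack of pending runs with invariants on their lengths, and "galloping" merges. The sketch only shows the two ideas that make the presorted columns in the timings cheap: existing ascending runs are used as-is, and strictly descending runs are reversed in place (strictness is what keeps equal elements in their original order). The name `natural_mergesort` is illustrative only.

```python
def natural_mergesort(items, key=lambda x: x):
    """Stable sort that exploits existing runs (simplified natural mergesort)."""
    a = list(items)
    n = len(a)
    if n < 2:
        return a

    # 1. Split into maximal runs. A strictly descending run is reversed so
    #    every run ends up ascending; requiring *strict* descent guarantees
    #    that no two equal elements swap places, preserving stability.
    runs = []
    i = 0
    while i < n:
        j = i + 1
        if j < n and key(a[j]) < key(a[i]):
            while j < n and key(a[j]) < key(a[j - 1]):
                j += 1
            a[i:j] = a[i:j][::-1]
        else:
            while j < n and not key(a[j]) < key(a[j - 1]):
                j += 1
        runs.append(a[i:j])
        i = j

    # 2. Merge adjacent runs pairwise until one remains. Only neighbours
    #    are merged, so relative order across runs is preserved.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs) - 1, 2):
            merged.append(_merge(runs[k], runs[k + 1], key))
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0]


def _merge(left, right, key):
    """Merge two ascending lists; ties take from `left` to stay stable."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

In this sketch, already-ascending input (the /sort column) forms a single run and costs one linear pass, and strictly descending input (\sort) costs one reversal.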

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 14:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

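The column headings are those printed by Lib/test/sortperf.py. Going by that script's docstring (worth checking against the checked-in version): *sort is random data, \sort descending, /sort ascending, 3sort ascending followed by 3 random exchanges, +sort ascending with 10 random values at the end, ~sort many duplicates, and =sort all equal. The hypothetical generator below approximates those inputs so other sorts can be tried on them; it is not sortperf's own code, and !sort is left out.

```python
import random
import time


def sortperf_like_inputs(n, seed=None):
    """Approximate the input patterns behind sortperf's table columns.

    Hypothetical helper: it follows the column descriptions, not the exact
    code in Lib/test/sortperf.py. Returns {column_name: list_of_floats}.
    The !sort column is omitted.
    """
    if n < 1:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]   # *sort: random data
    asc = sorted(base)                        # /sort: ascending

    three = asc[:]                            # 3sort: 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]                             # +sort: last 10 randomized
    plus[-10:] = [rng.random() for _ in range(min(10, n))]

    return {
        "*sort": base,
        "\\sort": asc[::-1],                       # descending
        "/sort": asc,
        "3sort": three,
        "+sort": plus,
        "~sort": [base[i % 4] for i in range(n)],  # at most 4 distinct values
        "=sort": [base[0]] * n,                    # all equal
    }


if __name__ == "__main__":
    # Time the built-in sort on each pattern; copy first so the generated
    # lists stay unsorted.
    for name, data in sortperf_like_inputs(2**15, seed=1).items():
        work = list(data)
        start = time.perf_counter()
        work.sort()
        print(f"{name:>6} {time.perf_counter() - start:.4f}s")
```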

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 11:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 11:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 11:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 20:56:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 12:56:44 -0700
Subject: [Patches] [ python-Patches-585913 ] Adds Galeon support to webbrowser.py
Message-ID: <E17YBCe-0006UH-00@usw-sf-web2.sourceforge.net>

Patches item #585913, was opened at 2002-07-24 08:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Copeland (oracle)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adds Galeon support to webbrowser.py

Initial Comment:
Simple context diff against current CVS tree to add
support for Galeon to webbrowser.py


----------------------------------------------------------------------

Comment By: Greg Copeland (oracle)
Date: 2002-07-26 14:56

Message:
Logged In: YES 
user_id=40173

Not really sure.  I assume it's just a second patch by
another author.

What can I say, day late and a dollar short.  ;)

Having looked at the other patch, it appears mine is a
little more well rounded/complete/feature rich, if only
slightly.  I invite you to take a look for yourself.  I'm
also not sure what version of webbrowser.py the other patch
is against.  My patch is against the CVS version so it will
be a breeze to apply.

Enjoy!


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-26 13:54

Message:
Logged In: YES 
user_id=21627

How does this relate to

https://sourceforge.net/tracker/index.php?func=detail&aid=586437&group_id=5470&atid=305470

?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=585913&group_id=5470


From noreply@sourceforge.net  Fri Jul 26 21:38:13 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 13:38:13 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YBqn-0007Gs-00@usw-sf-web2.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.

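The temp-file caching described in the initial comment can be sketched as follows. `cached_floats(n, path, seed)` is a hypothetical helper, and the raw array-of-doubles file format is an assumption; sortperf's actual file naming and format may differ. Seeding a private `random.Random` plays the role of passing "1" on the first run: the data is regenerated (reproducibly) only when no cache file exists, and otherwise the saved values win.

```python
import array
import os
import random


def cached_floats(n, path, seed):
    """Return n random floats, reusing the file at `path` if it exists.

    Hypothetical helper modelled on the description of sortperf's temp
    files: generate once from a seeded RNG, save as raw C doubles, and on
    later runs read the file back instead of regenerating.
    """
    if os.path.exists(path):
        data = array.array("d")
        with open(path, "rb") as f:
            data.fromfile(f, n)  # raises EOFError if the file is too short
        return data.tolist()
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)
    return values
```

As the caution above says, a stale cache silently takes precedence over the seed, which is why the first run determines the data every later run sees.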

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

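One rough way to gauge how much pre-existing order such real-life data offers is to count its natural runs, using the same definition a natural mergesort uses (non-descending, or strictly descending). Random input yields a run count proportional to n, with each run only a couple of elements long, while sorted or reverse-sorted input yields a single run. `count_runs` is a hypothetical measurement aid, not part of the patch.

```python
def count_runs(items):
    """Count maximal runs as a natural mergesort would see them.

    A run is either non-descending or strictly descending; fewer runs
    relative to len(items) means more pre-existing order to exploit.
    """
    n = len(items)
    if n == 0:
        return 0
    runs = 0
    i = 0
    while i < n:
        j = i + 1
        if j < n and items[j] < items[i]:
            # strictly descending run
            while j < n and items[j] < items[j - 1]:
                j += 1
        else:
            # non-descending run
            while j < n and not items[j] < items[j - 1]:
                j += 1
        runs += 1
        i = j
    return runs
```

For a file's `readlines()`, comparing `count_runs(lines)` with `len(lines)` gives a crude measure of how far from random the line order is.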
----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74

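Tables like the two above are easiest to compare as per-column ratios of the L.sort() (samplesort) time to the L.msort() time. The sketch below does that for the 2**20 rows, with the numbers copied from those tables; `speedups` is just an illustrative name.

```python
# 2**20-element timings (seconds) from the L.sort() / L.msort() tables above.
COLUMNS = ["*sort", "\\sort", "/sort", "3sort", "+sort", "~sort", "=sort", "!sort"]
SAMPLESORT = [15.30, 0.90, 0.86, 14.10, 1.13, 3.62, 0.86, 14.96]
MSORT = [15.03, 0.86, 0.87, 0.87, 0.89, 4.70, 0.89, 1.74]


def speedups(before, after):
    """Ratio before/after per column: > 1 means msort was faster."""
    return [round(b / a, 2) for b, a in zip(before, after)]


for name, ratio in zip(COLUMNS, speedups(SAMPLESORT, MSORT)):
    print(f"{name:>6} {ratio:5.2f}")
```

On those rows msort comes out about 16x faster on 3sort and 8.6x faster on !sort, 1.27x on +sort, within about 5% on *sort, \sort, /sort and =sort, and slower on ~sort (ratio 0.77).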

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 01:58:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 17:58:59 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17YFv9-0006Sk-00@usw-sf-web4.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 12:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-26 20:58

Message:
Logged In: YES 
user_id=33168

This patch uses AC_TRY_LINK instead of AC_TRY_RUN.  It tries
makedev according to Martin's suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-23 17:43

Message:
Logged In: YES 
user_id=21627

That patch doesn't really test whether defining 
OSF_SOURCE helps in getting makedev, does it? In 
particular, if makedev is not available at all, or requires a 
different define, the test will still conclude that 
OSF_SOURCE should be defined, right?

I think the sequence should be:
- is makedev already available?
- if not, is it with OSF_SOURCE defined?
- if not, arrange to exclude makedev from posixmodule.c

Also, is it necessary to run the test program? autoconf is 
always worried that cross-compilation would fail, since you 
cannot run tests (although it is reasonable to link test 
programs in a cross-compilation environment).

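The three-step sequence above is a fallback chain. It is modelled here in Python purely for illustration: `try_link` is a hypothetical stand-in for an autoconf link test such as AC_TRY_LINK (which only compiles and links, so it stays usable when cross-compiling), and the function name and returned dictionary are likewise made up. The point is that OSF_SOURCE gets defined only when it demonstrably makes makedev link.

```python
from typing import Callable, Dict


def probe_makedev(try_link: Callable[[Dict[str, str]], bool]) -> Dict[str, bool]:
    """Decide how to get makedev, following the three steps listed above.

    `try_link(extra_defines)` is a hypothetical stand-in for an autoconf
    link test: it should return True if a program calling makedev()
    links with the given preprocessor defines.
    """
    # 1. Is makedev already available?
    if try_link({}):
        return {"define_osf_source": False, "have_makedev": True}
    # 2. If not, is it available with OSF_SOURCE defined?
    if try_link({"OSF_SOURCE": "1"}):
        return {"define_osf_source": True, "have_makedev": True}
    # 3. Otherwise, leave makedev out of posixmodule.c.
    return {"define_osf_source": False, "have_makedev": False}
```

Unlike a test that always concludes OSF_SOURCE is needed, this sequence reports "no makedev" when neither link attempt succeeds.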
----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 02:04:47 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 18:04:47 -0700
Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS
Message-ID: <E17YG0l-0006X6-00@usw-sf-web4.sourceforge.net>

Patches item #577031, was opened at 2002-07-03 11:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Remove PyArg_Parse() and METH_OLDARGS

Initial Comment:
This patch removes more uses of PyArg_Parse() and METH_OLDARGS,
which are deprecated.
I've tested in select and string, but want to make sure
there's nothing else I'm missing.

I also have a huge change to glmodule, but I can't test
that.  The diff is attached.
Let me know if I should check in glmodule or leave it
alone.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-26 21:04

Message:
Logged In: YES 
user_id=33168

All the "s" / PyString_Check() changes are in fmmodule.  I
suggest not patching fmmodule now.  Are all the other
changes OK?  Should I bother fixing glmodule at all?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 01:45

Message:
Logged In: YES 
user_id=21627

The changes look good, except for the ones that change
parsing of "s" to PyString_Check: that would lose support
for Unicode.

For some of these methods, that may be acceptable, but that
would need documentation.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 02:24:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 26 Jul 2002 18:24:10 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YGJW-0006p5-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 08:54:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 00:54:02 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID: <E17YMOo-0007Zi-00@usw-sf-web3.sourceforge.net>

Patches item #581705, was opened at 2002-07-16 00:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
Assigned to: Martin v. Löwis (loewis)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, id 581698 called 'pty.spawn -
wrong error caught'.

System: RedHat Linux 7.3, using Python2.

About a year ago, the final 'except' statement was
changed to catch IOError rather than just error. 
However, at least on my system, the os.read call raises
an OSError, not an IOError.  Therefore, the wrong error
type is now caught.

Patch attached.

Rasjid.
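A minimal sketch of the read half of such a copy loop, written for Python 3 (where `IOError` is only an alias of `OSError`, so the two can no longer be confused). The helper name `copy_until_eof` is hypothetical; this is an illustration of catching the right exception, not the code from pty.py or from the attached patch.

```python
import os

def copy_until_eof(src_fd, dst_fd, chunk=1024):
    """Copy bytes from src_fd to dst_fd until EOF or a read error.

    Returns the number of bytes copied. On Linux, reading the master side
    of a pty after the child has exited typically raises OSError (EIO)
    instead of returning b"", so OSError is treated like end of input.
    """
    total = 0
    while True:
        try:
            data = os.read(src_fd, chunk)
        except OSError:          # what os.read raises on failure
            break
        if not data:             # ordinary end of file
            break
        view = memoryview(data)
        while view:              # os.write may write only part of the data
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(data)
    return total
```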


----------------------------------------------------------------------

>Comment By: Rasjid Wilcox (rasjidw)
Date: 2002-07-27 17:54

Message:
Logged In: YES 
user_id=39640

Actually, a bit more testing revealed some more errors when
the main process had its standard input and output something
other than a tty.

I attach my second version of the patch.

Rasjid.
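When stdin is redirected from a file or pipe, switching it to raw mode fails because there is no terminal to configure. One way to guard against that, sketched in Python 3 with hypothetical helpers `maybe_set_raw` and `restore`, is to check `os.isatty()` first. This is only an illustration; it is not necessarily what the second version of the patch does.

```python
import os
import termios
import tty

def maybe_set_raw(fd):
    """Switch fd to raw mode only if it is a terminal.

    Returns the saved terminal attributes so they can be restored later,
    or None when fd is not a tty (for example a file or a pipe), in which
    case nothing is changed.
    """
    if not os.isatty(fd):
        return None
    saved = termios.tcgetattr(fd)
    tty.setraw(fd)
    return saved

def restore(fd, saved):
    """Undo maybe_set_raw(); does nothing if saved is None."""
    if saved is not None:
        termios.tcsetattr(fd, termios.TCSAFLUSH, saved)
```

A spawn-style caller would do `saved = maybe_set_raw(0)` before its copy loop and `restore(0, saved)` in a `finally:` clause.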


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 09:20:17 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 01:20:17 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YMoD-0007sx-00@usw-sf-web3.sourceforge.net>

Patches item #587076, was opened at 2002-07-27 01:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).
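For comparison with current Python: the adaptive mergesort from this patch became the implementation of `list.sort()` itself in Python 2.3, and Python 3 later removed the compare argument. A sketch of sorting with an old-style compare function in Python 3 via `functools.cmp_to_key`; `by_length` and `sort_with_compare` are illustrative names, not part of the patch.

```python
from functools import cmp_to_key

def by_length(a, b):
    """Old-style compare function: negative, zero, or positive result."""
    return (len(a) > len(b)) - (len(a) < len(b))

def sort_with_compare(items, compare):
    """Return a new list sorted by an old-style compare function.

    The sort is stable, so items that compare equal keep their
    original relative order.
    """
    return sorted(items, key=cmp_to_key(compare))

words = ["pear", "fig", "apple", "kiwi", "date", "plum"]
print(sort_with_compare(words, by_length))
# ['fig', 'pear', 'kiwi', 'date', 'plum', 'apple']
```

The four-letter words come out in exactly the order they went in, which is what "stable" buys you.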

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
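The point of the caution is that every timing run must sort exactly the same data. A minimal Python 3 sketch of the same idea regenerates the inputs from a private `random.Random(seed)` instead of caching them in temp files. The list patterns are my reading of the column headings in the timing tables of this thread (for example, `!sort` is assumed to be a descending half followed by an ascending half) and are not copied from Lib/test/sortperf.py; `make_inputs` and `time_sorts` are hypothetical names.

```python
import random
import time

def make_inputs(n, seed=1):
    """Build sortperf-style lists of n floats (n >= 1), identical per seed.

    Pattern meanings are assumed from the column headings; this is an
    approximation, not a copy of Lib/test/sortperf.py.
    """
    rng = random.Random(seed)                  # private, seeded generator
    data = [rng.random() for _ in range(n)]    # *sort: random
    asc = sorted(data)                         # /sort: ascending
    three = asc[:]                             # 3sort: ascending + 3 swaps
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]
    plus = asc[:]                              # +sort: ascending, 10 random at end
    plus[-10:] = [rng.random() for _ in range(min(10, n))]
    half = n // 2                              # !sort: descending half, ascending half
    bang = [float(x) for x in list(range(half - 1, -1, -1)) + list(range(n - half))]
    return {
        "*sort": data,
        "\\sort": asc[::-1],                   # descending
        "/sort": asc,
        "3sort": three,
        "+sort": plus,
        "~sort": [float(rng.randrange(4)) for _ in range(n)],  # many duplicates
        "=sort": [1.0] * n,                    # all equal
        "!sort": bang,
    }

def time_sorts(n, seed=1):
    """Time list.sort() on a fresh copy of each input; values in seconds."""
    timings = {}
    for label, values in make_inputs(n, seed).items():
        work = values[:]                       # keep the inputs reusable
        start = time.perf_counter()
        work.sort()
        timings[label] = time.perf_counter() - start
    return timings
```

For example, `time_sorts(2**15)` returns one timing per pattern; with a run-exploiting sort one would expect the ascending, descending and all-equal columns to be far cheaper than the random one, as in the timsort tables.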

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 18:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 11:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.
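timsort.txt is the authoritative description. As a much simplified illustration of the core idea only (find the natural runs already present in the data, then merge them stably), here is a plain natural mergesort in Python 3. It deliberately omits timsort's minimum run length, merge-stack invariants and galloping mode, so it is not timsort; the function names are hypothetical.

```python
def find_runs(a):
    """Split list a into its natural runs, in order.

    A run is either non-descending or *strictly* descending; strictly
    descending runs are reversed so every run ends up ascending. Reversing
    only strictly descending runs means equal elements never swap, which
    keeps the overall sort stable.
    """
    runs = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:              # strictly descending run
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:                                   # non-descending run
            while j < n and not a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs


def merge(left, right):
    """Stably merge two sorted lists; on ties the left element wins."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_mergesort(items):
    """Return a new sorted list, using only the < operator."""
    runs = find_runs(list(items))
    if not runs:
        return []
    while len(runs) > 1:
        # Merge neighbouring runs pairwise so left-to-right order is kept.
        merged = [merge(runs[k], runs[k + 1])
                  for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:                       # odd run out goes last
            merged.append(runs[-1])
        runs = merged
    return runs[0]
```

Already sorted input is a single run and needs no merging at all, and strictly reversed input becomes a single run after one reversal, which is consistent with the tiny \sort and /sort times in the timsort tables.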

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 06:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
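One rough way to look for the kind of pre-existing order described above is to count the natural runs that a run-detecting mergesort would find, and compare against the same lines shuffled. A sketch in Python 3; `count_runs` and `order_report` are hypothetical helpers, and fewer, longer runs is only a hint of exploitable order, not a prediction of the speedup.

```python
import random

def count_runs(seq):
    """Count the natural runs (non-descending or strictly descending)."""
    runs, i, n = 0, 0, len(seq)
    while i < n:
        j = i + 1
        if j < n and seq[j] < seq[i]:          # strictly descending run
            while j < n and seq[j] < seq[j - 1]:
                j += 1
        else:                                   # non-descending run
            while j < n and not seq[j] < seq[j - 1]:
                j += 1
        runs += 1
        i = j
    return runs

def order_report(lines, seed=0):
    """Return (runs in lines as given, runs after a seeded shuffle)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return count_runs(lines), count_runs(shuffled)
```

For instance, `with open("all.c") as f: print(order_report(f.readlines()))`, where `all.c` stands for a hypothetical concatenation of the C sources.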

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-27 05:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 04:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 03:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-27 02:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 02:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-27 02:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 12:23:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 04:23:05 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YPf7-0004P6-00@usw-sf-web1.sourceforge.net>

Patches item #587076, was opened at 2002-07-27 01:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 21:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sat Jul 27 23:23:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sat, 27 Jul 2002 15:23:22 -0700
Subject: [Patches] [ python-Patches-544113 ] merging sorted sequences
Message-ID: <E17YZy6-0006Rh-00@usw-sf-web1.sourceforge.net>

Patches item #544113, was opened at 2002-04-15 07:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Sebastien Keim (s_keim)
Assigned to: Nobody/Anonymous (nobody)
Summary: merging sorted sequences

Initial Comment:
This patch is intended to add to the bisect module a function which merges several sorted sequences into a single ordered list.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-27 18:23

Message:
Logged In: YES 
user_id=6380

Thanks.

This doesn't strike me as a "fundamental" algorithm like
bisection or heap sort. I don't think I've ever needed this,
except perhaps in situations where the amount of data was
small enough that simply concatenating the lists and sorting
them was an acceptable 3-line solution.

Therefore I'm rejecting this unless you get someone of
importance to plead for it.
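The "concatenate and sort" solution Guido mentions looks like this in Python 3. For comparison, the standard library later gained `heapq.merge()` (in Python 2.6) for merging already-sorted inputs lazily. The function names are illustrative only.

```python
import heapq

def merge_by_sorting(*seqs):
    """Concatenate everything, then sort: the short, simple approach."""
    result = []
    for seq in seqs:
        result.extend(seq)
    result.sort()
    return result

def merge_lazily(*seqs):
    """k-way merge of already-sorted inputs using heapq.merge."""
    return list(heapq.merge(*seqs))
```

Both give the same list for sorted inputs; the difference is that `heapq.merge` yields items one at a time, so it also works on large or unbounded sorted iterators without holding everything in memory.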

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 10:43:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 02:43:33 -0700
Subject: [Patches] [ python-Patches-581705 ] fix to pty.spawn error on Linux
Message-ID: <E17YkaL-0006n9-00@usw-sf-web1.sourceforge.net>

Patches item #581705, was opened at 2002-07-15 16:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Rasjid Wilcox (rasjidw)
Assigned to: Martin v. Löwis (loewis)
Summary: fix to pty.spawn error on Linux

Initial Comment:
I submitted a bug report, id 581698 called 'pty.spawn -
wrong error caught'.

System: RedHat Linux 7.3, using Python2.

About a year ago, the final 'except' statement was
changed to catch IOError rather than just error. 
However, at least on my system, the os.read call raises
an OSError, not an IOError.  Therefore, the wrong error
type is now caught.
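A minimal sketch of the distinction, assuming Python 3, where IOError is merely an alias of OSError. In Python 2 the two were separate classes, which is why an `except IOError` clause missed the error raised by os.read. `read_chunk` is a hypothetical helper, not the code committed to pty.py; on Linux, reading the master side of a pty typically fails with EIO once the child side has closed.

```python
import os

def read_chunk(fd, size=1024):
    """Read up to `size` bytes from fd, treating OSError as end of input.

    Hypothetical helper: os.read signals failure with OSError, so that is
    the exception to catch (IOError was a distinct class in Python 2).
    """
    try:
        return os.read(fd, size)
    except OSError:
        return b""
```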

Patch attached.

Rasjid.


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 11:43

Message:
Logged In: YES 
user_id=21627

Thanks for the patch. Committed (in a slightly modified
form) as pty.py 1.12.

----------------------------------------------------------------------

Comment By: Rasjid Wilcox (rasjidw)
Date: 2002-07-27 09:54

Message:
Logged In: YES 
user_id=39640

Actually, a bit more testing revealed some more errors when
the main process had its standard input and output something
other than a tty.

I attach my second version of the patch.

Rasjid.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581705&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 10:58:18 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 02:58:18 -0700
Subject: [Patches] [ python-Patches-575827 ] SSL release GIL
Message-ID: <E17Ykoc-0004sG-00@usw-sf-web4.sourceforge.net>

Patches item #575827, was opened at 2002-07-01 07:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Martin v. Löwis (loewis)
Summary: SSL release GIL

Initial Comment:
This is more or less a rewrite of parts of patch
#475045. It releases the GIL during the SSL operations
for opening a SSL socket. Currently the GIL is only
released during the read and write operations to a SSL
socket.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 11:58

Message:
Logged In: YES 
user_id=21627

Thanks for the patch, applied as _ssl.c 1.7.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-07-01 07:15

Message:
Logged In: YES 
user_id=163326

Randomly assigning to Martin, who proofread my previous patch.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=575827&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:02:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:02:44 -0700
Subject: [Patches] [ python-Patches-554807 ] Add _winreg support for Cygwin
Message-ID: <E17Yksu-0004wD-00@usw-sf-web4.sourceforge.net>

Patches item #554807, was opened at 2002-05-11 14:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470

Category: Windows
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerald S. Williams (gsw_agere)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add _winreg support for Cygwin

Initial Comment:
This adds _winreg support to Cygwin Python without 
dependencies on other Windows modules.

For platforms in which MS_WINDOWS isn't defined, this 
reports the OSError exception instead of WindowsError. 
It also uses the non-MBCS versions of registry access 
in this case.
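Seen from Python code, reporting OSError on non-Windows builds is harmless for callers that catch OSError, because WindowsError is a subclass of OSError (Python 2) or an alias of it (Python 3). A minimal sketch; `RegistryError` and `query_or_default` are hypothetical names, and `query` stands for any registry-reading callable.

```python
import builtins

# WindowsError only exists on Windows builds; elsewhere fall back to OSError.
RegistryError = getattr(builtins, "WindowsError", OSError)

def query_or_default(query, default=None):
    """Call a registry-reading function, returning `default` on failure.

    Catching OSError also catches WindowsError where that name exists.
    """
    try:
        return query()
    except OSError:
        return default
```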

Some minor changes to _winreg.c were made to clean up 
compiler warnings from GCC.

setup.py was changed to create a dynamic _winreg 
module under cygwin. There are also some earlier 
changes in the patch file to skip the import test (due 
to Cygwin fork issues), and to require libintl when 
building _locale under Cygwin.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:02

Message:
Logged In: YES 
user_id=21627

Is any kind of tweaking forthcoming?

----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 15:30

Message:
Logged In: YES 
user_id=329402

It sounds like the patches need some tweaking (my testing 
had passed but was certainly limited).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-15 14:57

Message:
Logged In: YES 
user_id=21627

Yes, but you are wrong assuming that the *A functions expect
Latin-1. Instead, they expect char* encoded as CP_ACP, which
is known as "mbcs" in Python.

The *W functions do *not* expect multi-byte strings, but
Unicode strings.

Notice that _winreg also calls the *A functions, even in
MSVC builds.

So I think converting Unicode to Latin-1 is definitely
incorrect.
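A small illustration of why Latin-1 cannot stand in for CP_ACP, assuming a Western European system whose ANSI code page is cp1252 (on Windows itself, Python's "mbcs" codec picks the real CP_ACP). The helper names are hypothetical. The euro sign is 0x80 in cp1252 but has no Latin-1 encoding at all.

```python
def ansi_bytes(text, codepage="cp1252"):
    """Encode text for an *A function, assuming CP_ACP is `codepage`."""
    return text.encode(codepage)

def latin1_bytes(text):
    """The conversion questioned in the comment: encode as Latin-1."""
    return text.encode("latin-1")

euro = "\u20ac"
print(ansi_bytes(euro))  # b'\x80'
try:
    latin1_bytes(euro)
except UnicodeEncodeError:
    print("Latin-1 has no encoding for the euro sign")
```

The two encodings agree only for some characters (e.g. "\xe9"), which is why limited testing can make a Latin-1 conversion look correct.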

----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 14:48

Message:
Logged In: YES 
user_id=329402

Windows supplies two versions of the relevant functions. 
The Cygwin version (at least as built) uses the ANSI 
versions, as indicated by the A at the end of the symbol 
names:
  $ nm _winreg.o | grep RegQueryValue
           U _RegQueryValueA@16
           U _RegQueryValueExA@24

As opposed to the "Windows Unicode/wide-char" functions, 
which end in W and require MBCS functions to decode.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-15 00:23

Message:
Logged In: YES 
user_id=21627

Can you please explain why not using MBCS is the right thing?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:03:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:03:42 -0700
Subject: [Patches] [ python-Patches-554718 ] OpenBSD updates for build process
Message-ID: <E17Yktq-0004x9-00@usw-sf-web4.sourceforge.net>

Patches item #554718, was opened at 2002-05-11 02:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554718&group_id=5470

Category: Build
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Behrens (mattbehrens)
Assigned to: Nobody/Anonymous (nobody)
Summary: OpenBSD updates for build process

Initial Comment:
The following patches are currently in our packaging 
system.  A brief summary:

- Use 'cc -shared' to build shared libraries, as is 
strictly correct on OpenBSD.

- Use -fPIC instead of -fpic.

- Use OpenBSD threads.

- Fix the test_fcntl test.

Another patch item will be posted shortly for Python 
2.2, for similar items.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:03

Message:
Logged In: YES 
user_id=21627

Are you still interested in this patch? If so, what are the
answers to these questions?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-21 11:25

Message:
Logged In: YES 
user_id=21627

What architectures have been using ELF before OpenBSD 2.8?
I'd still like to simplify this logic, perhaps by removing
support for systems that nobody uses anymore.

As for -pthread: the test for OpenBSD specifically should
go. Instead, I propose to integrate this with the -Kpthread
logic: there should be a sequence of options tested, and the
first one shown to enable pthreads should be used. The set
of options should be -Kpthread (for SysV), -pthread (for BSD
and Linux), -pthreads (for gcc on Solaris).
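The proposed logic (try each option in order, keep the first that works) can be modelled in Python as below. Real configure scripts do this in shell with autoconf; `first_working_flag` and `cc_probe` are hypothetical names, and `cc_probe` assumes a C compiler invoked as `cc`.

```python
import os
import subprocess
import tempfile

PTHREAD_FLAGS = ["-Kpthread", "-pthread", "-pthreads"]  # SysV, BSD/Linux, gcc on Solaris

TEST_PROGRAM = """#include <pthread.h>
static void *run(void *arg) { return arg; }
int main(void) { pthread_t t; return pthread_create(&t, 0, run, 0); }
"""

def first_working_flag(candidates, probe):
    """Return the first flag for which probe(flag) succeeds, else None."""
    for flag in candidates:
        if probe(flag):
            return flag
    return None

def cc_probe(flag, cc="cc"):
    """Try to compile and link a tiny pthreads program using `flag`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        with open(src, "w") as f:
            f.write(TEST_PROGRAM)
        try:
            result = subprocess.run([cc, flag, src, "-o", os.path.join(tmp, "conftest")],
                                    capture_output=True)
        except FileNotFoundError:  # no compiler available
            return False
        return result.returncode == 0
```

Usage: `first_working_flag(PTHREAD_FLAGS, cc_probe)`. Passing the probe in as a function keeps the ordering logic testable without a compiler.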

I'd be willing to accept a test-for-system for 2.1, since it
does not have the -Kpthread test, but for 2.2 and 2.3, we
should remove the set of tests used.

Also, why does it AC_DEFINE _REENTRANT and _POSIX_THREADS?
Those two should be implied by -pthread.

Also, what OpenBSD releases could be deprecated without
losing users?

----------------------------------------------------------------------

Comment By: Matt Behrens (mattbehrens)
Date: 2002-05-21 00:47

Message:
Logged In: YES 
user_id=240525

>From brad@:

> There isn't a test for -pthread option so Python will not correctly
> compile with threads support. Testing for libc_r is NOT correct.

So, the answer is no, the standard POSIX threads test does
not work.


----------------------------------------------------------------------

Comment By: Matt Behrens (mattbehrens)
Date: 2002-05-20 14:17

Message:
Logged In: YES 
user_id=240525

Okay, well let's comment in this bug then.  Changing the
subject and closing out 554719.  I'll put all patches on
this bug.

I am trying to verify most of this with brad@openbsd.org,
who has contributed some parts of these patches.

On cc -shared, this is my understanding:

-  All OpenBSD ELF architectures have always used cc -shared.

-  Before OpenBSD 2.8, a.out architectures used ld -Bshareable.

-  As of OpenBSD 2.8, cc -shared worked on a.out
architectures as well, and ld -Bshareable became deprecated.

On -fPIC: -fPIC has always worked.  The difference between
-fpic and -fPIC is simply that -fpic is less efficient.
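The cc -shared rules in the list above reduce to a small decision table. A sketch in Python; `shared_link_command` is a hypothetical name, releases are modelled as (major, minor) tuples, and the rules are only those stated in the comment, not verified against real OpenBSD systems.

```python
def shared_link_command(release, object_format):
    """Pick the shared-library link command for an OpenBSD release.

    release: (major, minor) tuple such as (2, 8).
    object_format: "elf" or "a.out".
    """
    if object_format == "elf":
        return "cc -shared"      # ELF architectures have always used cc -shared
    if release >= (2, 8):
        return "cc -shared"      # a.out gained cc -shared in 2.8
    return "ld -Bshareable"      # older a.out releases
```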

On threads, I am still waiting for an answer from brad@,
this is his change.  I'll ask him again today.

Thanks.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-12 18:49

Message:
Logged In: YES 
user_id=21627

The -shared chunk looks frightening. What is the first BSD
release where ld -Bshareable stops working? Could you
rearrange this to integrate the version numbers into the
OpenBSD* match? Also, what releases need the ELF test? Could
that be restricted to the older releases, too?

Would it be acceptable to stop supporting OpenBSD 0 and 1?

Is usage of -fPIC correct on OpenBSD 0.x? If not, what is
the first release that supports -fPIC?

It looks like 'OpenBSD threads' are 'POSIX threads'?
Why does the existing test for Posix threads fail to detect
their presence?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554718&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:24:12 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:24:12 -0700
Subject: [Patches] [ python-Patches-554716 ] __va_copy patches
Message-ID: <E17YlDg-0005Go-00@usw-sf-web4.sourceforge.net>

Patches item #554716, was opened at 2002-05-11 02:08
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554716&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Matt Behrens (mattbehrens)
Assigned to: Nobody/Anonymous (nobody)
Summary: __va_copy patches

Initial Comment:
This issue was discovered when preparing for OpenBSD 
3.1, and compiling on our non-i386 arches.  Let me 
quote a mail from drahn@openbsd.org:

> [Tell the Python guys] the vararg handling is poor, 
and that this is a possible solution, but not a great 
solution. If possible, it would be best to not parse 
the varargs argument twice.

> Different architectures deal with varargs 
differently, __va_copy is a way that some 
architectures use to do a deep copy. __va_copy is present 
in solaris and powerpc (*BSD and Linux) as far as I 
know.

Attached are the patches we are using to build our 
Python package; without them we cannot build Python 2.2 
on arches like powerpc, as the built python dumps core.

Python 2.1 does not need these patches.


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:24

Message:
Logged In: YES 
user_id=21627

Thanks for the patch, applied as

abstract.c 2.93.6.5
stringobject.c 2.147.6.6
getargs.c 2.90.6.1
modsupport.c 2.58.16.2

abstract.c 2.104
stringobject.c 2.171
getargs.c 2.93
modsupport.c 2.61


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554716&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:30:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:30:51 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17YlK7-0005Mt-00@usw-sf-web4.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.
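Later versions of the standard library's mimetypes module provide both `guess_all_extensions` and `add_type`, so the behaviour described here can be sketched directly (assuming Python 3). The MIME type "application/x-example" and extension ".exmpl" are made up for illustration.

```python
import mimetypes

# A private MimeTypes instance starts from the built-in default tables,
# so the results do not depend on the system's mime.types files.
db = mimetypes.MimeTypes()

html_exts = db.guess_all_extensions("text/html")
print(html_exts)                        # every known extension for text/html
print(db.guess_extension("text/html"))  # the first element of that list

# add_type registers an extra extension for a type.
db.add_type("application/x-example", ".exmpl")
print(db.guess_all_extensions("application/x-example"))  # ['.exmpl']
```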

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:34:49 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:34:49 -0700
Subject: [Patches] [ python-Patches-552812 ] Better description in "python -h" for -u
Message-ID: <E17YlNx-0005Ql-00@usw-sf-web4.sourceforge.net>

Patches item #552812, was opened at 2002-05-06 11:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552812&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Sean Reifschneider (jafo)
Assigned to: Nobody/Anonymous (nobody)
>Summary: Better description in "python -h" for -u

Initial Comment:
A new user was confused by the fact that "python -u"
in combination with "sys.stdin.xreadlines()" was not
doing what he expected.  I believe that this
modification makes it a bit more clear that there is
internal buffering which "-u" does not influence.

Also included is a man-page modification of similar
nature (though more detailed).
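The usual workaround is to call readline() directly instead of iterating over the file, because in Python 2 xreadlines(), readlines() and `for line in sys.stdin` read ahead into an internal buffer that -u does not disable. A minimal sketch; `lines_as_they_arrive` is a hypothetical helper, and io.StringIO stands in for sys.stdin.

```python
import io

def lines_as_they_arrive(stream):
    """Yield lines one readline() call at a time, stopping at EOF ("").

    For a binary stream the EOF sentinel would be b"" instead.
    """
    return iter(stream.readline, "")

demo = list(lines_as_they_arrive(io.StringIO("first\nsecond\n")))
print(demo)  # ['first\n', 'second\n']
```

In a real script this would be `for line in lines_as_they_arrive(sys.stdin): ...`, which is equivalent to a `while 1:` loop around `sys.stdin.readline()`.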

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:34

Message:
Logged In: YES 
user_id=21627

Thanks for the patch, committed as

python.man 1.25
main.c 1.65


----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2002-05-08 12:26

Message:
Logged In: YES 
user_id=81797

Ok, I've converted it to a single line note referencing the
man-page.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-08 11:17

Message:
Logged In: YES 
user_id=21627

I dislike the change to add many new lines to the -h output.
Can you squeeze this into fewer lines, e.g. by referring
to the documentation?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=552812&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:36:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:36:38 -0700
Subject: [Patches] [ python-Patches-550192 ] Set softspace to 0 in raw_input()
Message-ID: <E17YlPi-0005SF-00@usw-sf-web4.sourceforge.net>

Patches item #550192, was opened at 2002-04-29 16:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550192&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
>Assigned to: Martin v. Löwis (loewis)
Summary: Set softspace to 0 in raw_input()

Initial Comment:
Setting softspace to 0 in raw_input() makes it 
behave as expected when a "print 'something'," 
precedes the raw_input() call, with or without a 
prompt argument. 

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 21:07

Message:
Logged In: YES 
user_id=7887

Ok.. now it outputs an extra space if softspace was true, 
as expected after a "print 'something',". 
 
Thanks again. 

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 20:45

Message:
Logged In: YES 
user_id=7887

Please, don't apply it yet. I'm testing some aspects of 
the patch. 

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-05-03 19:53

Message:
Logged In: YES 
user_id=7887

Sure! Here's a fixed patch including those cleanups. 
 
Thank you! 

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-03 08:04

Message:
Logged In: YES 
user_id=21627

The checking logic for a lost stdout appears to be broken:
it should already check for an exception right when
verifying whether stdout isatty. Can you incorporate such
cleanup in your patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=550192&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:50:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:50:38 -0700
Subject: [Patches] [ python-Patches-543498 ] s/Copyright/License/ in bdist_rpm.py
Message-ID: <E17YldG-0005dh-00@usw-sf-web4.sourceforge.net>

Patches item #543498, was opened at 2002-04-14 00:07
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=543498&group_id=5470

Category: Distutils and setup.py
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Nobody/Anonymous (nobody)
Summary: s/Copyright/License/ in bdist_rpm.py

Initial Comment:
The "Copyright" field in RPM spec files is obsolete. 
"License" should be used instead. 

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:50

Message:
Logged In: YES 
user_id=21627

It appears that you need rpm 3.x, which was released in 1999. I
think this is safe enough to accept this patch; applied as
bdist_rpm.py 1.30.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-18 08:06

Message:
Logged In: YES 
user_id=21627

So what is the minimum version of the RPM software that
accepts the License: field? It is my understanding that
rpm(1) may blow up if it does not recognize a field.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-04-14 18:46

Message:
Logged In: YES 
user_id=7887

The rpm.org site is much more obsolete than this tag  
<wink>.  
  
Here is an excerpt from a message of Jeff Johnson in  
rpm-list (subject is "Re: three questions about building  
rpms"):  
  
----  
[...] 
This is historical legacy. Originally rpm had  
        Copyright: GPL  
but everyone said  
        GPL is not a copyright.  
  
So, rpm changed the tag name to License:, and, for  
backward compatibility, used the same numeric value as  
RPMTAG_COPYRIGHT. Now, everyone gets to ask the next  
question  
  
        Which is it Copyright: or License:?  
  
and the answer is <shrug> :-)  
----  
  
Every distribution working with rpms, including redhat,  
has changed (or is changing) the tag to License.  
Copyright, as Jeff said himself, is a misnomer
for that field.
 

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-14 11:58

Message:
Logged In: YES 
user_id=21627

Can you provide a pointer that shows this obsoletion?

http://www.rpm.org/RPM-HOWTO/build.html#SPEC-FILE

still says Copyright.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=543498&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:52:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:52:11 -0700
Subject: [Patches] [ python-Patches-470607 ] HTML version of the Idle "documentation"
Message-ID: <E17Ylel-0005ec-00@usw-sf-web4.sourceforge.net>

Patches item #470607, was opened at 2001-10-12 17:13
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=470607&group_id=5470

Category: IDLE
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Internet Discovery (idiscovery)
Assigned to: Nobody/Anonymous (nobody)
>Summary: HTML version of the Idle "documentation"

Initial Comment:
<HTML>
<HEAD>
<TITLE>Idle Help</TITLE>
</HEAD>

<BODY>

<H2>Features</H2>

IDLE has the following features:
<UL>
<LI>	coded in 100% pure Python, using the Tkinter GUI toolkit (i.e. Tcl/Tk)

<LI>	cross-platform: works on Windows and Unix (on the Mac, there are
currently problems with Tcl/Tk)

<LI>	multi-window text editor with multiple undo, Python colorizing
and many other features, e.g. smart indent and call tips

<LI>	Python shell window (a.k.a. interactive interpreter)

<LI>	debugger (not complete, but you can set breakpoints, view and step)
</UL>

<H2> Menus</H2>

<H3>File menu:</H3>

<DL>
<DT>New window</DT><DD>create a new editing window
<DT>Open...</DT><DD>open an existing file
<DT>Open module...</DT><DD>open an existing module (searches sys.path)
<DT>Class browser</DT><DD>show classes and methods in current file
<DT>Path browser</DT><DD>show sys.path directories, modules, classes
		and methods
<HR>
<DT>Save</DT><DD>save current window to the associated file (unsaved
		windows have a * before and after the window title)

<DT>Save As...</DT><DD>save current window to new file, which becomes
		the associated file
<DT>Save Copy As...</DT><DD>save current window to different file
		without changing the associated file
<HR>
<DT>Close</DT><DD>close current window (asks to save if unsaved)
<DT>Exit</DT><DD>close all windows and quit IDLE (asks to save if unsaved)
</DL>

<H3>Edit menu:</H3>

<DL>
<DT>Undo</DT><DD>Undo last change to current window (max 1000 changes)
<DT>Redo</DT><DD>Redo last undone change to current window
<HR>
<DT>Cut</DT><DD>Copy selection into system-wide clipboard; then delete selection
<DT>Copy</DT><DD>Copy selection into system-wide clipboard
<DT>Paste</DT><DD>Insert system-wide clipboard into window
<DT>Select All</DT><DD>Select the entire contents of the edit buffer
<HR>
<DT>Find...</DT><DD>Open a search dialog box with many options
<DT>Find again</DT><DD>Repeat last search
<DT>Find selection</DT><DD>Search for the string in the selection
<DT>Find in Files...</DT><DD>Open a search dialog box for searching files
<DT>Replace...</DT><DD>Open a search-and-replace dialog box
<DT>Go to line</DT><DD>Ask for a line number and show that line
<HR>
<DT>Indent region</DT><DD>Shift selected lines right 4 spaces
<DT>Dedent region</DT><DD>Shift selected lines left 4 spaces
<DT>Comment out region</DT><DD>Insert ## in front of selected lines
<DT>Uncomment region</DT><DD>Remove leading # or ## from selected lines
<DT>Tabify region</DT><DD>Turn <EM>leading</EM> stretches of spaces into tabs
<DT>Untabify region</DT><DD>Turn <EM>all</EM> tabs into the right number of spaces
<DT>Expand word</DT><DD>Expand the word you have typed to match another
		word in the same buffer; repeat to get a different expansion
<DT>Format Paragraph</DT><DD>Reformat the current blank-line-separated paragraph
<HR>
<DT>Import module</DT><DD>Import or reload the current module
<DT>Run script</DT><DD>Execute the current file in the __main__ namespace
</DL>

<H3>Windows menu:</H3>

<DL>
<DT>Zoom Height</DT><DD>toggles the window between normal size (24x80)
	and maximum height.
<HR>
</DL>

The rest of this menu lists the names of all open windows; select one
	to bring it to the foreground (deiconifying it if necessary).

<H3>Debug menu (in the Python Shell window only):</H3>

<DL>
<DT>Go to file/line</DT><DD>look around the insert point for a filename
		and line number, open the file, and show the line.
<DT>Open stack viewer</DT><DD>show the stack traceback of the last exception
<DT>Debugger toggle</DT><DD>Run commands in the shell under the debugger
<DT>JIT Stack viewer toggle</DT><DD>Open stack viewer on traceback
</DL>

<H2>Basic editing and navigation:</H2>

<UL>
<LI>	Backspace deletes to the left; DEL deletes to the right
<LI>	Arrow keys and Page Up/Down to move around
<LI>	Home/End go to begin/end of line
<LI>	Control-Home/End go to begin/end of file
<LI>	Some Emacs bindings may also work, e.g. ^B/^P/^A/^E/^D/^L
</UL>

<H3>Automatic indentation:</H3>

After a block-opening statement, the next line is indented by 4 spaces
(in the Python Shell window by one tab).  After certain keywords
(break, return etc.) the next line is dedented.  In leading
indentation, Backspace deletes up to 4 spaces if they are there.  Tab
inserts 1-4 spaces (in the Python Shell window one tab).  See also the
indent/dedent region commands in the edit menu.

<H3>Python Shell window:</H3>

<UL>
<LI>	^C interrupts executing command
<LI>	^D sends end-of-file; closes window if typed at >>> prompt
</UL>

<H4>Command history:</H4>

<UL>
<LI>	Alt-p retrieves previous command matching what you have typed
<LI>	Alt-n retrieves next
<LI>	Return while on any previous command retrieves that command
<LI>	Alt-/ (Expand word) is also useful here
</UL>

<H4>Syntax colors:</H4>

The coloring is applied in a background "thread", so you may
occasionally see uncolorized text.  To change the color scheme, edit
the [Colors] section in config.txt.

<DL>
<DT>Python syntax colors:
<DL>
<DT>	Keywords:</DT>	orange
<DT>	Strings	:</DT>	green
<DT>	Comments:</DT>	red
<DT>	Definitions:</DT>	blue
</DL>
<DT>Shell colors:
<DL>
<DT>	Console output:</DT>	brown
<DT>	stdout:</DT>		blue
<DT>	stderr:</DT>	dark green
<DT>	stdin:</DT>	black
</DL>
</DL>

<H3>Command line usage:</H3>
<PRE>
	idle.py [-c command] [-d] [-e] [-s] [-t title] [arg] ...

	-c command  run this command
	-d          enable debugger
	-e          edit mode; arguments are files to be edited
	-s          run $IDLESTARTUP or $PYTHONSTARTUP first
	-t title    set title of shell window
</PRE>
<P>
If there are arguments:
<OL>
<LI>	    If -e is used, arguments are files opened for editing and
	    sys.argv reflects the arguments passed to IDLE itself.
<LI>
	    Otherwise, if -c is used, all arguments are placed in
	    sys.argv[1:...], with sys.argv[0] set to '-c'.
<LI>
	    Otherwise, if neither -e nor -c is used, the first
	    argument is a script which is executed with the remaining
	    arguments in sys.argv[1:...]  and sys.argv[0] set to the
	    script name.  If the script name is '-', no script is
	    executed but an interactive Python session is started; the
	    arguments are still available in sys.argv.
</OL>
</BODY>
</HTML>
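The three argument rules in the help text can be modelled as follows. This is a hypothetical sketch of the documented behaviour, not IDLE's own code: `plan_idle_startup` is an invented name, the option letters come from the usage line above, and using "idle.py" as sys.argv[0] in the -e case is an assumption. The help does not say what happens with no arguments, so sys_argv stays None there.

```python
import getopt

def plan_idle_startup(argv):
    """Describe what IDLE's help says happens to command-line arguments.

    argv excludes the program name.
    """
    opts, args = getopt.getopt(argv, "c:dest:")
    options = dict(opts)
    plan = {"edit": [], "command": None, "script": None, "sys_argv": None}
    if "-e" in options:
        # Rule 1: arguments are files to edit; sys.argv is left as passed.
        plan["edit"] = args
        plan["sys_argv"] = ["idle.py"] + argv
    elif "-c" in options:
        # Rule 2: run the command; sys.argv[0] is '-c'.
        plan["command"] = options["-c"]
        plan["sys_argv"] = ["-c"] + args
    elif args:
        # Rule 3: first argument is a script; '-' means interactive only.
        if args[0] != "-":
            plan["script"] = args[0]
        plan["sys_argv"] = args
    return plan
```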


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:52

Message:
Logged In: YES 
user_id=21627

Since no further explanation was forthcoming, I reject this
patch.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-11 22:02

Message:
Logged In: NO 

hello

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-12 17:54

Message:
Logged In: YES 
user_id=6380

What do you want us to do with this?

Note that IDLE development is going on in the
idlefork.sf.net project. You might want to submit it there.
And please use the file upload feature.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=470607&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 11:54:04 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 03:54:04 -0700
Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive
Message-ID: <E17Ylga-0005fw-00@usw-sf-web4.sourceforge.net>

Patches item #492105, was opened at 2001-12-12 18:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: James C. Ahlstrom (ahlstromjc)
Assigned to: Nobody/Anonymous (nobody)
Summary: Import from Zip archive

Initial Comment:
This is the "final" patch to support imports from zip 
archives, and directory caching using os.listdir(). It 
replaces patches 483466 and 476047.  It is a separate 
patch since I can't delete file attachments.  It adds 
support for importing from "" and from relative paths.
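Python 2.3 and later support this through the zipimport module: a zip archive placed on sys.path is searched like a directory. A minimal sketch assuming Python 3; `import_from_zip` is a hypothetical helper that builds a throwaway archive and imports a module from it.

```python
import importlib
import os
import sys
import tempfile
import zipfile

def import_from_zip(module_name, source):
    """Write `source` as <module_name>.py into a zip archive and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "modules.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(module_name + ".py", source)
        sys.path.insert(0, archive)  # a zip archive is a valid sys.path entry
        try:
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(archive)
```

Usage: `import_from_zip("zipdemo_hello", "GREETING = 'hi'\n").GREETING` returns `'hi'`.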

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:54

Message:
Logged In: YES 
user_id=21627

Is this patch ready to be applied?

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-12 17:05

Message:
Logged In: YES 
user_id=31392

Deleting the old diffs that Jim couldn't delete.


----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2002-03-15 18:27

Message:
Logged In: YES 
user_id=64929

I added a diff -c version of the patch.

----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2002-03-15 18:03

Message:
Logged In: YES 
user_id=64929

I still can't delete files, but I added a new file which
contains all diffs as a single file, and is made from the
current CVS tree (Mar 15, 2002).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:00:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:00:06 -0700
Subject: [Patches] [ python-Patches-452232 ] timestamp function for time module
Message-ID: <E17YlmQ-0005kQ-00@usw-sf-web4.sourceforge.net>

Patches item #452232, was opened at 2001-08-17 23:37
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Gareth Harris (garethharris)
Assigned to: Nobody/Anonymous (nobody)
Summary: timestamp function for time module

Initial Comment:
Timestamp creates timestamp strings 
in ISO or ODBC format in UTC or local timezones. 
It can also add microseconds where needed. 
Timestamps are often needed 
outside database or XML activities, 
so its proposed location is the time module.

def timestamp(secs=None,fmt='ISO',TZ=None,fracsec=None):
    '''Make ISO or ODBC timestamp from [current] time.
    Parameters:
    secs= float seconds, else default = time()
    fmt = 'ISO' use ISO 8601 standard format = 
            "YYYY-MM-DDTHH:MM:SS.mmmmmmZ"       Zulu or
            "YYYY-MM-DDTHH:MM:SS.mmmmmm-hh:mm"  local
      else  "YYYY-MM-DD HH:MM:SS.mmmmmm"        ODBC
    TZ      = None=GMT/UTC/Zulu, else local time zone
    fracsec = None, else add microseconds to string
    '''

Any improvement or standardization is welcome.

Gareth Harris
gharris@nrao.edu
2001-08-17T21:36:00Z
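For illustration, here is a minimal sketch of the proposed behaviour built on the datetime module, which later joined the standard library in Python 2.3. It is not the submitted code: the body is hypothetical, and the readings of TZ (None means UTC, any other value means the local zone) and fracsec (None means whole seconds) are assumptions taken from the docstring above. It needs Python 3.6+ for isoformat(timespec=...).

```python
import time
from datetime import datetime, timezone


def timestamp(secs=None, fmt='ISO', TZ=None, fracsec=None):
    """Hypothetical re-creation of the proposed function using datetime.

    secs    -- float seconds since the epoch; defaults to time.time()
    fmt     -- 'ISO' for ISO 8601, anything else for ODBC style
    TZ      -- None for UTC ("Z"), any other value for the local zone
    fracsec -- None for whole seconds, anything else adds microseconds
    """
    if secs is None:
        secs = time.time()
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    if TZ is not None:
        # astimezone() with no argument converts to the system's local zone
        dt = dt.astimezone()
    timespec = 'seconds' if fracsec is None else 'microseconds'
    if fmt == 'ISO':
        text = dt.isoformat(timespec=timespec)  # ends in +hh:mm or -hh:mm
        if TZ is None:
            # UTC always formats as "+00:00"; swap in the Zulu designator
            text = text[:-len('+00:00')] + 'Z'
        return text
    # ODBC: space separator and no zone designator
    return dt.replace(tzinfo=None).isoformat(sep=' ', timespec=timespec)
```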


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:00

Message:
Logged In: YES 
user_id=21627

Since no actual patch is forthcoming, I'm rejecting this.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-09 12:40

Message:
Logged In: YES 
user_id=21627

If you want to see the code included, you'd need to provide
a context diff, including docs and test cases.

However, notice that there may be overlap with the emerging
builtin DateTime type, see

http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage

----------------------------------------------------------------------

Comment By: Gareth Harris (garethharris)
Date: 2002-01-02 17:41

Message:
Logged In: YES 
user_id=300900

Back from travel, other projects etc. [2002.01.02]
Thanks for comments thus far.
Maybe I will finally meet some of you in Feb.
---
I propose to put this in the TIME module 
UNLESS someone has an idea for a better location.
Who takes care of that module?
Shall I provide: doc?, test suite?
Is a companion decode function needed?
OTHERWISE I will put it in sourceforge/activestate?
Which is preferred?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-01 21:27

Message:
Logged In: YES 
user_id=21627

Gareth,

Can you please propose a strategy to advance this patch or
withdraw it? If there is no action, I propose to close it by
Feb 1, 2002.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-12-06 15:57

Message:
Logged In: YES 
user_id=3066

Another possible alternate home for this would be the Python
Snippet repository on SourceForge:

http://sourceforge.net/snippet/browse.php?by=lang&lang=6

I'm not suggesting that this doesn't belong in the standard
library, however.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-19 19:46

Message:
Logged In: YES 
user_id=21627

Nice patch. If you want to see this included, you should 
complete it: Decide on location of the function, provide 
documentation and test cases. As the location, it may be 
that the calendar module could provide a home, but you may 
ask in the newsgroup.

If you merely wanted to publish this code snippet, I 
suggest that you find a better home than the Python patch 
database, e.g. the Cookbook:

http://aspn.activestate.com/ASPN/Cookbook/Python

There are a number of other places that collect Python 
snippets; this is just one option.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:07:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:07:26 -0700
Subject: [Patches] [ python-Patches-458898 ] --python-build for install
Message-ID: <E17YltW-0005sb-00@usw-sf-web4.sourceforge.net>

Patches item #458898, was opened at 2001-09-05 22:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=458898&group_id=5470

Category: Distutils and setup.py
Group: None
Status: Open
>Resolution: Out of Date
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Michael Hudson (mwh)
Summary: --python-build for install

Initial Comment:
Sometimes, being able to install python tools without
having python installed is desirable. When building an
RPM package of python, for example, one may want to
build/install IDLE as well, including it in a
subpackage. Indeed, we're doing this with a couple of
python tools here at Conectiva. Unfortunately, we have
a chicken-and-egg problem when doing this: you need python
installed on your system before you can install tools. This
limitation may be observed in the file
Lib/distutils/sysconfig.py. It looks for Makefile in
the final installation directory, for example.

This patch adds a new option to the distutils install
command: --python-build. When it is used, python will look
for these files in the python build directory specified
through the option.
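A minimal sketch of the lookup this option changes, using a hypothetical helper makefile_path(); it is not the patch itself. distutils has since been removed from the standard library (Python 3.12), so the default branch uses sysconfig.get_makefile_filename() to stand in for the installed location.

```python
import os
import sysconfig


def makefile_path(python_build=None):
    """Hypothetical helper: where a --python-build option would find Makefile.

    python_build -- path to an uninstalled Python build tree, or None
    """
    if python_build:
        # Read build settings from the build tree, so no installed
        # Python is needed yet (the RPM packaging case)
        return os.path.join(python_build, 'Makefile')
    # Default: the Makefile copied into the installed library directory
    return sysconfig.get_makefile_filename()
```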


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:07

Message:
Logged In: YES 
user_id=21627

It appears that the patch is outdated; set_python_build
no longer exists.
Is the patch still needed?

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-01-31 15:18

Message:
Logged In: YES 
user_id=7887

About..

1) Sorry.. I'll take care to add comments to the file next 
time. The bottom one is newer.

2) For now, a local option seems to be ok. If other 
commands start using it (which seems improbable right now), 
we may turn it into a global option without any drawbacks, 
since global options are acceptable anywhere.


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-01-21 11:45

Message:
Logged In: YES 
user_id=6656

Ta.  Some random comments:
(1) it's not obvious from this page which of the two patches
attached is the newer.  This may be sf's fault, but...
(2) might it be better to make this a global distutils
option?  It seems a bit fragile at the moment -- we'd need
to change things if, say, build_ext started to depend on
python_build.
Would, say
$ python setup.py --python-build install
be better?

I dunno, I don't really understand how options chase around
distutils yet... 

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-01-19 01:02

Message:
Logged In: YES 
user_id=7887

Here is a new patch including your suggestions.

Thank you!!


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-01-17 18:49

Message:
Logged In: YES 
user_id=6656

Hey!  This patch is less than six months old.  Virtually
fresh :|

Some comments:  are you sure you can get away with only
honouring --python-build in install?  I think build_scripts
needs it too (now, anyway, maybe not when you wrote the patch).

Also, the mod to install.finalize_options() is in the wrong
place wrt. the surrounding comments.  Can you fix this?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=458898&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:20:34 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:20:34 -0700
Subject: [Patches] [ python-Patches-462754 ] no '_d' ending for mingw32
Message-ID: <E17Ym6E-00062Y-00@usw-sf-web4.sourceforge.net>

Patches item #462754, was opened at 2001-09-19 05:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470

Category: Distutils and setup.py
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Nobody/Anonymous (nobody)
Summary: no '_d' ending for mingw32

Initial Comment:
This patch prevents distutils from naming the extension
modules <extname>_d.pyd when compiled with mingw32 on
Windows in debug mode. Instead, the extension modules
will get the normal name <extname>.pyd. Technically,
the patch doesn't prevent the behaviour for mingw32,
but only adds the _d for MS Visual C++ and Borland
compilers (though I don't know about the Borland case).

The reason for this? Adding "_d" doesn't make any sense
for GNU compilers. I think it's just a MS Visual C++
madness. If you want to debug an extension module that
was compiled with gcc, you have to use gdb anyway,
because the debugging symbols of MSVC++ and gcc are
incompatible. So you normally use a release Python
version (from the python.org binary download) and
compile your extensions with mingw32.

To put it shortly:

The current state is that you do a
"setup.py build --compiler=mingw32 --debug" and then
rename the extension modules, removing the _d. Then
fire up gdb to debug your module.

With this patch, the renaming isn't necessary anymore.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:20

Message:
Logged In: YES 
user_id=21627

This patch is wrong: Whether or not _d should be added to
the module name depends on whether or not Py_DEBUG is defined;
this is independent of whether --debug was given, at least
for Cygwin (for MSVC, --debug will define _DEBUG which will
define Py_DEBUG).

So the current distutils is wrong (since it always adds _d),
but the patch doesn't make it better (since it never adds
_d). Rejecting the patch.
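A small sketch of the rule described above, with hypothetical helper names: the suffix is chosen from the interpreter's Py_DEBUG setting rather than from --debug. Detecting Py_DEBUG through sys.gettotalrefcount, which only debug builds provide, is an assumption of this sketch, not something distutils did.

```python
import sys


def is_debug_interpreter():
    # sys.gettotalrefcount() only exists in builds compiled with Py_DEBUG
    # (strictly, Py_REF_DEBUG, which Py_DEBUG turns on)
    return hasattr(sys, 'gettotalrefcount')


def extension_filename(modname, py_debug=None):
    """Hypothetical helper: pick the Windows extension file name.

    The "_d" suffix follows the interpreter's Py_DEBUG setting, not
    whether the extension itself was compiled with --debug.
    """
    if py_debug is None:
        py_debug = is_debug_interpreter()
    return modname + ('_d' if py_debug else '') + '.pyd'
```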

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-04-17 07:07

Message:
Logged In: YES 
user_id=163326

If python.exe is compiled --with-pydebug, then this is true.

But the point is that I want to compile debug versions of my
extension modules and use them with the standard python.exe
(*not* python_d.exe).

So yes, the patch does work, at least it did when I
submitted it <wink>.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-09 12:44

Message:
Logged In: YES 
user_id=21627

Does the patch actually work? It seems to me that, if
compiled with-pydebug, import will automatically search for
the _d version, and complain if it is not found.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-04 12:52

Message:
Logged In: YES 
user_id=21627

The rationale for using the debugging version of MSVCRT is
not the debugging information alone, but also the additional
functionality, like heap consistency checks and other
assertions. So it is not obvious that you do not want to use
the debugging version of this library in a debug build.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2002-01-04 03:50

Message:
Logged In: YES 
user_id=163326

mingw links with msvcrt.dll. I've plans to add mingw32
support to the autoconf build process (hopefully soon enough
for 2.3).

The GNU and MS debugger symbols are incompatible, though, so
I think that mingw32 shouldn't link to the debug version of
msvcrt (gdb doesn't understand the Microsoft debugger
symbols; and the Visual Studio debugger has no idea what the
debugging symbols of gcc are all about; isn't cross-platform
and cross-compiler programming fun?).


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-30 14:13

Message:
Logged In: YES 
user_id=21627

How does the mingw port interact with the debugging
libraries? With MSVC, the debug build will link to the debug
versions of the CRT. What C library will mingw link with (I
hope it won't use crtdll.dll)?

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-28 23:28

Message:
Logged In: YES 
user_id=163326

Yes. But mingw32 isn't emulating Unix under Windows (that
would be Cygwin). It's just a version of gcc and friends
that targets native win32. It links against msvcrt (not a
Posix emulation library like Cygwin does).

This is a bit hypothetical because I didn't yet hack the
autoconf build process for native win32 with mingw32.

Currently, you cannot build a complete Python with mingw32,
but you *can* build extension modules against an existing
Python (compiled with M$ VC++).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-09-28 22:43

Message:
Logged In: YES 
user_id=31435

All else being equal, a system emulating Unix under Windows 
should strive to make life comfortable for Unix folks.  The 
question is thus whether all else is in fact equal <wink -- 
but I don't know, as I don't yet use the system under 
discussion>.

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-28 20:37

Message:
Logged In: YES 
user_id=163326

Hmm. I don't like the _d endings at all. But if the policy
on win32 is that debug executables and libraries get a "_d"
ending, then I'm unsure whether this patch should be applied.

I have plans to hack the autoconf madness to build a native
win32 Python with mingw32. But that won't be ready by
tomorrow. And I don't think that I'll add "_d" endings there
for debugging, because that would be inconsistent with the
normal autoconf builds on Unix.

I'm glad that *I* don't have to decide whether this patch is
a Good Thing. Being consistent with Python win32 build or
with GNU (gcc/autoconf). Take your pick :-)


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-09-19 05:46

Message:
Logged In: YES 
user_id=31435

FYI, MSVC never adds _d on its own -- Mark Hammond and/or 
Guido forced it to do that.  I don't remember why, but one 
of them explained it to me long ago and it made good sense 
at the time <wink>.

MSVC normally compiles debug and release builds into 
distinct subdirectories, and uses the same names in both.  
But *our* MSVC setup forces it to compile both flavors of 
build directly into the PCbuild directory, so has to give 
the resulting DLLs and executables different names (else 
the second build would overwrite the results of the first 
build).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:29:27 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:29:27 -0700
Subject: [Patches] [ python-Patches-459381 ] Unambiguous import for encodings
Message-ID: <E17YmEp-0001hZ-00@usw-sf-web1.sourceforge.net>

Patches item #459381, was opened at 2001-09-07 03:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=459381&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Mikhail Zabaluev (mzabaluev)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unambiguous import for encodings

Initial Comment:
The __import__ call in encodings/__init__.py does not
specify module hierarchy explicitly. This results in
misleading error tracebacks (try
"codecs.lookup('codecs')"). Worse, it results in an
error when one is trying to lookup a codec and the
encoding's name fires some top-level module, e.g
'base64', despite that a codec for this encoding may
actually be registered in the system.


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:29

Message:
Logged In: YES 
user_id=21627

It appears that this patch has been superseded by #571603;
closing this one as a duplicate.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-19 11:01

Message:
Logged In: YES 
user_id=21627

I'm in favour of integrating this patch, even though it
means that some codecs that are currently found won't be
found anymore. Authors of such codecs would need to register
a search function.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=459381&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:33:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:33:37 -0700
Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function
Message-ID: <E17YmIr-00081Q-00@usw-sf-web4.sourceforge.net>

Patches item #571603, was opened at 2002-06-20 13:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix bug in encodings.search_function

Initial Comment:
Hi,

there seems to be a bug in the default encoding search 
function (search_function in encodings/__init__.py). The 
function tries to load a module with the name of the 
encoding, but it doesn't require that this module is in the 
encodings/ directory. This leads to trouble when you try 
to use an encoding that has the name of a module in the 
search path.

To demonstrate, save the following line to test.py:

print 'Just testing'.encode('test')

and run it. This results in a CodecRegistryError 
exception: "module "test" (test.pyc) failed to register"

The bug is present in 2.2.1 and in HEAD. In HEAD there 
was actually a bugfix for this but it was incomplete.

Patches for 2.2.1 and HEAD attached.

Greetings,
Geert Jansen
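The essence of the fix can be sketched as importing a qualified name, so that only submodules of the encodings package are considered. find_codec_module() is a hypothetical helper; the real search_function also normalizes names and consults an alias table, which this sketch leaves out.

```python
import importlib


def find_codec_module(encoding):
    """Hypothetical helper mirroring the fix: search only inside encodings.

    A bare __import__(modname) would also match unrelated top-level
    modules such as "test"; qualifying the name with the package
    prevents that.
    """
    modname = encoding.lower().replace('-', '_')
    if not modname or '.' in modname:
        return None
    try:
        return importlib.import_module('encodings.' + modname)
    except ImportError:
        return None
```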

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:33

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:39:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:39:39 -0700
Subject: [Patches] [ python-Patches-572796 ] Executable .pyc-files with hashbang
Message-ID: <E17YmOh-00085m-00@usw-sf-web4.sourceforge.net>

Patches item #572796, was opened at 2002-06-23 18:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572796&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Peter Åstrand (astrand)
Assigned to: Nobody/Anonymous (nobody)
Summary: Executable .pyc-files with hashbang

Initial Comment:
As an experiment, I've tested if it was possible to add
hashbang (like #!/usr/bin/python) to compiled
.pyc-files. The attached patch makes this possible. 

This can be useful when distributing applications as
bytecode. Currently, on a UNIX system, it's necessary
to make a wrapper script. 

I haven't considered portability issues with non-UNIX
platforms and things like that. Also, the hash and bang
may collide with the magic number. The patch is just a
proof-of-concept. 

I won't be surprised if you all think that this is a
bad idea, but I thought I should send the patch anyway.
Has this been discussed before?


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:39

Message:
Logged In: YES 
user_id=21627

Since nobody has spoken in favour of this patch, I'm
rejecting it.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-30 18:23

Message:
Logged In: YES 
user_id=21627

On Linux, you can also use

import imp, sys, string
# binfmt_misc expects each magic byte as a literal \xNN escape
magic = string.join(["\\x%.2x" % ord(c) for c in imp.get_magic()], "")
reg = ':pyc:M::%s::%s:' % (magic, sys.executable)
open("/proc/sys/fs/binfmt_misc/register", "wb").write(reg)

to make the system recognize .pyc files (see Misc/NEWS).
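In current Python 3 the imp module is gone (it was removed in 3.12); a sketch of the same registration string can use importlib.util.MAGIC_NUMBER instead. binfmt_pyc_rule() is a hypothetical name, and the function only builds the string.

```python
import importlib.util
import sys


def binfmt_pyc_rule(interpreter=sys.executable):
    """Build a binfmt_misc registration line for .pyc files (sketch).

    Fields are :name:type:offset:magic:mask:interpreter:flags.
    """
    # Each magic byte becomes a literal backslash-x escape such as \x0d
    magic = ''.join('\\x%02x' % b for b in importlib.util.MAGIC_NUMBER)
    return ':pyc:M::%s::%s:' % (magic, interpreter)
```

Registering it still means writing the string to /proc/sys/fs/binfmt_misc/register as root, as in the snippet above.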

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-06-23 18:51

Message:
Logged In: NO 

Well, there's this:

http://www.lyra.org/greg/python/#dev

Does that help?

(mwh, having a fight with sf's login system)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=572796&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 12:46:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 04:46:22 -0700
Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS
Message-ID: <E17YmVC-0008Eb-00@usw-sf-web4.sourceforge.net>

Patches item #577031, was opened at 2002-07-03 17:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Remove PyArg_Parse() and METH_OLDARGS

Initial Comment:
This patch removes more PyArg_Parse() and METH_OLDARGS
which are deprecated.
I've tested in select and string, but want to make sure
there's nothing else I'm missing.

I also have a huge change to glmodule, but I can't test
that.  The diff is attached.
Let me know if I should check in glmodule or leave it
alone.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:46

Message:
Logged In: YES 
user_id=21627

The other patches all look fine; please apply them. For
fmmodule, I'd recommend converting those functions to
VARARGS/ParseTuple.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-27 03:04

Message:
Logged In: YES 
user_id=33168

All the "s" / PyString_Check() changes are in fmmodule.  I
suggest not patching fmmodule now.  Are all the other
changes ok?  Should I bother fixing glmodule at all?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:45

Message:
Logged In: YES 
user_id=21627

The changes look good, except for the ones that change
parsing of "s" to PyString_Check: that means losing support
for Unicode.

For some of these methods, that may be acceptable, but that
would need documentation.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 16:26:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 08:26:46 -0700
Subject: [Patches] [ python-Patches-577031 ] Remove PyArg_Parse() and METH_OLDARGS
Message-ID: <E17YpwU-0003CZ-00@usw-sf-web4.sourceforge.net>

Patches item #577031, was opened at 2002-07-03 11:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: Remove PyArg_Parse() and METH_OLDARGS

Initial Comment:
This patch removes more PyArg_Parse() and METH_OLDARGS
which are deprecated.
I've tested in select and string, but want to make sure
there's nothing else I'm missing.

I also have a huge change to glmodule, but I can't test
that.  The diff is attached.
Let me know if I should check in glmodule or leave it
alone.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-28 11:26

Message:
Logged In: YES 
user_id=33168

Closing this patch.  I'll make a new patch for changing
fmmodule as suggested.
Checked in as:  glmodule 2.10, stringobject.c 2.172,
selectmodule.c 2.68.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 07:46

Message:
Logged In: YES 
user_id=21627

The other patches all look fine; please apply them. For
fmmodule, I'd recommend converting those functions to
VARARGS/ParseTuple.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-26 21:04

Message:
Logged In: YES 
user_id=33168

All the "s" / PyString_Check() changes are in fmmodule.  I
suggest not patching fmmodule now.  Are all the other
changes ok?  Should I bother fixing glmodule at all?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 01:45

Message:
Logged In: YES 
user_id=21627

The changes look good, except for the ones that change
parsing of "s" to PyString_Check: that means losing support
for Unicode.

For some of these methods, that may be acceptable, but that
would need documentation.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577031&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 16:54:31 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 08:54:31 -0700
Subject: [Patches] [ python-Patches-574747 ] Make python-mode.el use jython
Message-ID: <E17YqNL-0000oC-00@usw-sf-web3.sourceforge.net>

Patches item #574747, was opened at 2002-06-27 21:37
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574747&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Kevin J. Butler (kevinbutler)
Assigned to: Nobody/Anonymous (nobody)
Summary: Make python-mode.el use jython

Initial Comment:
I believe it is time to default to using the "jython"
interpreter rather than the "jpython" interpreter.

This patch does this in a minimal way, rather than
changing all 'jpython' references to 'jython', it just
changes the default interpreter command to jython and
notes the two names.

(I still prefer the 'JPython' name...)

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 17:54

Message:
Logged In: YES 
user_id=21627

Duplicate of 574750.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574747&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:00:35 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:00:35 -0700
Subject: [Patches] [ python-Patches-574750 ] Make python-mode.el use "jython" interp
Message-ID: <E17YqTD-0003jY-00@usw-sf-web4.sourceforge.net>

Patches item #574750, was opened at 2002-06-27 21:38
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574750&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Kevin J. Butler (kevinbutler)
Assigned to: Nobody/Anonymous (nobody)
>Summary: Make python-mode.el use "jython" interp

Initial Comment:
I believe it is time to start using the "jython"
interpreter by default, rather than the "jpython"
interpreter.

This patch does it in a minimal way, just changing the
command and acknowledging the two names.  (I still
prefer the JPython name, but...)

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:00

Message:
Logged In: YES 
user_id=21627

Please use context or unified diffs for patches; I'm
attaching your change as a diff.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574750&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:25:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:25:11 -0700
Subject: [Patches] [ python-Patches-574707 ] makesockaddr, use addrlen with AF_UNIX
Message-ID: <E17Yqr1-0006iR-00@usw-sf-web1.sourceforge.net>

Patches item #574707, was opened at 2002-06-27 20:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574707&group_id=5470

Category: Modules
Group: Python 2.1.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Donn Cave (donnc)
Assigned to: Nobody/Anonymous (nobody)
Summary: makesockaddr, use addrlen with AF_UNIX

Initial Comment:
makesockaddr(), in 2.1 source, expects a NUL-terminated string 
in sockaddr_un.sun_path.  That expectation is routinely not 
met on some platforms - NetBSD 1.5.2, AIX 4.3.3, probably 
others.  This patch shows how to use addrlen to determine the 
correct length of the value of sun_path.

Here's the diff (I have no idea what it means to "attach"
a file from my web browser), against 2.1 source.

*** socketmodule.c.dist Sun Apr 15 17:21:33 2001
--- socketmodule.c      Thu Jun 27 11:09:57 2002
***************
*** 597,603 ****
        case AF_UNIX:
        {
                struct sockaddr_un *a = (struct sockaddr_un *) addr;
!               return PyString_FromString(a->sun_path);
        }
  #endif

--- 597,605 ----
        case AF_UNIX:
        {
                struct sockaddr_un *a = (struct sockaddr_un *) addr;
!               return PyString_FromStringAndSize(a->sun_path,
!                       addrlen -
!                       (sizeof(struct sockaddr_un) - sizeof(a->sun_path)));
        }
  #endif

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:25

Message:
Logged In: YES 
user_id=21627

This patch does not work. On systems where a NUL is
returned, this NUL is also accounted for in addrlen, and
hence included in the string.

Would you like to revise your patch to support both cases?
Feel free to use the offsetof macro, btw.
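A sketch, in Python with ctypes rather than C, of one way to support both cases: bound the copy by addrlen, then stop at the first NUL. The SockaddrUn layout is an assumed Linux-style one (BSD-derived systems such as NetBSD add a one-byte sun_len field, which shifts the offset), and unix_path() is a hypothetical helper. Linux abstract-namespace addresses, which start with a NUL byte, would come out empty here.

```python
import ctypes


class SockaddrUn(ctypes.Structure):
    # Assumed Linux-style layout; BSD-derived systems add a sun_len byte
    _fields_ = [("sun_family", ctypes.c_ushort),
                ("sun_path", ctypes.c_char * 108)]


# Python counterpart of offsetof(struct sockaddr_un, sun_path)
SUN_PATH_OFFSET = SockaddrUn.sun_path.offset


def unix_path(raw, addrlen):
    """Extract sun_path from a raw sockaddr_un, whether or not
    addrlen counts a trailing NUL."""
    length = max(0, addrlen - SUN_PATH_OFFSET)
    path = bytes(raw[SUN_PATH_OFFSET:SUN_PATH_OFFSET + length])
    # Stop at the first NUL if the kernel supplied (and counted) one
    nul = path.find(b"\0")
    return path if nul < 0 else path[:nul]
```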

Attaching a file is done by checking "Check to Upload and
Attach a File:" and adding a file name in the field below.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574707&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:34:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:34:19 -0700
Subject: [Patches] [ python-Patches-573770 ] Changing owner of symlinks
Message-ID: <E17Yqzr-0008VZ-00@usw-sf-web2.sourceforge.net>

Patches item #573770, was opened at 2002-06-25 21:15
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=573770&group_id=5470

Category: Modules
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Nobody/Anonymous (nobody)
Summary: Changing owner of symlinks

Initial Comment:
Currently, there's no way to change the owner of a symbolic link, 
since chown() follows them. This patch implements the missing 
lchown() function in posixmodule. 
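For readers unfamiliar with the call, here is a small C sketch of what lchown() offers over chown(), assuming a POSIX.1-2008 system. On a dangling symlink, chown() follows the link and fails. lchown() instead acts on the link itself. set_link_owner is a hypothetical helper, not code from the patch. At the Python level, the patch exposes the underlying call as os.lchown().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Change the owner of a symbolic link itself. chown() would follow the
   link and act on its target, or fail with ENOENT if the link dangles. */
static int set_link_owner(const char *link_path, uid_t uid, gid_t gid)
{
    assert(link_path != NULL);
    if (lchown(link_path, uid, gid) != 0) {
        perror("lchown");
        return -1;
    }
    return 0;
}
```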
 

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:34

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as

configure 1.325
configure.in 1.336
pyconfig.h.in 1.46
libos.tex 1.93
NEWS 1.446
posixmodule.c 2.245


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=573770&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:36:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:36:15 -0700
Subject: [Patches] [ python-Patches-574867 ] list.extend docstring fix
Message-ID: <E17Yr1j-0007PZ-00@usw-sf-web5.sourceforge.net>

Patches item #574867, was opened at 2002-06-28 01:32
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574867&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: David Abrahams (david_abrahams)
Assigned to: Nobody/Anonymous (nobody)
Summary: list.extend docstring fix

Initial Comment:
The current docstring implies that extend() can only 
accept list arguments.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:36

Message:
Logged In: YES 
user_id=21627

Thanks for the patch. Applied as listobject.c 2.128.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=574867&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:42:09 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:42:09 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17Yr7R-0004JZ-00@usw-sf-web4.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter that specifies the type of the
exception to raise.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:42

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it, with the following
changes:
- add \versionadded marks into the documentation;
- add an entry to Misc/NEWS.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-05 08:57

Message:
Logged In: YES 
user_id=11105

Sure. Patch uploaded: docpatch.diff

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:47

Message:
Logged In: YES 
user_id=21627

If this is meant to be used by extension modules, it should
be documented.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:42:40 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:42:40 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17Yr7w-0007Ve-00@usw-sf-web5.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Thomas Heller (theller)
>Assigned to: Thomas Heller (theller)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter that specifies the type of the
exception to raise.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:42

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it, with the following
changes:
- add \versionadded marks into the documentation;
- add an entry to Misc/NEWS.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-05 08:57

Message:
Logged In: YES 
user_id=11105

Sure. Patch uploaded: docpatch.diff

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:47

Message:
Logged In: YES 
user_id=21627

If this is meant to be used by extension modules, it should
be documented.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:47:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:47:36 -0700
Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments
Message-ID: <E17YrCi-0004OA-00@usw-sf-web4.sourceforge.net>

Patches item #580670, was opened at 2002-07-12 19:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Bill Bell (wbell539)
Assigned to: Nobody/Anonymous (nobody)
Summary: less restrictive HTML comments

Initial Comment:

The current code requires that HTML comments open
with '<!--'. I suggest a patch that allows a less restrictive syntax,
since the current syntax requirement rejects a significant fraction of
pages.

Affects sgmllib.py and markupbase.py.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:47

Message:
Logged In: YES 
user_id=21627

Can you give examples of a few pages that contain such
comments? I doubt that this is all that common, and if you
still need to parse them, it might be better to make your
own adjustments to htmllib.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 17:52:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 09:52:32 -0700
Subject: [Patches] [ python-Patches-586999 ] error in example in smtplib.py
Message-ID: <E17YrHU-0000ON-00@usw-sf-web2.sourceforge.net>

Patches item #586999, was opened at 2002-07-26 15:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470

Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Stepan Koltsov (yozh)
Assigned to: Nobody/Anonymous (nobody)
Summary: error in example in smtplib.py

Initial Comment:
I found this while looking for errors that could appear
if PEP 295 were approved ;-)

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:52

Message:
Logged In: YES 
user_id=21627

Thanks for the patch. I assume you meant \ instead of \.
Committed as smtplib.py 1.60.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=586999&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 18:29:14 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 10:29:14 -0700
Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments
Message-ID: <E17Yrr0-00029G-00@usw-sf-web3.sourceforge.net>

Patches item #580670, was opened at 2002-07-12 13:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Bill Bell (wbell539)
Assigned to: Nobody/Anonymous (nobody)
Summary: less restrictive HTML comments

Initial Comment:

The current code requires that HTML comments open
with '<!--'. I suggest a patch that allows a less restrictive syntax,
since the current syntax requirement rejects a significant fraction of
pages.

Affects sgmllib.py and markupbase.py.

----------------------------------------------------------------------

>Comment By: Bill Bell (wbell539)
Date: 2002-07-28 13:29

Message:
Logged In: YES 
user_id=109396

OK, Martin, I withdraw the suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:47

Message:
Logged In: YES 
user_id=21627

Can you give examples of a few pages that contain such
comments? I doubt that this is all that common, and if you
still need to parse them, it might be better to make your
own adjustments to htmllib.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 19:08:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 11:08:00 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YsSW-0005gN-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds a method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates into temp files.
If those temp files already exist when sortperf starts,
it reads them in instead of generating new numbers.
As a result, it's important in the above to pass "1" as
the last argument the *first* time you run sortperf --
that forces the random-number generator into the same
state it was in when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.
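To illustrate the "adaptive" part, here is a deliberately simplified natural mergesort in C, assuming int keys. It finds existing ascending runs, reversing strictly descending ones, and then merges neighbouring runs until one run is left. As a result, already-sorted or reversed input costs a single linear pass. This is not Tim's msort: it omits timsort's refinements, such as a minimum run length, the merge-stack invariants, and galloping. natural_mergesort and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the natural run starting at a[lo]. A strictly descending
   run is reversed in place, so every run becomes non-decreasing.
   Strict descent keeps the reversal stable: equal elements never swap. */
static size_t count_run(int *a, size_t lo, size_t n)
{
    size_t hi = lo + 1;

    assert(lo < n);
    if (hi == n)
        return 1;                         /* single trailing element */
    if (a[hi] < a[lo]) {                  /* strictly descending run */
        while (hi + 1 < n && a[hi + 1] < a[hi])
            hi++;
        for (size_t i = lo, j = hi; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    } else {                              /* non-decreasing run */
        while (hi + 1 < n && a[hi + 1] >= a[hi])
            hi++;
    }
    return hi - lo + 1;
}

/* Stable merge of a[lo..mid) and a[mid..hi) through tmp. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = 0;

    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++]; /* ties take left */
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp, k * sizeof *a);
}

/* Returns 0 on success, -1 if scratch memory cannot be allocated. */
int natural_mergesort(int *a, size_t n)
{
    size_t *bounds, nruns = 0, lo = 0;
    int *tmp;

    if (n < 2)
        return 0;
    bounds = malloc((n + 1) * sizeof *bounds);
    tmp = malloc(n * sizeof *tmp);
    if (bounds == NULL || tmp == NULL) {
        free(bounds);
        free(tmp);
        return -1;
    }

    /* Pass 1: record where each natural run starts. */
    while (lo < n) {
        bounds[nruns++] = lo;
        lo += count_run(a, lo, n);
    }
    bounds[nruns] = n;

    /* Merge adjacent runs pairwise until a single run remains. */
    while (nruns > 1) {
        size_t out = 0;
        for (size_t r = 0; r < nruns; r += 2) {
            if (r + 1 < nruns)
                merge(a, tmp, bounds[r], bounds[r + 1], bounds[r + 2]);
            bounds[out++] = bounds[r];
        }
        bounds[out] = n;
        nruns = out;
    }
    free(bounds);
    free(tmp);
    return 0;
}
```

On /sort and \sort style inputs, pass 1 finds a single run and no merging happens. On *sort style random input, it finds many short runs and degrades to an ordinary bottom-up mergesort.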


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
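One rough way to probe the pre-existing-order hypothesis above is to count natural runs in the input. The fewer runs there are relative to the number of lines, the more order a run-based sort can exploit. Below is a minimal sketch over an array of C strings, assuming plain strcmp ordering. count_natural_runs is a hypothetical name, and its run rule (non-decreasing, or strictly decreasing) is the usual natural-mergesort convention, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count maximal runs that are either non-decreasing or strictly
   decreasing under strcmp. 1 means the input is already sorted (or
   reverse-sorted); n means no exploitable order at all. */
size_t count_natural_runs(const char *const *v, size_t n)
{
    size_t runs = 0, i = 0;

    assert(v != NULL || n == 0);
    while (i < n) {
        size_t j = i + 1;
        if (j < n && strcmp(v[j], v[i]) < 0) {
            while (j < n && strcmp(v[j], v[j - 1]) < 0)
                j++;                      /* extend descending run */
        } else {
            while (j < n && strcmp(v[j], v[j - 1]) >= 0)
                j++;                      /* extend non-decreasing run */
        }
        runs++;
        i = j;
    }
    return runs;
}
```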

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 19:09:07 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 11:09:07 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YsTb-0005hb-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds a method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates into temp files.
If those temp files already exist when sortperf starts,
it reads them in instead of generating new numbers.
As a result, it's important in the above to pass "1" as
the last argument the *first* time you run sortperf --
that forces the random-number generator into the same
state it was in when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 19:14:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 11:14:00 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YsYK-0005mH-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.
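For readers without sortperf.py at hand, the column labels in these tables describe the shape of the input. The sketch below builds lists of roughly those shapes and times a plain list.sort() on each. The constructions are assumptions based on sortperf's column descriptions (random, descending, ascending, ascending plus 3 random exchanges, ascending with 10 random values at the end, many duplicates, all equal), not code copied from the file, and !sort is omitted because its exact construction is not described here. `make_inputs` and `time_sort` are hypothetical names. In CPython since 2.3, list.sort() itself is the mergesort that began life as this patch's msort.

```python
import random
import time


def make_inputs(n, rng):
    """Return {column label: list of n floats}, n >= 1, shaped like sortperf's columns.

    The exact constructions are assumptions, not copied from sortperf.py.
    """
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)

    three = asc[:]                       # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]                        # ascending, then 10 random at the end
    plus[-10:] = [rng.random() for _ in range(min(10, n))]

    few = base[:4]                       # only 4 distinct values -> many duplicates
    return {
        "*sort": base[:],                # random order
        "\\sort": asc[::-1],             # descending
        "/sort": asc[:],                 # ascending
        "3sort": three,
        "+sort": plus,
        "~sort": [rng.choice(few) for _ in range(n)],
        "=sort": [0.5] * n,              # all equal
    }


def time_sort(data):
    """Seconds taken by list.sort() on a fresh copy of data."""
    work = list(data)
    start = time.perf_counter()
    work.sort()
    return time.perf_counter() - start


if __name__ == "__main__":
    for label, data in make_inputs(2**15, random.Random(1)).items():
        print(f"{label:>6}  {time_sort(data):.4f}s")
```

Copying inside time_sort matters: sorting the same list twice would time an already-sorted list the second time.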

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
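The caution above comes down to where the data comes from: once cache files exist, the seed is never consulted again. A minimal sketch of that caching pattern, assuming one pickle file per list size in the system temp directory (sortperf's own file format and naming may well differ, and `cached_floats` is a hypothetical name):

```python
import os
import pickle
import random
import tempfile


def cached_floats(n, seed=None, cache_dir=None):
    """Return n random floats, reusing a per-n cache file when one exists.

    The seed only matters when no cache file exists yet; after that the
    cached numbers are returned regardless of seed. File naming is
    hypothetical, not sortperf's.
    """
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, f"sortperf_sketch_{n}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    rng = random.Random(seed)            # seeded, so a first run is reproducible
    data = [rng.random() for _ in range(n)]
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

This is why the seed argument has to be supplied on the *first* run: deleting the cache files is the only way to make a different seed take effect.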

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
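The pre-merge "find the endpoints" searches trim a merge before it starts: elements at the front of the left run that are <= the right run's first element are already in their final place, and so are elements at the back of the right run that are >= the left run's last element. The sketch below shows only that general idea, using the standard bisect module for the searches. CPython's merge_at does this in C with galloping searches (gallop_right/gallop_left), and the extra information merge2 extracts is discussed on Python-Dev rather than reproduced here. `merge_runs` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def merge_runs(a, b):
    """Stable merge of sorted run a (left) with sorted run b (right)."""
    if not a or not b:
        return a + b
    # a[:k] are all <= b[0]: already in place. bisect_right keeps elements
    # of a that equal b[0] ahead of it, which preserves stability.
    k = bisect_right(a, b[0])
    # b[j:] are all >= a[-1]: already in place. bisect_left keeps elements
    # of b that equal a[-1] after it, again for stability.
    j = bisect_left(b, a[-1])
    left, right = a[k:], b[:j]
    merged = []
    i = m = 0
    while i < len(left) and m < len(right):
        if right[m] < left[i]:           # take from the right only if strictly smaller
            merged.append(right[m])
            m += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[m:])
    return a[:k] + merged + b[j:]
```

When the runs are already in order (a[-1] <= b[0]), both searches cover everything and no element-by-element merging happens at all.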

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments), msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
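One way to probe the pre-existing-order guess (not something reported in this thread) is to compare sorting real data against a shuffled copy of the same lines: shuffling destroys any existing order while keeping exactly the same elements. This sketch assumes list.sort() is a run-exploiting sort, as CPython's has been since Python 2.3; `best_sort_time` and `order_benefit` are hypothetical names.

```python
import random
import timeit


def best_sort_time(lines, repeat=5):
    """Best of `repeat` single runs of sorted(lines), in seconds."""
    # sorted() copies first, so every run sorts the data as given.
    return min(timeit.repeat(lambda: sorted(lines), number=1, repeat=repeat))


def order_benefit(lines, seed=0):
    """Time to sort a shuffled copy divided by time to sort lines as given.

    Well above 1 suggests the sort is exploiting order already present
    in the data; about 1 suggests the data is effectively random.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return best_sort_time(shuffled) / best_sort_time(lines)
```

For example, feed it the lines of the concatenated Objects/*.c and Modules/*.c files. Taking the best of several runs keeps timer noise from dominating; very small inputs still give unreliable ratios.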

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 20:00:03 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 12:00:03 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YtGt-0000eg-00@usw-sf-web1.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400 MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
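The endpoint searches are cheap because timsort does them by "galloping" (exponential search): probe offsets 1, 3, 7, 15, ... until the key is bracketed, then binary-search only that bracket, so the cost grows with the log of the answer's position rather than the log of the run length. This sketch covers only a search from the left end with the same result as bisect.bisect_right; CPython's C gallop_left/gallop_right also take a starting hint and can gallop in either direction. `gallop_right` here is a simplified stand-in, not the C function.

```python
def gallop_right(a, key):
    """Index where key goes in sorted list a, after any equal elements.

    Same result as bisect.bisect_right(a, key), found by exponential
    search from the left end. Only the < operator is used.
    """
    n = len(a)
    if n == 0 or key < a[0]:
        return 0
    # Invariant: a[lo] <= key. Probe lo + 1, 3, 7, 15, ... from the start.
    lo, step = 0, 1
    hi = lo + step
    while hi < n and not (key < a[hi]):
        lo = hi
        step *= 2
        hi = lo + step
    hi = min(hi, n)
    # Now a[lo] <= key and (hi == n or key < a[hi]): answer is in (lo, hi].
    lo += 1
    while lo < hi:
        mid = (lo + hi) // 2
        if key < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

When the answer is near the start of the run, which is common for data with existing order, only a handful of comparisons are needed.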

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments), msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Sun Jul 28 20:28:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 12:28:39 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17YtiZ-0006pb-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.
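The *sort ... !sort columns in the tables throughout this thread name sortperf's input patterns. As a rough guide, here is a Python 3 sketch that builds analogous inputs and times the built-in sort on each. The pattern behind each column is my reading of sortperf's docstring (random; descending; ascending; ascending plus 3 random exchanges; ascending with the last 10 replaced by random values; many duplicates; all equal; and a "V"-shaped worst case), so treat that mapping as an assumption. make_inputs is a hypothetical helper, not part of sortperf, and it does not reproduce sortperf's temp-file caching.

```python
import random
import time

def make_inputs(n, seed=1):
    """Return a dict of sortperf-like inputs of length n (patterns assumed, not sortperf's code)."""
    rng = random.Random(seed)                  # fixed seed, so repeated runs see the same data
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)
    inputs = {
        "*sort": base[:],                      # random floats
        "\\sort": asc[::-1],                   # descending
        "/sort": asc[:],                       # ascending
    }
    three = asc[:]                             # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]
    inputs["3sort"] = three
    plus = asc[:]                              # ascending, last 10 replaced by random floats
    if n >= 10:
        plus[-10:] = [rng.random() for _ in range(10)]
    inputs["+sort"] = plus
    inputs["~sort"] = (asc[:4] * (n // 4 + 1))[:n]   # only 4 distinct values, many duplicates
    inputs["=sort"] = [0.5] * n                # all equal
    half = n // 2                              # "V" shape: half-1, ..., 0, 0, ..., half-1
    inputs["!sort"] = [float(x) for x in [*range(half - 1, -1, -1), *range(half)]]
    return inputs

if __name__ == "__main__":
    for name, data in make_inputs(2 ** 15).items():
        start = time.perf_counter()
        sorted(data)
        print(f"{name}  {time.perf_counter() - start:.4f}s")
```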

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was in when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
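The robustness requirement can be checked from the outside: whatever a comparison function returns, the sort may produce any order at all, but it must neither crash nor lose or duplicate elements. A small Python 3 sketch of that check (in current Python a comparison function is passed through functools.cmp_to_key rather than directly to sort); goofy_cmp and sort_survives are hypothetical names for illustration.

```python
import functools
import random
from collections import Counter

def goofy_cmp(a, b, _rng=random.Random(42)):
    """Hypothetical inconsistent comparison: a random answer on every call.

    The shared default _rng is intentional, so successive calls disagree.
    """
    return _rng.choice((-1, 0, 1))

def sort_survives(data, cmp):
    """Sort with an arbitrary cmp; return True if the result is a permutation of data."""
    result = sorted(data, key=functools.cmp_to_key(cmp))
    # No order can be promised for a goofy cmp, but every element must come back exactly once.
    return Counter(result) == Counter(data)

if __name__ == "__main__":
    print(sort_survives(list(range(1000)), goofy_cmp))
```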

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running MacOSX 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
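The pre-merge "find the endpoints" searches trim a merge before it starts: elements at the front of the left run that are no larger than the right run's first element are already in place, and so are elements at the end of the right run that are no smaller than the left run's last element. A minimal Python 3 sketch of that trimming, assuming plain binary search from the standard bisect module in place of the galloping search the patch uses; trimmed_merge is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

def trimmed_merge(a, b):
    """Stably merge sorted lists a (earlier run) and b (later run), skipping in-place ends."""
    if not a or not b:
        return a + b
    # a[:k] are <= b[0], so they already precede all of b (ties stay in a: stable).
    k = bisect_right(a, b[0])
    # b[n:] are >= a[-1], so they already follow all of a (ties stay after a: stable).
    n = bisect_left(b, a[-1])
    middle = []
    i, j = k, 0
    while i < len(a) and j < n:
        if b[j] < a[i]:                 # take from b only when strictly smaller
            middle.append(b[j])
            j += 1
        else:
            middle.append(a[i])
            i += 1
    middle.extend(a[i:])
    middle.extend(b[j:n])
    return a[:k] + middle + b[n:]
```

For example, merging [1, 2, 5, 9] with [3, 4, 10, 11] leaves 1, 2, 10 and 11 untouched and only merges [5, 9] with [3, 4].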

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
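One way to see why pre-existing order helps is to count the natural runs an adaptive mergesort would find: fully ordered input is a single run and needs no merging at all, while random input breaks into many short runs. The sketch below is a deliberately simplified natural mergesort in Python 3 (run detection plus pairwise stable merging). It is not timsort, which also enforces a minimum run length, keeps a stack of pending runs with balancing invariants, and gallops. All names are hypothetical.

```python
import random

def find_runs(a):
    """Split a into maximal runs: non-descending, or strictly descending (reversed)."""
    runs = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending, so reversing it cannot reorder equal elements.
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:
            while j < n and not a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs

def merge(left, right):
    """Stable merge: on ties the element from the earlier run (left) wins."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def natural_mergesort(seq):
    """Sort seq by finding its natural runs, then merging adjacent pairs until one remains."""
    runs = find_runs(list(seq))
    if not runs:
        return []
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])            # the odd run out waits for the next pass
        runs = merged
    return runs[0]

if __name__ == "__main__":
    rng = random.Random(0)
    for label, data in [("ascending", list(range(10000))),
                        ("descending", list(range(10000, 0, -1))),
                        ("random", [rng.random() for _ in range(10000)])]:
        print(label, "runs:", len(find_runs(data)))
```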

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 04:10:04 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 28 Jul 2002 20:10:04 -0700
Subject: [Patches] [ python-Patches-587889 ] fix memory leak of tp_doc in typeobject
Message-ID: <E17Z0v6-0000CF-00@usw-sf-web2.sourceforge.net>

Patches item #587889, was opened at 2002-07-28 23:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Guido van Rossum (gvanrossum)
Summary: fix memory leak of tp_doc in typeobject

Initial Comment:
Attached is a patch which fixes a memory leak in
typeobject.c.  I would have checked this in, but there
was a line which concerned me in
Objects/structseq.c::PyStructSequence_InitType():343. 
In this function, it assigns the tp_doc from a
PyStructSequence_Desc* which is passed in.  I'm not
sure where this memory comes from, so I wasn't sure if
the patch would create problems.

The memory leak was found by using valgrind: 
http://developer.kde.org/~sewardj/

Another thing I saw was that it *may* be possible that
the __doc__ is not a string.  But in two places there
were PyString_FromString (1 was the macro).  The only
way I can see a non-string in tp_doc is from the
InitType function in structseq.  I haven't traced it
further, so if structseq can only have a string, there
shouldn't be a problem.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 08:54:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 00:54:02 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17Z5Lu-0006TO-00@usw-sf-web5.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 18:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 09:54

Message:
Logged In: YES 
user_id=21627

Looks good; please apply it.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-27 02:58

Message:
Logged In: YES 
user_id=33168

This patch uses AC_TRY_LINK instead of AC_TRY_RUN.  It tries
makedev according to Martin's suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-23 23:43

Message:
Logged In: YES 
user_id=21627

That patch doesn't really test whether defining 
OSF_SOURCE helps in getting makedev, does it? In 
particular, if makedev is not available at all, or requires a 
different define, the test will still conclude that 
OSF_SOURCE should be defined, right?

I think the sequence should be:
- is makedev already available?
- if not, is it with OSF_SOURCE defined?
- if not, arrange to exclude makedev from posixmodule.c

Also, is it necessary to run the test program? autoconf is 
always worried that cross-compilation would fail, since you 
cannot run tests (although it is reasonable to link test 
programs in a cross-compilation environment).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 08:57:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 00:57:26 -0700
Subject: [Patches] [ python-Patches-580670 ] less restrictive HTML comments
Message-ID: <E17Z5PC-0004jY-00@usw-sf-web2.sourceforge.net>

Patches item #580670, was opened at 2002-07-12 19:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Bill Bell (wbell539)
Assigned to: Nobody/Anonymous (nobody)
Summary: less restrictive HTML comments

Initial Comment:

The current code enforces the requirement that HTML comments open 
with '<!--'. I suggest a patch which allows a less restrictive syntax, 
since the current syntax requirements reject a significant fraction of 
pages.

Affects sgmllib.py and markupbase.py.

----------------------------------------------------------------------

Comment By: Bill Bell (wbell539)
Date: 2002-07-28 19:29

Message:
Logged In: YES 
user_id=109396

OK, Martin, I withdraw the suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:47

Message:
Logged In: YES 
user_id=21627

Can you give examples of a few pages that contain such
comments? I doubt that this is all that common, and if you
still need to parse them, it might be better to make your
own adjustments to htmllib.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580670&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 09:23:26 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 01:23:26 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17Z5oM-0007oC-00@usw-sf-web3.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.
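Later Python versions ship this as mimetypes.MimeTypes.guess_all_extensions (with a module-level wrapper of the same name), and guess_extension returns the first entry of that list. A quick Python 3 check, using a MimeTypes instance populated only from the module's built-in defaults so the answer does not depend on files such as /etc/mime.types; the exact contents and order of the list may differ between Python versions.

```python
import mimetypes

db = mimetypes.MimeTypes()          # built-in tables only; no extra files are read into this instance
every = db.guess_all_extensions("text/html")
first = db.guess_extension("text/html")
print(every, first)
```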

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-29 10:23

Message:
Logged In: YES 
user_id=89016

The patch adds an inverted mapping (i.e. a mapping from a type
to a list of extensions). add_type simplifies adding a
type<->ext mapping to both dictionaries. If this method
should not be exposed, we could make the name private
(_add_type).
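A minimal Python 3 sketch of the inverted mapping described above: a hypothetical TypeRegistry class (not the mimetypes API) whose add_type updates both dictionaries, so guess_all_extensions becomes a single dictionary lookup instead of a scan over types_map. Reassigning an extension to a different type removes it from the old type's list; that detail is an assumption, not something the comment specifies.

```python
class TypeRegistry:
    """Hypothetical registry keeping ext -> type and type -> [ext, ...] maps in sync."""

    def __init__(self):
        self.types_map = {}         # e.g. '.html' -> 'text/html'
        self.types_map_inv = {}     # e.g. 'text/html' -> ['.html', '.htm']

    def add_type(self, mime_type, ext):
        """Record one type <-> extension pair in both dictionaries."""
        old = self.types_map.get(ext)
        if old is not None and old != mime_type:
            self.types_map_inv[old].remove(ext)   # ext now belongs to a different type
        self.types_map[ext] = mime_type
        exts = self.types_map_inv.setdefault(mime_type, [])
        if ext not in exts:
            exts.append(ext)

    def guess_all_extensions(self, mime_type):
        """All known extensions for mime_type, in registration order (returned as a copy)."""
        return list(self.types_map_inv.get(mime_type, []))

    def guess_extension(self, mime_type):
        """First registered extension, or None if the type is unknown."""
        exts = self.guess_all_extensions(mime_type)
        return exts[0] if exts else None
```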

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 09:44:09 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 01:44:09 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17Z68P-0005mk-00@usw-sf-web2.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 10:44

Message:
Logged In: YES 
user_id=21627

I can't see the point of making it private, since it is not
used inside the module. If you plan to use it, that usage
certainly is outside of the module, so the method would be
public.

If it is public, it needs to be exposed on the module level,
and it needs to be documented.
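For comparison, this is roughly where mimetypes landed in later Python versions: add_type is public, documented, and exposed at module level, where it forwards to a shared default database. A short Python 3 illustration; the type application/x-hypothetical and the extension .hyp are made up for the example.

```python
import mimetypes

# The module-level add_type forwards to the module's shared MimeTypes database,
# so the module-level guess functions see the new mapping immediately.
mimetypes.add_type("application/x-hypothetical", ".hyp")
print(mimetypes.guess_all_extensions("application/x-hypothetical"))
print(mimetypes.guess_type("example.hyp"))
```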

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-29 10:23

Message:
Logged In: YES 
user_id=89016

The patch adds an inverted mapping (i.e. a mapping from a type
to a list of extensions). add_type simplifies adding a
type<->ext mapping to both dictionaries. If this method
should not be exposed, we could make the name private
(_add_type).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 12:27:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 04:27:43 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17Z8gh-0001A2-00@usw-sf-web1.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 11:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Nobody/Anonymous (nobody)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO opcode (as opposed to Vladimir's
ancient patch).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.
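The classic c_lnotab is a byte string of (bytecode-offset increment, line increment) pairs, counted from co_firstlineno. Deciding whether execution has moved onto a different line comes down to mapping two offsets back to line numbers and comparing them. A minimal Python sketch of that lookup, assuming the 2002-era format with unsigned one-byte increments. Later Python versions changed the line-table encoding, so this is not a reader for current code objects. The table in the example is hand-made, not taken from a real function, and both function names are hypothetical.

```python
def addr2line(lnotab, firstlineno, addr):
    """Return the source line for bytecode offset addr, walking (addr_incr, line_incr) pairs."""
    line = firstlineno
    offset = 0
    for i in range(0, len(lnotab) - 1, 2):
        offset += lnotab[i]
        if offset > addr:          # the next line starts after addr: stop here
            break
        line += lnotab[i + 1]
    return line

def crossed_line(lnotab, firstlineno, last_addr, addr):
    """True if moving from last_addr to addr lands on a different source line.

    Naive: a real tracer may also want an event when a backward jump (a loop)
    returns to the start of the same line, which this check ignores.
    """
    return addr2line(lnotab, firstlineno, last_addr) != addr2line(lnotab, firstlineno, addr)

if __name__ == "__main__":
    # Hypothetical table: offsets 0-5 are line 10, 6-13 are line 11, 14 onward is line 13.
    table = bytes([6, 1, 8, 2])
    print([addr2line(table, 10, a) for a in (0, 5, 6, 13, 14, 20)])
```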

Comments welcome!

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 12:47:50 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 04:47:50 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17Z90A-0005G9-00@usw-sf-web4.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
>Assigned to: Tim Peters (tim_one)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are performance-neutral when running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 21:47

Message:
Logged In: YES 
user_id=250749

Tim,

1.  any objections to the "final" patches?

2.  do you see any reason not to backport your XXXROUNDUP
change?  It qualifies as a performance/behaviour bugfix IMO.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-21 23:16

Message:
Logged In: YES 
user_id=250749

Ok, I've prepared patches to convert the following files to
use PyMalloc for memory allocation:
Parser/[acceler.c|node.c|parsetok.c] (pymalloc-parser.diff)
Python/compile.c (pymalloc-compile.diff)

I didn't bother with the other files in Parser/ as my malloc
logging shows that they only ever appear to make requests >
256 bytes.

I have attached/will attach a summary from my malloc logging
experiments for information.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-16 04:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If 
the platform malloc is saved from either one, its job 
becomes much easier.
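To make point #1 concrete, here is a small simulation counting how many times one node's child array must be resized as children are appended one at a time. Both rounding policies are assumptions for illustration: `every_third` mimics the old behaviour described above (capacity rounded up to a multiple of 3, so a realloc on every 3rd child), and `power_of_two` is a generic geometric overallocation. Neither is the literal node.c code.

```python
def every_third(n):
    """Assumed old policy: round capacity up to a multiple of 3 (n <= 1 kept)."""
    return n if n <= 1 else (n + 2) // 3 * 3


def power_of_two(n):
    """Hypothetical overallocating policy: next power of two >= n."""
    cap = 1
    while cap < n:
        cap *= 2
    return cap


def count_resizes(num_children, roundup):
    """Count reallocations while appending num_children children one by one."""
    capacity = 0
    resizes = 0
    for n in range(1, num_children + 1):   # n = child count after this append
        if n > capacity:                   # current block too small: realloc
            capacity = roundup(n)
            resizes += 1
    return resizes


for policy in (every_third, power_of_two):
    print(policy.__name__, count_resizes(1000, policy))
# every_third 335
# power_of_two 11
```

For a very wide node, the first policy still reallocs about once per three appends, while the second reallocs only logarithmically often. Since each realloc of a large block may also copy it, that difference is where the platform malloc gets relief.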

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 21:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case,
tickled by test_longexp, that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3s   6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin:   5.5s     4.9s   0.5s

These runs were made with the library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin:  1230s     971s   65s

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 20:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods, test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. And you'll have to check if you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 16:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 07:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.
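Jack's point is that pymalloc only serves small requests itself and passes anything larger straight to the platform allocator. A minimal sketch of that size test follows. It assumes 2.3-era constants (8-byte size classes and a 256-byte threshold, consistent with the 256-byte figure Andrew mentions); the function name `route_request` is hypothetical.

```python
ALIGNMENT = 8                   # assumed size-class granularity
SMALL_REQUEST_THRESHOLD = 256   # assumed cutoff; larger requests bypass pymalloc


def route_request(nbytes):
    """Return ("pool", size_class_index) or ("system", None) for a request."""
    if 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return ("pool", (nbytes - 1) // ALIGNMENT)
    # In this sketch, zero-byte and large requests go to the platform
    # malloc/realloc, so a child array that keeps growing soon leaves
    # pymalloc's hands entirely.
    return ("system", None)


for size in (1, 8, 9, 256, 257, 800 * 1024):
    print(size, route_request(size))
```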

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless a
general change is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 12:53:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 04:53:33 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Z95h-0005MP-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-27 01:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.
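The timing tables in the follow-ups use sortperf's column labels. Here is a minimal sketch of generating comparable inputs and timing `list.sort()` on them, for anyone repeating the experiment on their own machine. The meanings attached to the labels are my reading of sortperf.py and should be treated as assumptions: random, descending, ascending, ascending with 3 random exchanges, ascending with 10 random values at the end, many duplicates, and all equal. `!sort` is omitted. A seeded `random.Random` plays the role that passing "1" plays above, making the data reproducible across runs.

```python
import random
import time


def make_inputs(n, seed=1):
    """Build sortperf-style inputs (label meanings are assumptions; n >= 10)."""
    rng = random.Random(seed)               # fixed seed -> reproducible data
    rand = [rng.random() for _ in range(n)]
    asc = sorted(rand)

    three = asc[:]                          # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]                           # ascending, then 10 random at the end
    plus[-10:] = [rng.random() for _ in range(10)]

    return {
        "*sort": rand,                      # random
        "\\sort": asc[::-1],                # descending
        "/sort": asc,                       # ascending
        "3sort": three,
        "+sort": plus,
        "~sort": [float(rng.randrange(4)) for _ in range(n)],  # many duplicates
        "=sort": [0.5] * n,                 # all equal
    }


def time_sort(data):
    """Time one in-place sort of a fresh copy, leaving the input untouched."""
    work = list(data)
    start = time.perf_counter()
    work.sort()
    return time.perf_counter() - start


if __name__ == "__main__":
    i = 15
    for label, data in make_inputs(2 ** i).items():
        print(f"{label:>6} {time_sort(data):7.3f}")
```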


----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 21:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
with a patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 05:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
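The requirement Tim describes can be checked from Python. Even with a comparison function that answers at random, a sort must still finish and leave a permutation of its input (nothing lost or duplicated), although the resulting order is meaningless. A minimal sketch using today's Python 3, where this mergesort became `list.sort` and old-style compare functions are passed through `functools.cmp_to_key`. CPython's own test_sort performs a similar check.

```python
import random
from functools import cmp_to_key


def goofy_compare(a, b, rng=random.Random(42)):
    """A deliberately inconsistent comparison: returns -1, 0 or 1 at random.

    The default-argument RNG is shared across calls on purpose.
    """
    return rng.randrange(3) - 1


data = list(range(2000))
shuffled = data[:]
random.Random(0).shuffle(shuffled)

result = sorted(shuffled, key=cmp_to_key(goofy_compare))

# The order is garbage, but nothing may be lost or duplicated.
assert len(result) == len(data)
assert sorted(result) == data
print("still a permutation")
```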

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 05:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 04:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 04:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 04:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 21:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 18:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 11:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 06:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments), msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
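One plausible reason for those speedups is that this mergesort begins by scanning for natural runs (maximal stretches already in order). Input with pre-existing order therefore hands it fewer, longer runs to merge. Below is a simplified Python sketch of that run detection using the rule timsort applies: a run is either non-descending, or strictly descending, in which case it can be reversed in place without breaking stability. The sketch ignores minimum run lengths and the merging itself, and the name `count_runs` is illustrative.

```python
import random


def count_runs(seq):
    """Count natural runs, scanning left to right (simplified timsort rule).

    A run is non-descending (a <= b <= ...) or strictly descending (a > b > ...).
    """
    runs = 0
    i, n = 0, len(seq)
    while i < n:
        runs += 1
        j = i + 1
        if j < n and seq[j] < seq[i]:
            while j < n and seq[j] < seq[j - 1]:        # strictly descending
                j += 1
        else:
            while j < n and not seq[j] < seq[j - 1]:    # non-descending
                j += 1
        i = j
    return runs


rng = random.Random(0)
shuffled = [rng.random() for _ in range(10000)]
print(count_runs(sorted(shuffled)))                       # 1
print(count_runs(shuffled))                               # many short runs
print(count_runs(list(range(5000)) + list(range(5000))))  # 2
```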

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-27 05:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 04:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 03:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-27 02:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 02:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-27 02:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 14:12:28 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 06:12:28 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17ZAK4-0007SL-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 15:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-29 13:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.
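To put numbers on "unequivocal", the i=20 rows of the two tables above can be turned into per-column speed ratios (samplesort time divided by mergesort time, so a value above 1 means .msort() was faster). This is only arithmetic on the posted figures for this one machine.

```python
# i = 20 rows from the L.sort() and L.msort() tables above (seconds).
cols = ["*sort", "\\sort", "/sort", "3sort", "+sort", "~sort", "=sort", "!sort"]
samplesort = [10.65, 0.79, 0.79, 11.83, 0.89, 3.33, 0.78, 11.88]
mergesort = [10.22, 0.83, 0.76, 0.79, 0.78, 3.04, 0.79, 1.66]

# Ratio > 1: the mergesort was faster on that column.
ratios = {c: round(s / m, 2) for c, s, m in zip(cols, samplesort, mergesort)}
for c, r in ratios.items():
    print(f"{c:>6}  {r:5.2f}x")
```

On these figures 3sort comes out at about 15x and !sort at about 7x, while the remaining columns stay within roughly 15% of each other.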


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 11:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
with a patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 19:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
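The hazard is easy to demonstrate from Python itself. Below is a minimal Python 3 sketch (Python 3 dropped the cmp argument, so functools.cmp_to_key adapts an old-style comparison function). The function goofy_cmp is a hypothetical, deliberately inconsistent comparison. Whatever the sort does with such answers, it must terminate and return a permutation of its input.

```python
import random
from functools import cmp_to_key

rng = random.Random(0)  # seeded so the run is reproducible

def goofy_cmp(a, b):
    """Deliberately inconsistent: ignores its arguments and answers at random."""
    return rng.choice((-1, 0, 1))

data = list(range(1000))
result = sorted(data, key=cmp_to_key(goofy_cmp))

# The order is meaningless, but the sort must still finish and hand back a
# permutation of the input: nothing lost, nothing duplicated.
assert sorted(result) == data
```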

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 19:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 18:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
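The general idea behind those pre-merge searches can be sketched in a few lines of Python. Before two adjacent sorted runs a and b are merged, find where b[0] belongs in a and where a[-1] belongs in b. Everything outside those points is already in its final position, so only the overlapping middle needs merging. This sketch uses plain binary search from the bisect module, whereas the patch (per timsort.txt) uses galloping searches. It shows only the basic trimming, not the extra information merge2.patch extracts. merge_runs is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

def merge_runs(a, b):
    """Stably merge adjacent sorted runs a and b (a's items come first on ties).

    Before merging, two searches trim elements that are already in their
    final positions, so only the overlapping middle is actually merged.
    """
    if not a or not b:
        return a + b
    k = bisect_right(a, b[0])   # a[:k] <= b[0]: already in place at the front
    j = bisect_left(b, a[-1])   # b[j:] >= a[-1]: already in place at the back
    left, right = a[k:], b[:j]
    merged, i, m = [], 0, 0
    while i < len(left) and m < len(right):
        if right[m] < left[i]:  # take from b only when strictly smaller (stable)
            merged.append(right[m])
            m += 1
        else:
            merged.append(left[i])
            i += 1
    merged += left[i:] + right[m:]
    return a[:k] + merged + b[j:]
```

For example, merging [1, 3, 5] with [2, 4, 6] leaves 1 and 6 untouched and merges only [3, 5] with [2, 4].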

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 18:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 18:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 11:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 08:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04

(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-27 01:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 20:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
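One crude way to see whether a file carries that kind of pre-existing order is to count its natural runs, meaning maximal stretches that are already non-descending or strictly descending (strict, so a descending run can be reversed without breaking stability). Fewer runs relative to the number of lines means more exploitable order. Below is a minimal Python sketch. count_runs is a hypothetical helper that ignores the minimum-run-length and galloping machinery of the real algorithm, so it only gives a rough indication.

```python
def count_runs(seq):
    """Count maximal natural runs the way a run-detecting mergesort would."""
    n, i, runs = len(seq), 0, 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n and seq[j] < seq[i]:
            # Strictly descending run (strict, so reversing it keeps stability).
            while j < n and seq[j] < seq[j - 1]:
                j += 1
        else:
            # Non-descending run.
            while j < n and seq[j] >= seq[j - 1]:
                j += 1
        i = j
    return runs
```

Usage: `count_runs(text.splitlines())` for some text, compared against the number of lines.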

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 19:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 18:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 17:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 16:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 16:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 14:31:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 06:31:56 -0700
Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function
Message-ID: <E17ZAcu-0003f4-00@usw-sf-web2.sourceforge.net>

Patches item #571603, was opened at 2002-06-20 13:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix bug in encodings.search_function

Initial Comment:
Hi,

there seems to be a bug in the default encoding search 
function (search_function in encodings/__init__.py). The 
function tries to load a module with the name of the 
encoding, but it doesn't require that this module is in the 
encodings/ directory. This leads to trouble when you try 
to use an encoding that has the name of a module in the 
search path.

To demonstrate, save the following line to test.py:

print 'Just testing'.encode('test')

and run it. This results in a CodecRegistryError 
exception: "module "test" (test.pyc) failed to register"

The bug is present in 2.2.1 and in HEAD. In HEAD there 
was actually a bugfix for this but it was incomplete.

Patches for 2.2.1 and HEAD attached.

Greetings,
Geert Jansen
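For background: codec lookup calls each registered search function with the normalized encoding name until one returns a codecs.CodecInfo, and a search function that does not recognize a name should return None so the others get a chance. The minimal Python 3 sketch below registers such a function. The codec name demo_latin and the function demo_search are hypothetical, and the sketch illustrates the registry protocol only. It is not the patch or the standard encodings search function.

```python
import codecs

def demo_search(name):
    """Codec search function that answers for one hypothetical name only."""
    if name == "demo_latin":
        latin1 = codecs.lookup("latin-1")
        # Reuse Latin-1's encoder/decoder under our own codec name.
        return codecs.CodecInfo(latin1.encode, latin1.decode, name="demo_latin")
    return None  # not ours: let the other registered search functions try

codecs.register(demo_search)
```

After registration, `"café".encode("demo_latin")` yields `b"caf\xe9"`. Names the function does not own still fall through to the standard search function.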

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 15:31

Message:
Logged In: YES 
user_id=21627

It's actually not a bug to pass a module outside of
encodings/; the standard search function is supposed to find
other modules as well. So I have to revert this change.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:33

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 14:54:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 06:54:51 -0700
Subject: [Patches] [ python-Patches-579433 ] Solaris openpty() and forkpty() addition
Message-ID: <E17ZAz5-0008Q7-00@usw-sf-web4.sourceforge.net>

Patches item #579433, was opened at 2002-07-10 05:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470

Category: Modules
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Lance Ellinghaus (ellinghaus)
Assigned to: Nobody/Anonymous (nobody)
Summary: Solaris openpty() and forkpty() addition

Initial Comment:
This patch provides a Solaris 2.8 version of openpty() 
and forkpty() since they are not provided for in the 
distribution of Solaris. This has only been tested on 
Solaris 2.8.
This was posted to Python-DEV and I was told to post it 
here, so I am.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 15:54

Message:
Logged In: YES 
user_id=21627

I think this patch should be generalized beyond Solaris to
be acceptable.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=579433&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 15:30:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 07:30:06 -0700
Subject: [Patches] [ python-Patches-576458 ] Extend PyErr_SetFromWindowsErr
Message-ID: <E17ZBXC-00051I-00@usw-sf-web2.sourceforge.net>

Patches item #576458, was opened at 2002-07-02 18:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Thomas Heller (theller)
Assigned to: Thomas Heller (theller)
Summary: Extend PyErr_SetFromWindowsErr

Initial Comment:
PyErr_SetFromWindowsErr and 
PyErr_SetFromWindowsErrWithFilename can only raise 
PyExc_WindowsError. This patch introduces variants of 
these functions taking an additional PyObject* 
parameter, which allows the caller to specify the type 
of the exception to raise.

----------------------------------------------------------------------

>Comment By: Thomas Heller (theller)
Date: 2002-07-29 16:30

Message:
Logged In: YES 
user_id=11105

Thanks. Checked in:

committed   * Up-To-Date  1.8         Doc/api/exceptions.tex
committed   * Up-To-Date  2.55        Include/pyerrors.h
committed   * Up-To-Date  1.447       Misc/NEWS
committed   * Up-To-Date  2.71        Python/errors.c


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 18:42

Message:
Logged In: YES 
user_id=21627

The patch looks good, please apply it, with the following
changes:
- add \versionadded marks into the documentation;
- add an entry to Misc/NEWS.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-05 08:57

Message:
Logged In: YES 
user_id=11105

Sure. Patch uploaded: docpatch.diff

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-05 07:47

Message:
Logged In: YES 
user_id=21627

If this is meant to be used by extension modules, it should
be documented.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-07-02 18:13

Message:
Logged In: YES 
user_id=11105

Patch for the header file was missing...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=576458&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 16:18:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 08:18:39 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZCIB-0001yb-00@usw-sf-web4.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 11:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Nobody/Anonymous (nobody)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO patch (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!
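For readers unfamiliar with c_lnotab, here is a Python sketch of the lookup it supports, assuming the classic layout: a flat sequence of unsigned byte pairs (address increment, line increment), starting at offset 0 on the code object's first line. The walk is modelled on what PyCode_Addr2Line does. Later CPython versions replaced this table format, so treat this as an illustration of the idea, not of any current API. addr2line is a hypothetical name.

```python
def addr2line(lnotab: bytes, firstlineno: int, addrq: int) -> int:
    """Map bytecode offset `addrq` to a source line using a classic lnotab.

    The table is a flat run of unsigned byte pairs
    (address increment, line increment), starting from offset 0 at
    `firstlineno`.
    """
    line, addr = firstlineno, 0
    for i in range(0, len(lnotab) - 1, 2):
        addr += lnotab[i]
        if addr > addrq:
            break          # the next entry starts beyond addrq
        line += lnotab[i + 1]
    return line
```

With `bytes([6, 1, 8, 1])` and a first line of 10, offsets 0-5 map to line 10, offsets 6-13 to line 11, and offset 14 onward to line 12.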

----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 15:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;"
declaration and the "io = INSTR_OFFSET();" statement
below the "if (tstate->c_tracefunc ...)" test gives a
small speedup on my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 16:34:31 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 08:34:31 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZCXX-0000La-00@usw-sf-web5.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 11:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Nobody/Anonymous (nobody)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO patch (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-29 15:34

Message:
Logged In: YES 
user_id=6656

Uhh, the instr_[lu]b variables need to keep their values
around the loop; otherwise we might just as well call
PyCode_Addr2Line each time around.

I have another version of the patch that does that, but I
assumed the overhead of doing so was deemed too high, or
someone else would have done this by now.  It's certainly
easier.

Glad to hear it doesn't affect python -O too much.  I was
doing this away from the internet and forgot to keep a clean
copy of the source around for doing comparisons with...
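The point about keeping instr_lb/instr_ub alive across iterations can be illustrated with a Python sketch, again assuming the classic lnotab layout of unsigned (address increment, line increment) byte pairs. A single table walk yields both the line and the offset range [lb, ub) that maps to it. The caller re-walks the table only when the current offset leaves that cached range. The names line_bounds and trace_lines are hypothetical. The sketch simplifies by reporting an event only when the line number changes, so details such as backward jumps onto the same line are ignored.

```python
import math

def line_bounds(lnotab: bytes, firstlineno: int, addrq: int):
    """Return (line, lb, ub) such that every offset in [lb, ub) maps to `line`."""
    line, addr, lb = firstlineno, 0, 0
    for i in range(0, len(lnotab) - 1, 2):
        nxt = addr + lnotab[i]
        if nxt > addrq:
            return line, lb, nxt
        addr, lb = nxt, nxt
        line += lnotab[i + 1]
    return line, lb, math.inf   # last entry: range runs to the end of the code

def trace_lines(lnotab: bytes, firstlineno: int, offsets):
    """Report line events for a stream of instruction offsets.

    instr_lb/instr_ub persist across iterations, so the table is only
    re-walked when execution leaves the cached range.
    """
    instr_lb, instr_ub = 0, -1          # empty range forces a lookup at first
    current, events, lookups = None, [], 0
    for off in offsets:
        if not (instr_lb <= off < instr_ub):
            line, instr_lb, instr_ub = line_bounds(lnotab, firstlineno, off)
            lookups += 1
            if line != current:
                events.append(line)
                current = line
    return events, lookups
```

For offsets 0, 2, 4, 6, 8, 14, 16 against `bytes([6, 1, 8, 1])` starting at line 10, this reports lines 10, 11 and 12 after only three table walks, instead of one walk per instruction.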

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 15:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;"
declaration and the "io = INSTR_OFFSET();" statement
below the "if (tstate->c_tracefunc ...)" test gives a
small speedup on my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 17:20:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 09:20:51 -0700
Subject: [Patches] [ python-Patches-553702 ] Cygwin make install patch
Message-ID: <E17ZDGN-0007Pr-00@usw-sf-web2.sourceforge.net>

Patches item #553702, was opened at 2002-05-08 04:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470

Category: Build
Group: None
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Jason Tishler (jlt63)
Assigned to: Jason Tishler (jlt63)
Summary: Cygwin make install patch

Initial Comment:
This patch fixes make install for Cygwin. Specifically, it reverts
to the previous behavior:

o install libpython$(VERSION)$(SO) in $(BINDIR)
o install $(LDLIBRARY) in $(LIBPL)

It also begins to remove Cygwin's dependency on
$(DLLLIBRARY) which I hope to take advantage of
when I attempt to make Cygwin as similar as possible
to the other Unix platforms (in other patches).

I tested this patch under Red Hat Linux 7.1 without
any ill effects.

BTW, I'm not the happiest using the following
test for Cygwin:

test "$(SO)" = .dll

I'm willing to update the patch to use:

case "$(MACHDEP)" in cygwin*

instead, but IMO that will look uglier.


----------------------------------------------------------------------

>Comment By: Jason Tishler (jlt63)
Date: 2002-07-29 08:20

Message:
Logged In: YES 
user_id=86216

Committed as Makefile.pre.in 1.89.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-07-05 06:36

Message:
Logged In: YES 
user_id=86216

Thanks. I'm on vacation now and will check it in
when I return to work next week.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-04 21:31

Message:
Logged In: YES 
user_id=21627

I think I misinterpreted your patch. It is fine; please
apply it.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-27 08:25

Message:
Logged In: YES 
user_id=86216

Sorry for sluggish response time...

Under Cygwin, my patch does the following:

make altbininstall:
/usr/bin/install -c -m 555 libpython2.3.dll /usr/bin

make libainstall:
/usr/bin/install -c -m 644 libpython2.3.dll.a /usr/lib/python2.3/config

So, I am installing the shared library during altbininstall
and the import library during libainstall. Isn't this what
you were asking for in your previous message? Or, do
you want me to install both files during altbininstall?

I'm confused.  Please clarify.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-06-06 01:49

Message:
Logged In: YES 
user_id=21627

On Unix, if a shared libpython is created, it is installed
as part of altbininstall, not as part of libainstall. I feel
that pythonxy.dll is not really a library, but a binary -
quite unlike libpythonxy.a (which is closer to the
import library). So I feel that this patch would better be
incorporated into altbininstall.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-06-04 07:17

Message:
Logged In: YES 
user_id=86216

Please review when you get a chance, thanks.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-05-22 08:30

Message:
Logged In: YES 
user_id=86216

Can I commit this one? Note that make install is
busted under Cygwin without this patch.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553702&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 17:57:51 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 09:57:51 -0700
Subject: [Patches] [ python-Patches-587889 ] fix memory leak of tp_doc in typeobject
Message-ID: <E17ZDqB-0002Ji-00@usw-sf-web5.sourceforge.net>

Patches item #587889, was opened at 2002-07-28 23:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Guido van Rossum (gvanrossum)
Summary: fix memory leak of tp_doc in typeobject

Initial Comment:
Attached is a patch which fixes a memory leak in
typeobject.c.  I would have checked this in, but there
was a line which concerned me in
Objects/structseq.c::PyStructSequence_InitType():343. 
In this function, it assigns the tp_doc from a
PyStructSequence_Desc* which is passed in.  I'm not
sure where this memory comes from, so I wasn't sure if
the patch would create problems.

The memory leak was found by using valgrind: 
http://developer.kde.org/~sewardj/

Another thing I saw was that it *may* be possible that
the __doc__ is not a string.  But in two places there
were PyString_FromString (1 was the macro).  The only
way I can see a non-string in tp_doc is from the
InitType function in structseq.  I haven't traced it
further, so if structseq can only have a string, there
shouldn't be a problem.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-29 12:57

Message:
Logged In: YES 
user_id=6380

PyStructSequence_InitType() initializes a *static* type,
which will never be passed to type_dealloc(). In particular,
PyStructSequence_InitType() copies a template over the type
object which has the default flags in tp_flags, and the
default flags don't include the HEAPTYPE flag that
type_dealloc asserts.

IOW don't worry about that.

While __doc__ may not always be a string, tp_doc is always a
char *.

Instead of PyObject_DEL(), why not use PyObject_Free()? The
macro expands to a call to the function anyway.
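Guido's point is that only heap-allocated type objects (those with the HEAPTYPE flag) are meant to reach type_dealloc(), so a statically defined type's tp_doc is never freed there. A minimal Python 3 sketch of checking that flag from Python follows. It assumes Py_TPFLAGS_HEAPTYPE is bit 9 of tp_flags, as in CPython's Include/object.h; that is a CPython implementation detail, and the helper name is hypothetical.

```python
# Assumption: Py_TPFLAGS_HEAPTYPE is bit 9 of tp_flags (CPython's
# Include/object.h). This is an implementation detail, not a portable API.
Py_TPFLAGS_HEAPTYPE = 1 << 9


def is_heap_type(tp: type) -> bool:
    """Hypothetical helper: True if `tp` was allocated at run time (e.g. by
    a class statement), the only kind of type type_dealloc() expects."""
    return bool(tp.__flags__ & Py_TPFLAGS_HEAPTYPE)


class UserDefined:
    pass


print(is_heap_type(int))          # False: a static C-level type object
print(is_heap_type(UserDefined))  # True: created by the class statement
```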

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 17:58:09 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 09:58:09 -0700
Subject: [Patches] [ python-Patches-587889 ] fix memory leak of tp_doc in typeobject
Message-ID: <E17ZDqT-0002Kj-00@usw-sf-web5.sourceforge.net>

Patches item #587889, was opened at 2002-07-28 23:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: fix memory leak of tp_doc in typeobject

Initial Comment:
Attached is a patch which fixes a memory leak in
typeobject.c.  I would have checked this in, but there
was a line which concerned me in
Objects/structseq.c::PyStructSequence_InitType():343. 
In this function, it assigns the tp_doc from a
PyStructSequence_Desc* which is passed in.  I'm not
sure where this memory comes from, so I wasn't sure if
the patch would create problems.

The memory leak was found by using valgrind: 
http://developer.kde.org/~sewardj/

Another thing I saw was that it *may* be possible that
the __doc__ is not a string.  But in two places there
were PyString_FromString (1 was the macro).  The only
way I can see a non-string in tp_doc is from the
InitType function in structseq.  I haven't traced it
further, so if structseq can only have a string, there
shouldn't be a problem.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-29 12:57

Message:
Logged In: YES 
user_id=6380

PyStructSequence_InitType() initializes a *static* type,
which will never be passed to type_dealloc(). In particular,
PyStructSequence_InitType() copies a template over the type
object which has the default flags in tp_flags, and the
default flags don't include the HEAPTYPE flag that
type_dealloc asserts.

IOW don't worry about that.

While __doc__ may not always be a string, tp_doc is always a
char *.

Instead of PyObject_DEL(), why not use PyObject_Free()? The
macro expands to a call to the function anyway.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 18:45:56 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 10:45:56 -0700
Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function
Message-ID: <E17ZEai-0002OX-00@usw-sf-web1.sourceforge.net>

Patches item #571603, was opened at 2002-06-20 13:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix bug in encodings.search_function

Initial Comment:
Hi,

there seems to be a bug in the default encoding search 
function (search_function in encodings/__init__.py). The 
function tries to load a module with the name of the 
encoding, but it doesn't require that this module is in the 
encodings/ directory. This leads to trouble when you try 
to use an encoding that has the name of a module in the 
search path.

To demonstrate, save the following line to test.py:

print 'Just testing'.encode('test')

and run it. This results in a CodecRegistryError 
exception: "module "test" (test.pyc) failed to register"
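The fix Geert proposes amounts to only ever importing codec modules from inside the encodings package. A minimal Python 3 sketch of that restriction follows, using importlib.import_module with a package prefix; the function name and the exact name normalization are assumptions for illustration, not the code in the attached patches. (As the later comments show, Martin reverted the change because finding modules outside encodings/ is intended behaviour.)

```python
import importlib
from types import ModuleType
from typing import Optional


def find_codec_module(encoding: str, package: str = "encodings") -> Optional[ModuleType]:
    """Hypothetical helper: look up a codec module *inside* `package` only,
    so an unrelated top-level module (e.g. a local test.py) is never used."""
    # Assumed normalization: lower case, hyphens/spaces become underscores.
    modname = encoding.lower().replace("-", "_").replace(" ", "_")
    if not modname.isidentifier():
        return None  # reject dotted or empty names outright
    try:
        return importlib.import_module(package + "." + modname)
    except ImportError:
        return None  # no such codec module in the package


print(find_codec_module("UTF-8").__name__)  # encodings.utf_8
print(find_codec_module("test"))            # None: the top-level test package is ignored
```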

The bug is present in 2.2.1 and in HEAD. In HEAD there 
was actually a bugfix for this but it was incomplete.

Patches for 2.2.1 and HEAD attached.

Greetings,
Geert Jansen

----------------------------------------------------------------------

>Comment By: Geert Jansen (geertj)
Date: 2002-07-29 19:45

Message:
Logged In: YES 
user_id=537938

Hi Martin,

Isn't it wrong to let the module namespace "leak" into the 
encodings namespace? This leads to very unexpected 
behaviour. Why should it be forbidden to have a module with 
the same name as an encoding? This seems rather arbitrary 
and solely an implementation detail.

It is still very easy to add an encoding outside the encodings/ 
directory using the codecs.register() function. Or maybe there 
is another solution?
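Geert's alternative, registering an encoding that lives outside encodings/ via codecs.register(), can be sketched in Python 3 as follows. The codec name "upperdemo" and its helper functions are hypothetical; the search-function and CodecInfo contracts are those of the standard codecs module.

```python
import codecs


def _upper_encode(text, errors="strict"):
    # Encoder contract: return (bytes, number of characters consumed).
    return text.upper().encode("ascii", errors), len(text)


def _upper_decode(data, errors="strict"):
    # Decoder contract: return (str, number of bytes consumed).
    # `data` may arrive as a memoryview, so copy it into bytes first.
    raw = bytes(data)
    return raw.decode("ascii", errors), len(raw)


def upperdemo_search(name):
    """Search function: codecs.lookup() passes the normalized (lower-case)
    encoding name; return a CodecInfo, or None to let other searches run."""
    if name == "upperdemo":
        return codecs.CodecInfo(_upper_encode, _upper_decode, name="upperdemo")
    return None


codecs.register(upperdemo_search)

print("Just testing".encode("upperdemo"))  # b'JUST TESTING'
```

Because the search function only answers for its own name, a module that happens to share a name with an encoding is never consulted.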

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 15:31

Message:
Logged In: YES 
user_id=21627

It's actually not a bug to pass a module outside of
encodings/; the standard search function is supposed to find
other modules as well. So I have to revert this change.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:33

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 20:52:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 12:52:32 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17ZGZE-0004Ts-00@usw-sf-web3.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 02:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
>Assigned to: Jeremy Hylton (jhylton)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance-wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-29 15:52

Message:
Logged In: YES 
user_id=31435

Reassigned to Jeremy because I'm "on vacation" this week, 
and Jeremy is most familiar w/ the parser code.  Offhand 
the patches looked fine to me, provided that you no longer 
consider test_longexp_fix.diff to be part of the patch set.

I backported the XXXROUNDUP changes to the 2.2 
maintenance branch at the same time I changed it in the 
HEAD, so nothing left to do there on that count.

Thanks for the great work!

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:47

Message:
Logged In: YES 
user_id=250749

Tim,

1.  any objections to the "final" patches?

2.  do you see any reason not to backport your XXXROUNDUP
change - it qualifies as a performance/behaviour bugfix IMO.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-21 09:16

Message:
Logged In: YES 
user_id=250749

Ok, I've prepared patches to convert the following files to
use PyMalloc for memory allocation:
Parser/[acceler.c|node.c|parsetok.c] (pymalloc-parser.diff)
Python/compile.c (pymalloc-compile.diff)

I didn't bother with the other files in Parser/ as my malloc
logging shows that they only ever appear to make requests >
256 bytes.

I have attached/will attach a summary from my malloc logging
experiments for information.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-15 14:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If 
the platform malloc is saved from either one, its job 
becomes much easier.
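Tim's point #1 is about how often the child array must be regrown. A small Python 3 simulation makes the difference concrete by counting reallocations while children are appended one at a time. Both rounding policies are assumptions modeled on the descriptions in this thread (the old code growing every 3rd child; the new XXXROUNDUP rounding small counts up to 4 and growing much less often), not copies of node.c.

```python
def roundup_every3(n):
    """Assumed old policy: capacity is n rounded up to a multiple of 3."""
    return n if n <= 1 else (n + 2) // 3 * 3


def roundup_geometric(n):
    """Assumed new policy: multiples of 4 up to 128, then the next power of 2."""
    if n <= 1:
        return n
    if n <= 128:
        return (n + 3) // 4 * 4
    p = 256
    while p < n:
        p *= 2
    return p


def count_resizes(n_children, roundup):
    """Append children one at a time and count how often the child array
    has to be reallocated because the rounded capacity grew."""
    capacity = 0
    resizes = 0
    for nch in range(1, n_children + 1):
        required = roundup(nch)
        if required > capacity:
            capacity = required
            resizes += 1
    return resizes


print(count_resizes(10000, roundup_every3))     # 3335
print(count_resizes(10000, roundup_geometric))  # 40
```

With linear growth every realloc may copy the whole block, so the total copying is quadratic in the number of children; geometric growth keeps it linear.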

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.
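Tim's point about coalescing (and Jack's later note that requests above the small-request threshold go straight to the system malloc) can be illustrated with a toy model of a size-class allocator. This Python 3 sketch is not pymalloc: real pymalloc carves fixed-size blocks out of pools inside arenas. The 8-byte granularity and 256-byte threshold are assumptions chosen to match the figures mentioned in this thread, and all names are hypothetical.

```python
ALIGNMENT = 8                  # assumed size-class granularity (bytes)
SMALL_REQUEST_THRESHOLD = 256  # larger requests go to the "platform malloc"


class ToySmallObjectAllocator:
    """Toy model: one LIFO free list per size class; freeing never tries
    to merge neighbouring free blocks into larger ones."""

    def __init__(self):
        self.free_lists = {}      # size class -> addresses of free blocks
        self.next_addr = 0        # bump pointer for never-used memory
        self.platform_calls = 0   # requests forwarded to the platform malloc

    @staticmethod
    def size_class(nbytes):
        return (max(nbytes, 1) + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

    def malloc(self, nbytes):
        if nbytes > SMALL_REQUEST_THRESHOLD:
            self.platform_calls += 1
            return ("platform", nbytes)       # stand-in for a real pointer
        cls = self.size_class(nbytes)
        free = self.free_lists.setdefault(cls, [])
        if free:
            return ("pool", cls, free.pop())  # O(1) reuse of a freed block
        addr = self.next_addr
        self.next_addr += cls
        return ("pool", cls, addr)

    def free(self, block):
        if block[0] == "platform":
            self.platform_calls += 1
            return
        _, cls, addr = block
        # O(1): push the block back on its list; no coalescing at all.
        self.free_lists[cls].append(addr)


alloc = ToySmallObjectAllocator()
a = alloc.malloc(10)   # ("pool", 16, 0)
b = alloc.malloc(12)   # ("pool", 16, 16)
alloc.free(a)
alloc.free(b)
c = alloc.malloc(30)   # ("pool", 32, 32): adjacent free 16-byte blocks are not merged
```

Since free() never scans neighbours, releasing millions of tiny blocks costs the same per block regardless of how fragmented memory is.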

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 07:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp, that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3s   6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin:   5.5s     4.9s   0.5s

These runs with Library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin:  1230s     971s   65s

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser is worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 06:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. You'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 02:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-07 17:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc will simply call the underlying system malloc/realloc.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 02:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless that
is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 21:18:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 13:18:33 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZGyP-0007LJ-00@usw-sf-web5.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 07:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Nobody/Anonymous (nobody)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO opcode (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.
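The "poke into c_lnotab" idea can be sketched in Python 3. The sketch assumes the 2.x-era format as it is usually described: a sequence of unsigned byte pairs (address increment, line increment) relative to co_firstlineno, with increments over 255 split across several pairs. Pairs are kept as tuples rather than a packed byte string, and the function names are hypothetical. Besides the line number, the decoder returns the half-open offset range [lb, ub) covered by the same table entry, which is what lets an eval loop skip lookups while execution stays on one line.

```python
def encode_lnotab(line_starts, firstlineno):
    """Encode (offset, lineno) line starts as (addr_incr, line_incr) pairs.
    Each increment must fit in an unsigned byte, so large jumps are split."""
    pairs = []
    last_addr, last_line = 0, firstlineno
    for addr, line in line_starts:
        incr_addr, incr_line = addr - last_addr, line - last_line
        if incr_addr < 0 or incr_line < 0:
            raise ValueError("offsets and line numbers must not decrease")
        while incr_addr > 255:
            pairs.append((255, 0))
            incr_addr -= 255
        while incr_line > 255:
            pairs.append((incr_addr, 255))
            incr_line -= 255
            incr_addr = 0
        if incr_addr or incr_line:
            pairs.append((incr_addr, incr_line))
        last_addr, last_line = addr, line
    return pairs


def line_and_bounds(pairs, firstlineno, offset):
    """Return (lineno, lb, ub) for bytecode `offset`: the source line plus
    the offset range [lb, ub) of the same entry (ub None past the end)."""
    line, lb, addr = firstlineno, 0, 0
    for addr_incr, line_incr in pairs:
        addr += addr_incr
        if addr > offset:
            return line, lb, addr
        line += line_incr
        lb = addr
    return line, lb, None


# Hypothetical code object: line 10 at offset 0, 11 at 6, 13 at 12, 14 at 20.
pairs = encode_lnotab([(6, 11), (12, 13), (20, 14)], 10)
print(pairs)                              # [(6, 1), (6, 2), (8, 1)]
print(line_and_bounds(pairs, 10, 7))      # (11, 6, 12)
```

Splitting a long address run adds extra entries, so [lb, ub) can be narrower than the whole line; that only costs an extra lookup, never a wrong line number.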

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-29 16:18

Message:
Logged In: YES 
user_id=31435

Dropping out of "vacation mode" long enough to say "mondo 
cool!" and encourage this.  Guido may not agree, but I also 
encourage you to redefine c_lnotab if it can make life easier 
and quicker here.  That subtle compression scheme has 
been the source of several nasty bugs, both in the core C 
code and in Jeremy's compiler pkg (cut 'n paste bugs 
abound here, because few people understand what's really 
needed, so flawed code gets copied with little thought).

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 11:34

Message:
Logged In: YES 
user_id=6656

Uhh, the instr_[lu]b variables need to keep their values
around the loop; otherwise we might just as well call
PyCode_Addr2Line each time around.
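Why instr_lb and instr_ub must survive across loop iterations can be shown with a small Python 3 simulation of an eval loop: the cached range is only recomputed when the current offset leaves it, and (following Neil's suggestion) no bookkeeping happens at all unless tracing is on. The line table, the bisect-based lookup standing in for c_lnotab decoding, and all names are assumptions; the real implementation's extra line events on backward jumps are not modeled, and an event is emitted only when the line number changes.

```python
import bisect

# Hypothetical line table: (start_offset, lineno), sorted by offset.
LINE_STARTS = [(0, 10), (6, 11), (12, 13), (20, 14)]
STARTS = [s for s, _ in LINE_STARTS]


def lookup(offset):
    """Stand-in for decoding the line table: (lineno, lb, ub)."""
    i = bisect.bisect_right(STARTS, offset) - 1
    lb, line = LINE_STARTS[i]
    ub = STARTS[i + 1] if i + 1 < len(STARTS) else None
    return line, lb, ub


def run(offsets, tracing=True):
    """Walk executed bytecode offsets; return (line events, lookups done)."""
    events, lookups, last_line = [], 0, None
    instr_lb, instr_ub = 0, -1  # empty range: forces a lookup at first use
    for io in offsets:
        if not tracing:
            continue  # untraced code pays nothing for line tracking
        if io < instr_lb or (instr_ub is not None and io >= instr_ub):
            lookups += 1
            line, instr_lb, instr_ub = lookup(io)  # cached for later offsets
            if line != last_line:
                events.append(line)
                last_line = line
    return events, lookups


print(run(range(0, 24, 2)))  # ([10, 11, 13, 14], 4): 12 offsets, 4 lookups
```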

I have another version of the patch that does that, but I
assumed the overhead of doing so was deemed too high, or
someone else would have done this by now.  It's certainly
easier.

Glad to hear it doesn't affect python -O too much.  I was
doing this away from the internet and forgot to keep a clean
copy of the source around for doing comparisons with...

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 11:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;"
declaration and the "io = INSTR_OFFSET();" statement
below the "if (tstate->c_tracefunc ...)" test gives a
small speedup on my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Mon Jul 29 22:23:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 14:23:39 -0700
Subject: [Patches] [ python-Patches-584626 ] yield allowed in try/finally
Message-ID: <E17ZHzP-0004ej-00@usw-sf-web2.sourceforge.net>

Patches item #584626, was opened at 2002-07-21 20:29
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: yield allowed in try/finally

Initial Comment:
A generator's dealloc function now resumes a generator
one last time by jumping directly to the return statement at 
the end of the code.  As a result, the finally section of any 
try/finally blocks is executed.  Any exceptions raised are 
treated just like exceptions in a __del__ finalizer.
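For comparison, later Python versions did allow yield inside try/finally, but through a different mechanism from the one proposed here: PEP 342 (Python 2.5) added generator.close(), which raises GeneratorExit at the paused yield so the finally clause runs, and a generator's finalizer calls close(). A minimal Python 3 sketch of that behaviour (function and variable names are illustrative):

```python
log = []


def worker():
    try:
        yield 1
        yield 2
    finally:
        log.append("cleanup")  # runs even if the generator is abandoned


g = worker()
print(next(g))  # 1: the generator is now paused inside the try block
g.close()       # raises GeneratorExit at the paused yield; finally runs
print(log)      # ['cleanup']

g2 = worker()
next(g2)
del g2          # CPython: refcount hits zero, the finalizer calls close()
print(log)      # ['cleanup', 'cleanup']
```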


----------------------------------------------------------------------

>Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 21:23

Message:
Logged In: YES 
user_id=35752

The GC will need to be taught about these finalizers.  Look
for the method 'has_finalizer' in gcmodule.c.  I don't think
we want that method to return true for all generator objects,
since that would cause any reference cycle containing a
generator to become uncollectable.
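Neil's concern applies to the collector of the time. In modern CPython (3.4 and later, PEP 442), objects with finalizers, including generators paused inside try/finally, no longer make a cycle uncollectable: the collector runs the finalizer and then frees the cycle. A minimal Python 3 sketch, with illustrative names:

```python
import gc

log = []


class Holder:
    pass


def paused_in_try(holder):
    try:
        yield "paused"
    finally:
        log.append("finally ran")


h = Holder()
g = paused_in_try(h)
next(g)       # suspend the generator inside the try block
h.gen = g     # reference cycle: h -> g -> g's frame -> holder (h)
del h, g      # only the cycle keeps the two objects alive now
gc.collect()  # the generator is finalized (finally runs), then freed
print(log)    # ['finally ran']
```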

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584626&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 01:58:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 17:58:06 -0700
Subject: [Patches] [ python-Patches-587889 ] fix memory leak of tp_doc in typeobject
Message-ID: <E17ZLKw-0000lY-00@usw-sf-web3.sourceforge.net>

Patches item #587889, was opened at 2002-07-28 23:10
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Neal Norwitz (nnorwitz)
Summary: fix memory leak of tp_doc in typeobject

Initial Comment:
Attached is a patch which fixes a memory leak in
typeobject.c.  I would have checked this in, but there
was a line which concerned me in
Objects/structseq.c::PyStructSequence_InitType():343. 
In this function, it assigns the tp_doc from a
PyStructSequence_Desc* which is passed in.  I'm not
sure where this memory comes from, so I wasn't sure if
the patch would create problems.

The memory leak was found by using valgrind: 
http://developer.kde.org/~sewardj/

Another thing I saw was that it *may* be possible that
the __doc__ is not a string.  But in two places there
were PyString_FromString (1 was the macro).  The only
way I can see a non-string in tp_doc is from the
InitType function in structseq.  I haven't traced it
further, so if structseq can only have a string, there
shouldn't be a problem.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-29 20:58

Message:
Logged In: YES 
user_id=33168

I made a mistake with PyObject_DEL().  I thought tp_doc was
allocated with PyObject_NEW(), but tp_doc is allocated with
PyObject_MALLOC(). I'll use PyObject_Free() to dealloc.

Checked in as typeobject.c 2.164 and 2.126.4.21

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-29 12:57

Message:
Logged In: YES 
user_id=6380

PyStructSequence_InitType() initializes a *static* type,
which will never be passed to type_dealloc(). In particular,
PyStructSequence_InitType() copies a template over the type
object which has the default flags in tp_flags, and the
default flags don't include the HEAPTYPE flag that
type_dealloc asserts.

IOW don't worry about that.

While __doc__ may not always be a string, tp_doc is always a
char *.

Instead of PyObject_DEL(), why not use PyObject_Free()? The
macro expands to a call to the function anyway.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587889&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 02:10:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 18:10:43 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17ZLX9-0000vO-00@usw-sf-web3.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 12:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Neal Norwitz (nnorwitz)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-29 21:10

Message:
Logged In: YES 
user_id=33168

Not sure if this should be backported or not.

Checked in as:
configure.in: 1.337
configure: 1.326
pyconfig.h.in: 1.47
posixmodule.c: 2.246


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 03:54

Message:
Logged In: YES 
user_id=21627

Looks good; please apply it.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-26 20:58

Message:
Logged In: YES 
user_id=33168

This patch uses AC_TRY_LINK instead of AC_TRY_RUN.  It tries
makedev according to Martin's suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-23 17:43

Message:
Logged In: YES 
user_id=21627

That patch doesn't really test whether defining 
OSF_SOURCE helps in getting makedev, does it? In 
particular, if makedev is not available at all, or requires a 
different define, the test will still conclude that 
OSF_SOURCE should be defined, right?

I think the sequence should be:
- is makedev already available?
- if not, is it with OSF_SOURCE defined?
- if not, arrange to exclude makedev from posixmodule.c

Also, is it necessary to run the test program? autoconf is 
always worried that cross-compilation would fail, since you 
cannot run tests (although it is reasonable to link test 
programs in a cross-compilation environment).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 03:21:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 19:21:19 -0700
Subject: [Patches] [ python-Patches-578297 ] fix for problems with test_longexp
Message-ID: <E17ZMdT-00062d-00@usw-sf-web5.sourceforge.net>

Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Jeremy Hylton (jhylton)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are performance-neutral when running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-30 12:21

Message:
Logged In: YES 
user_id=250749

Yes, test_longexp_fix.diff is no longer part of the patch set.
Should I delete it?

I must have missed your 2.2 backport commit message.  I 
might also look at whether it can be backported to 2.1 without 
significant side effects.

Thanks for your feedback too.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 05:52

Message:
Logged In: YES 
user_id=31435

Reassigned to Jeremy because I'm "on vacation" this week, 
and Jeremy is most familiar w/ the parser code.  Offhand 
the patches looked fine to me, provided that you no longer 
consider test_longexp_fix.diff to be part of the patch set.

I backported the XXXROUNDUP changes to the 2.2
maintenance branch at the same time I changed it in the
HEAD, so there is nothing left to do there on that count.

Thanks for the great work!

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 21:47

Message:
Logged In: YES 
user_id=250749

Tim,

1.  any objections to the "final" patches?

2.  do you see any reason not to backport your XXXROUNDUP
change - it qualifies as a performance/behaviour bugfix IMO.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-21 23:16

Message:
Logged In: YES 
user_id=250749

Ok, I've prepared patches to convert the following files to
use PyMalloc for memory allocation:
Parser/[acceler.c|node.c|parsetok.c] (pymalloc-parser.diff)
Python/compile.c (pymalloc-compile.diff)

I didn't bother with the other files in Parser/ as my malloc
logging shows that they only ever appear to make requests >
256 bytes.

I have attached/will attach a summary from my malloc logging
experiments for information.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-16 04:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If
the platform malloc is saved from either one, its job
becomes much easier.
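The effect of growing blocks less often can be seen by counting how many reallocs happen as children are appended one at a time. The sketch below is a hedged model, not code copied from node.c. xxxroundup assumes a policy of exact size for 0 or 1 children, multiples of 4 up to 128, then powers of two. old_roundup assumes the earlier round-up-to-a-multiple-of-3 behaviour ("a realloc on every 3rd new child").

```python
def xxxroundup(n):
    """Capacity for n children under the assumed new policy."""
    if n <= 1:
        return n
    if n <= 128:
        return (n + 3) & ~3          # round up to a multiple of 4
    capacity = 256
    while capacity < n:
        capacity <<= 1               # next power of two >= n
    return capacity


def old_roundup(n):
    """Capacity under the assumed old policy: a multiple of 3."""
    if n <= 1:
        return n
    return (n + 2) // 3 * 3


def count_resizes(nchildren, roundup):
    """Count reallocs when children are appended one at a time."""
    resizes = 0
    for n in range(1, nchildren + 1):
        if roundup(n) > roundup(n - 1):   # capacity must grow
            resizes += 1
    return resizes


print(count_resizes(10000, old_roundup))   # 3335
print(count_resizes(10000, xxxroundup))    # 40
```

Under these assumptions, appending 10000 children drops from thousands of reallocs to a few dozen, which is why the platform malloc no longer has to hunt for ever-larger blocks so often.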

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 21:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp, that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s   32.7s   6.3s
my patch:                37.1s   30.3s   6.7s
Tim's checkin:            8.4s    7.8s   0.6s
my patch+Tim's checkin:   5.5s    4.9s   0.5s

These runs were made with the library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s   1097s   89s
my patch:                1350s   1065s   93s
Tim's checkin:           1265s   1003s   67s
my patch+Tim's checkin:  1230s    971s   65s

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) should be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser is worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 20:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning this back to you; there's little more I can do with this patch. You'll have to check whether you still need it, or whether Tim's change to node.c is good enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 16:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 07:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. The addchild() calls continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_THRESHOLD, pymalloc simply calls the underlying system malloc/realloc.
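The routing described here can be sketched in a few lines. The constants are assumptions (a 256-byte threshold and 8-byte size classes, intended to match the 2.3-era obmalloc.c), route_request is a hypothetical helper rather than a pymalloc API, and the 24-byte node size is purely illustrative.

```python
SMALL_REQUEST_THRESHOLD = 256  # assumed small-request limit, in bytes
ALIGNMENT = 8                  # assumed size-class granularity, in bytes


def route_request(nbytes):
    """Say which allocator would serve a request of nbytes bytes.

    Returns ("pymalloc", size_class) for small requests and
    ("system", nbytes) for everything else, including 0-byte requests.
    """
    if 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return ("pymalloc", (nbytes - 1) // ALIGNMENT)
    return ("system", nbytes)


# A growing child array soon crosses the threshold; from then on every
# realloc goes to the platform allocator. 24-byte nodes are hypothetical.
sizes = [n * 24 for n in (1, 4, 8, 12, 16)]
print([route_request(s)[0] for s in sizes])
# ['pymalloc', 'pymalloc', 'pymalloc', 'system', 'system']
```

This is why pymalloc alone cannot help with one ever-growing block: only the many tiny allocations are taken away from the system malloc.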

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform-specific change
if it's integrated, rather than a general change (unless the
latter is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 03:28:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 29 Jul 2002 19:28:48 -0700
Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation
Message-ID: <E17ZMki-000267-00@usw-sf-web3.sourceforge.net>

Patches item #555085, was opened at 2002-05-12 22:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 4
Submitted By: Michael Gilfix (mgilfix)
Assigned to: Guido van Rossum (gvanrossum)
Summary: timeout socket implementation

Initial Comment:
This addresses bug #457114 by implementing timed socket
operations. If a timeout is set and the timeout period
elapses before the socket operation has finished, a
socket.error exception is raised.

This patch integrates the functionality at two levels:
the timeout capability is integrated at the C level in
socketmodule.c, and socket.py was modified to update
file object creation on Windows to handle the
case of the underlying socket raising an exception.
The TeX documentation was also updated and a new
regression test was provided as test_timeout.py.
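One way to build a timed receive on top of select() is sketched below. This is a hypothetical Python-level illustration, not the C code in socketmodule.c. recv_with_timeout is an invented helper, and it raises socket.timeout, which in current Python is a subclass of socket.error (OSError).

```python
import select
import socket


def recv_with_timeout(sock, nbytes, timeout):
    """Wait at most `timeout` seconds for `sock` to become readable, then recv.

    Raises socket.timeout if nothing arrives in time.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise socket.timeout("timed out")
    return sock.recv(nbytes)


a, b = socket.socketpair()
b.sendall(b"hi")
print(recv_with_timeout(a, 2, 1.0))   # b'hi'
try:
    recv_with_timeout(a, 2, 0.05)     # nothing more was sent
except socket.timeout:
    print("timed out")
a.close()
b.close()
```

Waiting in select() first means the socket itself can stay in blocking mode; the timeout only governs how long the caller is willing to wait for readiness.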

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-30 12:28

Message:
Logged In: YES 
user_id=250749

In private mail to/from Guido, it appears that the FreeBSD 
issues were in test_socket.py, and have been addressed.

I still have outstanding issues on OS/2 EMX, which I sent to 
Guido privately but will add here as soon as I can.

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-24 06:43

Message:
Logged In: YES 
user_id=116038

Now that I'm back :)

I checked the archive and this seems to have been handled by
you. Please let me know if it isn't resolved and I can give
it a closer look.

Also, perhaps I should contact Bernie and ask him if there's
anything he hasn't gotten around to in the test_timeout that
I can off-load from him.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-19 03:11

Message:
Logged In: YES 
user_id=6380

The default timeout is now implemented in CVS.

There's a bug report from Andrew Macintyre (unfortunately on
python-dev) about test_socket.py failures on FreeBSD. I'll
try to keep an eye on that, so this patch *still* stays
open. Also, Bernie has promised some changes that I haven't
received yet and the details of which I don't recall (sorry
:-( ).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-08 11:47

Message:
Logged In: YES 
user_id=6380

Keeping this open as a reminder of things still to finish.

Most is in the python-dev discussion; Michael Gilfix and
Bernard Yue have offered to produce more patches.

One feature we definitely want is a way to specify a timeout
to be applied to all new sockets.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 07:11

Message:
Logged In: YES 
user_id=6380

Thanks for the new version! I've checked this in.  I made
considerable changes; the following is feedback but you
don't need to respond because I've addressed all these in
the checked-in code!

- Thanks for the cleanup of some non-standard formatting.
However, it's better not to do this so the diffs don't show
changes that are unrelated to the timeout patch.

- You are still importing the select module instead of
calling select() directly. I really think you should do the
latter -- the select module has an enormous overhead (it
allocates several large lists on the heap).

- Instead of explicitly testing the argument to settimeout
for being a float, int or long, you should simply call
PyFloat_AsDouble and handle the error; if someone passes
another object that implements __float__ that should be
acceptable.

- gettimeout() returns sock_timeout without checking if it
is NULL. It can be NULL when a socket object is never
initialized. E.g. I can do this:

>>> from socket import *
>>> s = socket.__new__(socket)
>>> s.gettimeout()

which gives me a segfault. There are probably other places
where this is assumed.

- I addressed the latter two issues by making sock_timeout a
double, whose value is < 0.0 when no timeout is set.
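The design in the last point, a plain double where a negative value means "no timeout", can be modeled in Python as follows. TimeoutField is a hypothetical class. The conversion goes through the value's __float__ method, roughly as PyFloat_AsDouble does, so ints and user objects with __float__ are accepted while strings are rejected.

```python
class TimeoutField:
    """Hypothetical model of a per-socket timeout stored as a float.

    A negative value means "no timeout"; a float can never be NULL, so
    gettimeout() always has something valid to return.
    """

    def __init__(self):
        self._timeout = -1.0

    def settimeout(self, value):
        if value is None:
            self._timeout = -1.0
            return
        try:
            seconds = type(value).__float__(value)  # honours __float__
        except AttributeError:
            raise TypeError("a float is required") from None
        if seconds < 0.0:
            raise ValueError("Timeout value out of range")
        self._timeout = seconds

    def gettimeout(self):
        return None if self._timeout < 0.0 else self._timeout


class HalfSecond:
    def __float__(self):
        return 0.5


t = TimeoutField()
print(t.gettimeout())       # None
t.settimeout(HalfSecond())
print(t.gettimeout())       # 0.5
```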

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-06-06 08:23

Message:
Logged In: YES 
user_id=116038

I've addressed all the issues brought up by Guido. The 2nd
version of the patch is attached here. In this version, I've
modified test_socket.py to include tests for the _fileobject
class in socket.py that was modified by this patch.
_fileobject needed to be modified so that data would not be
lost when the underlying socket raised an exception (data is
no longer accumulated in local variables). The tests for the
_fileobject class succeed on older versions of python
(tested 2.1.3) and pass on the newer version of python.
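The _fileobject fix amounts to keeping partially read data on the object rather than in local variables, so an exception midway through a read loses nothing. A hypothetical sketch follows: BufferedLineReader is not the actual socket.py class, and recv is any callable that may raise, simulated here by fake_recv.

```python
class BufferedLineReader:
    """Hypothetical line reader that keeps unread data in self._rbuf."""

    def __init__(self, recv):
        self._recv = recv      # callable(n) -> bytes; may raise
        self._rbuf = b""       # survives exceptions raised by recv

    def readline(self):
        while b"\n" not in self._rbuf:
            chunk = self._recv(1024)   # an exception here keeps _rbuf intact
            if not chunk:
                break                  # EOF: return whatever is buffered
            self._rbuf += chunk
        i = self._rbuf.find(b"\n")
        if i < 0:
            line, self._rbuf = self._rbuf, b""
        else:
            line, self._rbuf = self._rbuf[:i + 1], self._rbuf[i + 1:]
        return line


# Simulated socket: one partial chunk, then a timeout, then the rest.
events = [b"hel", TimeoutError("timed out"), b"lo\n"]


def fake_recv(n):
    event = events.pop(0)
    if isinstance(event, Exception):
        raise event
    return event


reader = BufferedLineReader(fake_recv)
try:
    reader.readline()
except TimeoutError:
    pass                        # b"hel" is still buffered
print(reader.readline())        # b'hello\n'
```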

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-24 06:18

Message:
Logged In: YES 
user_id=6380

For a detailed review, see

http://mail.python.org/pipermail/python-dev/2002-May/024340.html

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 08:47:33 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 00:47:33 -0700
Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function
Message-ID: <E17ZRjB-0003RT-00@usw-sf-web5.sourceforge.net>

Patches item #571603, was opened at 2002-06-20 13:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix bug in encodings.search_function

Initial Comment:
Hi,

there seems to be a bug in the default encoding search 
function (search_function in encodings/__init__.py. The 
function tries to load a module with the name of the 
encoding, but it doesn't require that this module is in the 
encodings/ directory. This leads to trouble when you try 
to use an encoding that has the name of a module in the 
search path.

To demonstrate, save the following line to test.py:

print 'Just testing'.encode('test')

and run it. This results in a CodecRegistryError 
exception: "module "test" (test.pyc) failed to register"

The bug is present in 2.2.1 and in HEAD. In HEAD there 
was actually a bugfix for this but it was incomplete.

Patches for 2.2.1 and HEAD attached.

Greetings,
Geert Jansen

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-30 09:47

Message:
Logged In: YES 
user_id=21627

Not sure what you mean by "leak". It is certainly desirable
that modules carry the same name as encodings; in fact,
*every* encoding implemented so far has a module with the
same name.

People have been using u"text".encode("japanese.sjis"),
given that the JapaneseCodecs package installs itself into a
Python package "japanese". That must continue to work. In
particular, your patch broke test.test_charmapcodec; make
sure you test your patches before submitting them.

To solve the problem of .encode("test") giving a registry
error, I have now changed the search_function to ignore
modules that don't have a getregentry function.
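The behaviour described here can be sketched like this. find_codec is a hypothetical stand-in for search_function. It tries the encodings package first, then a top-level module of the same name, and skips any module that lacks getregentry instead of reporting a registry error. The real function also handles aliases and caching, which are omitted here.

```python
import importlib


def find_codec(encoding):
    """Return getregentry() from a module named after `encoding`, or None."""
    modname = encoding.replace("-", "_")
    for candidate in ("encodings." + modname, modname):
        try:
            module = importlib.import_module(candidate)
        except ImportError:
            continue
        getregentry = getattr(module, "getregentry", None)
        if getregentry is None:
            continue  # an unrelated module (such as "test") is skipped
        return getregentry()
    return None


print(find_codec("utf-8").name)   # utf-8
print(find_codec("json"))         # None: json has no getregentry
```

Trying the top-level name as a fallback is what keeps package-installed codecs such as "japanese.sjis" working, while the getregentry check stops ordinary modules from being mistaken for codecs.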


----------------------------------------------------------------------

Comment By: Geert Jansen (geertj)
Date: 2002-07-29 19:45

Message:
Logged In: YES 
user_id=537938

Hi Martin,

Isn't it wrong to let the module namespace "leak" into the 
encodings namespace? This leads to very unexpected 
behaviour. Why should it be forbidden to have a module with 
the same name as an encoding? This seems rather arbitrary 
and solely an implementation detail.

It is still very easy to add an encoding outside the encodings/ 
directory using the codecs.register() function. Or maybe there 
is another solution?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 15:31

Message:
Logged In: YES 
user_id=21627

It's actually not a bug to pass a module outside of
encodings/; the standard search function is supposed to find
other modules as well. So I have to revert this change.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:33

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 09:16:55 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 01:16:55 -0700
Subject: [Patches] [ python-Patches-571603 ] Fix bug in encodings.search_function
Message-ID: <E17ZSBb-0007f1-00@usw-sf-web3.sourceforge.net>

Patches item #571603, was opened at 2002-06-20 13:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix bug in encodings.search_function

Initial Comment:
Hi,

there seems to be a bug in the default encoding search 
function (search_function in encodings/__init__.py. The 
function tries to load a module with the name of the 
encoding, but it doesn't require that this module is in the 
encodings/ directory. This leads to trouble when you try 
to use an encoding that has the name of a module in the 
search path.

To demonstrate, save the following line to test.py:

print 'Just testing'.encode('test')

and run it. This results in a CodecRegistryError 
exception: "module "test" (test.pyc) failed to register"

The bug is present in 2.2.1 and in HEAD. In HEAD there 
was actually a bugfix for this but it was incomplete.

Patches for 2.2.1 and HEAD attached.

Greetings,
Geert Jansen

----------------------------------------------------------------------

>Comment By: Geert Jansen (geertj)
Date: 2002-07-30 10:16

Message:
Logged In: YES 
user_id=537938

I meant by "leak" that the module namespace and the 
encoding namespace are different namespaces and should 
therefore be isolated from each other. Symbols from one
namespace should not turn up in the other. This is all IMHO 
of course.

But thanks for fixing this problem. Next time I send in a patch 
I'll make sure I run the test suite too... Sorry for that.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-30 09:47

Message:
Logged In: YES 
user_id=21627

Not sure what you mean by "leak". It is certainly desirable
that modules carry the same name as encodings; in fact,
*every* encoding implemented so far has a module with the
same name.

People have been using u"text".encode("japanese.sjis"),
given that the JapaneseCodecs package installs itself into a
Python package "japanese". That must continue to work. In
particular, your patch broke test.test_charmapcodec; make
sure you test your patches before submitting them.

To solve the problem of .encode("test") giving a registry
error, I have now changed the search_function to ignore
modules that don't have a getregentry function.


----------------------------------------------------------------------

Comment By: Geert Jansen (geertj)
Date: 2002-07-29 19:45

Message:
Logged In: YES 
user_id=537938

Hi Martin,

Isn't it wrong to let the module namespace "leak" into the 
encodings namespace? This leads to very unexpected 
behaviour. Why should it be forbidden to have a module with 
the same name as an encoding? This seems rather arbitrary 
and solely an implementation detail.

It is still very easy to add an encoding outside the encodings/ 
directory using the codecs.register() function. Or maybe there 
is another solution?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 15:31

Message:
Logged In: YES 
user_id=21627

It's actually not a bug to pass a module outside of
encodings/; the standard search function is supposed to find
other modules as well. So I have to revert this change.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 13:33

Message:
Logged In: YES 
user_id=21627

Thanks for the patch; applied as __init__.py 1.9 and 1.6.12.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=571603&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 09:45:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 01:45:37 -0700
Subject: [Patches] [ python-Patches-584245 ] get python to link on OSF1 (Dec Unix)
Message-ID: <E17ZSdN-0003gk-00@usw-sf-web4.sourceforge.net>

Patches item #584245, was opened at 2002-07-20 18:49
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470

Category: Build
Group: Python 2.3
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Neal Norwitz (nnorwitz)
Summary: get python to link on OSF1 (Dec Unix)

Initial Comment:
Attached is a patch to fix the linking of python
(makedev not found) on Dec OSF/1 Unix 5.1.  This patch
has also been tested on Linux (RedHat 7.2).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-30 10:45

Message:
Logged In: YES 
user_id=21627

Backporting it should not be necessary since mknod is not in
Python 2.2.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-30 03:10

Message:
Logged In: YES 
user_id=33168

Not sure if this should be backported or not.

Checked in as:
configure.in: 1.337
configure: 1.326
pyconfig.h.in: 1.47
posixmodule.c: 2.246


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 09:54

Message:
Logged In: YES 
user_id=21627

Looks good; please apply it.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-27 02:58

Message:
Logged In: YES 
user_id=33168

This patch uses AC_TRY_LINK instead of AC_TRY_RUN.  It tries
makedev according to Martin's suggestion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-23 23:43

Message:
Logged In: YES 
user_id=21627

That patch doesn't really test whether defining 
OSF_SOURCE helps in getting makedev, does it? In 
particular, if makedev is not available at all, or requires a 
different define, the test will still conclude that 
OSF_SOURCE should be defined, right?

I think the sequence should be:
- is makedev already available?
- if not, is it with OSF_SOURCE defined?
- if not, arrange to exclude makedev from posixmodule.c

Also, is it necessary to run the test program? autoconf is 
always worried that cross-compilation would fail, since you 
cannot run tests (although it is reasonable to link test 
programs in a cross-compilation environment).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=584245&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 10:58:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 02:58:53 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZTmH-0008KI-00@usw-sf-web2.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 11:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Nobody/Anonymous (nobody)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO patch (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.
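For readers unfamiliar with c_lnotab: in the format used at the time, it is a string of byte pairs (bytecode offset increment, line number increment), both unsigned. The sketch below decodes it in Python. addr2line follows the usual decoding loop, and line_bounds is a hypothetical helper illustrating the idea of caching a [lower, upper) offset range so a tracer only re-decodes when execution leaves that range. Later CPython versions changed this encoding.

```python
def addr2line(lnotab, firstlineno, addrq):
    """Map bytecode offset addrq to a source line (old unsigned format)."""
    line, addr = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        addr += lnotab[i]
        if addr > addrq:
            break
        line += lnotab[i + 1]
    return line


def line_bounds(lnotab, firstlineno, addrq):
    """Return (line, lb, ub): every offset in [lb, ub) maps to `line`."""
    line, lb = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        next_addr = lb + lnotab[i]
        if next_addr > addrq:
            return line, lb, next_addr
        lb = next_addr
        line += lnotab[i + 1]
    return line, lb, float("inf")   # last line runs to the end of the code


# Offsets 0-5 are line 10, 6-13 are line 11, 14 onward are line 13.
lnotab = bytes([6, 1, 8, 2])
print(addr2line(lnotab, 10, 7))     # 11
print(line_bounds(lnotab, 10, 7))   # (11, 6, 14)
```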

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-30 09:58

Message:
Logged In: YES 
user_id=6656

I worked out why some of the code in ceval.c wasn't making
sense to me -- it didn't make sense, period.

I've also fixed a number of silly and not so silly bugs in
my patch.  I'm now 99% certain this idea can fly.  The patch
isn't *finished* but the hard bit is done, IMHO.

There are some other points to make, but I think I'll raise
them on python-dev.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 20:18

Message:
Logged In: YES 
user_id=31435

Dropping out of "vacation mode" long enough to say "mondo 
cool!" and encourage this.  Guido may not agree, but I also 
encourage you to redefine c_lnotab if it can make life easier 
and quicker here.  That subtle compression scheme has 
been the source of several nasty bugs, both in the core C 
code and in Jeremy's compiler pkg (cut 'n paste bugs 
abound here, because few people understand what's really 
needed, so flawed code gets copied with little thought).

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 15:34

Message:
Logged In: YES 
user_id=6656

Uhh, the instr_[lu]b variables need to keep their values
around the loop; otherwise we might just as well call
PyCode_Addr2Line each time around.

I have another version of the patch that does that, but I
assumed the overhead of doing so was deemed too high, or
someone else would have done this by now.  It's certainly
easier.

Glad to hear it doesn't affect python -O too much.  I was
doing this away from the internet and forgot to keep a clean
copy of the source around for doing comparisons with...

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 15:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;"
declaration and the "io = INSTR_OFFSET();" statement below
the "if (tstate->c_tracefunc ...)" test gives a small speedup
on my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 10:59:46 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 02:59:46 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZTn8-0001aR-00@usw-sf-web3.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 11:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO patch (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-30 09:59

Message:
Logged In: YES 
user_id=6656

Guido should see this, assuming he still isn't subscribed to
patches@python.org.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-30 09:58

Message:
Logged In: YES 
user_id=6656

I worked out why some of the code in ceval.c wasn't making
sense to me -- it didn't make sense, period.

I've also fixed a number of silly and not so silly bugs in
my patch.  I'm now 99% certain this idea can fly.  The patch
isn't *finished* but the hard bit is done, IMHO.

There are some other points to make, but I think I'll raise
them on python-dev.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 20:18

Message:
Logged In: YES 
user_id=31435

Dropping out of "vacation mode" long enough to say "mondo 
cool!" and encourage this.  Guido may not agree, but I also 
encourage you to redefine c_lnotab if it can make life easier 
and quicker here.  That subtle compression scheme has 
been the source of several nasty bugs, both in the core C 
code and in Jeremy's compiler pkg (cut 'n paste bugs 
abound here, because few people understand what's really 
needed, so flawed code gets copied with little thought).
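Much of the subtlety lies in the encoder: each increment must fit in one unsigned byte, so large jumps are split across several pairs. The sketch below is one way to do that split, written to agree with the decoder loop (address advanced first, then line). `encode_lnotab` is a hypothetical helper, and like the 2.x format it only handles non-negative increments.

```python
def encode_lnotab(entries, firstlineno):
    """Encode (offset, line) entries, sorted by offset, as (addr_incr, line_incr) byte pairs."""
    out = bytearray()
    last_addr, last_line = 0, firstlineno
    for addr, line in entries:
        d_addr, d_line = addr - last_addr, line - last_line
        if d_addr < 0 or d_line < 0:
            raise ValueError("this table format only stores non-negative increments")
        # Address jump too big for one byte: (255, 0) pairs move the address only.
        while d_addr > 255:
            out += bytes((255, 0))
            d_addr -= 255
        # Line jump too big for one byte: the first pair carries the leftover
        # address step, later pairs are (0, 255) and move the line only.
        while d_line > 255:
            out += bytes((d_addr, 255))
            d_line -= 255
            d_addr = 0
        if d_addr or d_line:
            out += bytes((d_addr, d_line))
        last_addr, last_line = addr, line
    return bytes(out)


def addr2line(lnotab, firstlineno, addrq):
    """Decoder: advance the address, stop past addrq, else add the line step."""
    line, addr = firstlineno, 0
    for i in range(0, len(lnotab) - 1, 2):
        addr += lnotab[i]
        if addr > addrq:
            break
        line += lnotab[i + 1]
    return line


table = encode_lnotab([(6, 11), (300, 13), (310, 600)], firstlineno=10)
print(list(table))  # [6, 1, 255, 0, 39, 2, 10, 255, 0, 255, 0, 77]
print([addr2line(table, 10, off) for off in (0, 6, 299, 300, 310)])  # [10, 11, 11, 13, 600]
```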

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 15:34

Message:
Logged In: YES 
user_id=6656

Uhh, the instr_[lu]b variables need to keep their values
around the loop; otherwise we might just as well call
PyCode_Addr2Line each time around.

I have another version of the patch that does that, but I
assumed the overhead of doing so was deemed too high, or
someone else would have done this by now.  It's certainly
easier.
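The point about keeping instr_lb/instr_ub across iterations can be modelled in Python: look up the current line together with the half-open offset range [lb, ub) it covers, and only rescan the table when the instruction offset leaves that range. `line_bounds` and `LineTracer` are hypothetical names for this sketch; the real ceval.c also has to report line events on backward jumps (loops), which this model ignores.

```python
def line_bounds(lnotab: bytes, firstlineno: int, addrq: int):
    """Return (line, lb, ub): the line for offset `addrq` and the half-open
    range [lb, ub) of offsets sharing that table entry (ub None = to the end)."""
    line, lb, addr = firstlineno, 0, 0
    for i in range(0, len(lnotab) - 1, 2):
        nxt = addr + lnotab[i]
        if nxt > addrq:
            return line, lb, nxt
        addr = lb = nxt
        line += lnotab[i + 1]
    return line, lb, None


class LineTracer:
    """Report a new line only when the offset leaves the cached [lb, ub)
    range and the recomputed line differs from the last one reported."""

    def __init__(self, lnotab: bytes, firstlineno: int):
        self.lnotab, self.firstlineno = lnotab, firstlineno
        self.lb, self.ub = 0, 0      # empty range: the first step always rescans
        self.line = None
        self.lookups = 0             # how many times the table was scanned

    def step(self, offset: int):
        if self.lb <= offset and (self.ub is None or offset < self.ub):
            return None              # still inside the cached range: no table walk
        self.lookups += 1
        line, self.lb, self.ub = line_bounds(self.lnotab, self.firstlineno, offset)
        if line != self.line:
            self.line = line
            return line
        return None


tracer = LineTracer(bytes([6, 1, 8, 2]), firstlineno=10)
events = [tracer.step(off) for off in (0, 3, 6, 9, 12, 14, 16)]
print(events, tracer.lookups)   # [10, None, 11, None, None, 13, None] 3
```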

Glad to hear it doesn't affect python -O too much.  I was
doing this away from the internet and forgot to keep a clean
copy of the source around for doing comparisons with...

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 15:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;" declaration
and the "io = INSTR_OFFSET();" statement below the
"if (tstate->c_tracefunc ...)" test gives a small speedup on
my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 12:00:23 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 04:00:23 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17ZUjn-0006YI-00@usw-sf-web4.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-30 13:00

Message:
Logged In: YES 
user_id=89016

It *is* used in two spots: The constructor and the readfp 
method. But exposing it at the module level could make 
sense, because it is the atomic method of adding mime type 
information. So should I change the patch to expose it at the 
module level and change the LaTeX documentation 
accordingly?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 10:44

Message:
Logged In: YES 
user_id=21627

I can't see the point of making it private, since it is not
used inside the module. If you plan to use it, that usage
certainly is outside of the module, so the method would be
public.

If it is public, it needs to be exposed on the module level,
and it needs to be documented.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-29 10:23

Message:
Logged In: YES 
user_id=89016

The patch adds an inverted mapping (i.e. mapping from type
to a list of extensions). add_type simplifies adding a
type<->ext mapping to both dictionaries. If this method
should not be exposed we could make the name private.
(_add_type)
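A minimal sketch of the inverted mapping being described: one dictionary from extension to type, a second from type to every known extension, and a single `add_type` that keeps both in step. The class name `TypeRegistry` and its methods are hypothetical illustrations, not the code in the patch.

```python
class TypeRegistry:
    """Two-way MIME type <-> extension mapping (illustrative sketch)."""

    def __init__(self):
        self.types_map = {}      # '.ext' -> 'type/subtype'
        self.types_map_inv = {}  # 'type/subtype' -> ['.ext', ...] in insertion order

    def add_type(self, mime_type, ext):
        # The single "atomic" update: both dictionaries change together.
        old = self.types_map.get(ext)
        if old is not None and old != mime_type:
            self.types_map_inv[old].remove(ext)   # drop the stale reverse entry
        self.types_map[ext] = mime_type
        exts = self.types_map_inv.setdefault(mime_type, [])
        if ext not in exts:
            exts.append(ext)

    def guess_all_extensions(self, mime_type):
        return list(self.types_map_inv.get(mime_type, []))

    def guess_extension(self, mime_type):
        # Kept for compatibility: the first entry of the full list, or None.
        exts = self.guess_all_extensions(mime_type)
        return exts[0] if exts else None


reg = TypeRegistry()
for ext in ('.jpg', '.jpeg', '.jpe'):
    reg.add_type('image/jpeg', ext)
print(reg.guess_all_extensions('image/jpeg'))  # ['.jpg', '.jpeg', '.jpe']
print(reg.guess_extension('image/jpeg'))       # .jpg
```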

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 13:31:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 05:31:10 -0700
Subject: [Patches] [ python-Patches-544113 ] merging sorted sequences
Message-ID: <E17ZW9e-0007aV-00@usw-sf-web1.sourceforge.net>

Patches item #544113, was opened at 2002-04-15 13:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Sebastien Keim (s_keim)
Assigned to: Nobody/Anonymous (nobody)
Summary: merging sorted sequences

Initial Comment:
This patch is intended to add a function to the bisect module which merges several sorted sequences into one ordered list.
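For reference, a k-way merge is usually done with a heap that holds the current head of each input. The sketch below (the name `merge_sorted` is hypothetical) does this with `heapq`; later Python versions ship the same idea as `heapq.merge`, and for small inputs concatenating and sorting gives the same result.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted input


def merge_sorted(*seqs):
    """Merge already-sorted sequences into one sorted list.

    The heap holds (value, index) pairs, one current head per input.
    Ties are broken by input index, so equal values keep the order of the
    inputs they came from (a stable merge).
    """
    iters = [iter(s) for s in seqs]
    heap = []
    for idx, it in enumerate(iters):
        head = next(it, _DONE)
        if head is not _DONE:
            heap.append((head, idx))
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx = heap[0]
        out.append(value)
        nxt = next(iters[idx], _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)                  # this input is used up
        else:
            heapq.heapreplace(heap, (nxt, idx))  # swap in the input's next item
    return out


a, b, c = [1, 4, 7], [2, 5], [3, 6, 8]
print(merge_sorted(a, b, c))  # [1, 2, 3, 4, 5, 6, 7, 8]
assert merge_sorted(a, b, c) == list(heapq.merge(a, b, c)) == sorted(a + b + c)
```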

----------------------------------------------------------------------

>Comment By: Sebastien Keim (s_keim)
Date: 2002-07-30 14:31

Message:
Logged In: YES 
user_id=498191

I must agree that it's not something that is needed every day; I will put it in the Cookbook.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-28 00:23

Message:
Logged In: YES 
user_id=6380

Thanks.

This doesn't strike me as a "fundamental" algorithm like
bisection or heap sort. I don't think I've ever needed this,
except perhaps in situations where the amount of data was
small enough that simply concatenating the lists and sorting
them was an acceptable 3-line solution.

Therefore I'm rejecting this unless you get someone of
importance to plead for it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=544113&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 13:34:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 05:34:05 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17ZWCT-0008MT-00@usw-sf-web4.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-07-30 14:34

Message:
Logged In: YES 
user_id=21627

I'm in favour of exposing it on the module level. If you are
uncertain, you might want to ask on python-dev.
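For comparison, this is roughly how the feature looks from the caller's side in today's `mimetypes` module, where `add_type` and `guess_all_extensions` are both available at module level. The type `application/x-demo` and the extension `.demo` are made up for the example.

```python
import mimetypes

# Register a made-up type at module level; the module-level function
# forwards to the shared database, updating both lookup directions.
mimetypes.add_type('application/x-demo', '.demo')

print(mimetypes.guess_type('report.demo'))   # ('application/x-demo', None)
print('.demo' in mimetypes.guess_all_extensions('application/x-demo'))  # True
```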

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-30 13:00

Message:
Logged In: YES 
user_id=89016

It *is* used in two spots: The constructor and the readfp 
method. But exposing it at the module level could make 
sense, because it is the atomic method of adding mime type 
information. So should I change the patch to expose it at the 
module level and change the LaTeX documentation 
accordingly?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 10:44

Message:
Logged In: YES 
user_id=21627

I can't see the point of making it private, since it is not
used inside the module. If you plan to use it, that usage
certainly is outside of the module, so the method would be
public.

If it is public, it needs to be exposed on the module level,
and it needs to be documented.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-29 10:23

Message:
Logged In: YES 
user_id=89016

The patch adds an inverted mapping (i.e. mapping from type
to a list of extensions). add_type simplifies adding a
type<->ext mapping to both dictionaries. If this method
should not be exposed we could make the name private.
(_add_type)

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 14:52:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 06:52:32 -0700
Subject: [Patches] [ python-Patches-588561 ] Cygwin _hotshot patch
Message-ID: <E17ZXQO-0001pj-00@usw-sf-web4.sourceforge.net>

Patches item #588561, was opened at 2002-07-30 05:52
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588561&group_id=5470

Category: Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Tishler (jlt63)
Assigned to: Martin v. Löwis (loewis)
Summary: Cygwin _hotshot patch

Initial Comment:
Yet another Cygwin module patch, very similar to other
patches that I have submitted.  I tested it under Cygwin
and Red Hat Linux 7.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588561&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 14:58:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 06:58:21 -0700
Subject: [Patches] [ python-Patches-588564 ] _locale library patch
Message-ID: <E17ZXW1-00020E-00@usw-sf-web4.sourceforge.net>

Patches item #588564, was opened at 2002-07-30 05:58
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588564&group_id=5470

Category: Distutils and setup.py
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Tishler (jlt63)
Assigned to: Martin v. Löwis (loewis)
Summary: _locale library patch

Initial Comment:
This patch enables setup.py to find gettext routines
when they are located in libintl instead of libc.
Although I developed this patch for Cygwin, I hope
that it can be easily updated to support other
platforms (if necessary). I tested this patch
under Cygwin and Red Hat Linux 7.1.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588564&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 15:04:41 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 07:04:41 -0700
Subject: [Patches] [ python-Patches-554807 ] Add _winreg support for Cygwin
Message-ID: <E17ZXc9-0002DW-00@usw-sf-web4.sourceforge.net>

Patches item #554807, was opened at 2002-05-11 12:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470

Category: Windows
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerald S. Williams (gsw_agere)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add _winreg support for Cygwin

Initial Comment:
This adds _winreg support to Cygwin Python without 
dependencies on other Windows modules.

For platforms on which MS_WINDOWS isn't defined, this 
raises OSError instead of WindowsError. 
It also uses the non-MBCS versions of registry access 
in this case.

Some minor changes to _winreg.c were made to clean up 
compiler warnings from GCC.

setup.py was changed to create a dynamic _winreg 
module under cygwin. There are also some earlier 
changes in the patch file to skip the import test (due 
to Cygwin fork issues), and to require libintl when 
building _locale under Cygwin.

----------------------------------------------------------------------

>Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-07-30 14:04

Message:
Logged In: YES 
user_id=329402

I plan to get back to this eventually, although I have held off
for three reasons:
 - Cygwin is incorporating a registry file system that may be a
   better way to implement this
 - I saw some posts about possible Unicode changes
 - Real Life (job priorities, vacation)

I probably won't get back to this until the middle of August.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 10:02

Message:
Logged In: YES 
user_id=21627

Is any kind of tweaking forthcoming?

----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 13:30

Message:
Logged In: YES 
user_id=329402

It sounds like the patches need some tweaking (my testing 
had passed but was certainly limited).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-15 12:57

Message:
Logged In: YES 
user_id=21627

Yes, but you are wrong assuming that the *A functions expect
Latin-1. Instead, they expect char* encoded as CP_ACP, which
is known as "mbcs" in Python.

The *W functions do *not* expect multi-byte strings, but
Unicode strings.

Notice that _winreg also calls the *A functions, even in
MSVC builds.

So I think converting Unicode to Latin-1 is definitely
incorrect.
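The difference shows up as soon as the ANSI code page is not Western European. The sketch below uses explicit code pages as stand-ins for CP_ACP, because the `mbcs` codec itself exists only on Windows; the helper name `encode_for_ansi_api` is hypothetical. It also suggests why limited testing on a Western system can pass: there, Latin-1 and cp1252 agree for many common characters.

```python
def encode_for_ansi_api(text: str, ansi_codepage: str) -> bytes:
    """Encode text the way a *A registry function expects it: in the
    system's ANSI code page (CP_ACP, Python's 'mbcs' on Windows)."""
    return text.encode(ansi_codepage)


# On a Western European system (cp1252), Latin-1 happens to agree for 'é' ...
assert encode_for_ansi_api('é', 'cp1252') == 'é'.encode('latin-1') == b'\xe9'

# ... but not for every character: cp1252 has '€' at 0x80, Latin-1 has no '€'.
assert encode_for_ansi_api('€', 'cp1252') == b'\x80'

# On a Cyrillic system (cp1251), Latin-1 cannot encode the text at all.
print(encode_for_ansi_api('Привет', 'cp1251'))
try:
    'Привет'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('latin-1 fails:', exc.reason)
```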

----------------------------------------------------------------------

Comment By: Gerald S. Williams (gsw_agere)
Date: 2002-05-15 12:48

Message:
Logged In: YES 
user_id=329402

Windows supplies two versions of the relevant functions. 
The Cygwin version (at least as built) uses the ANSI 
versions, as indicated by the A at the end of the symbol 
names:
  $ nm _winreg.o | grep RegQueryValue
           U _RegQueryValueA@16
           U _RegQueryValueExA@24

This is as opposed to the "Windows Unicode/wide-char" functions, 
which end in W and require MBCS functions to decode.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-14 22:23

Message:
Logged In: YES 
user_id=21627

Can you please explain why not using MBCS is the right thing?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554807&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 15:25:47 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 07:25:47 -0700
Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation
Message-ID: <E17ZXwZ-0004Zg-00@usw-sf-web2.sourceforge.net>

Patches item #555085, was opened at 2002-05-12 08:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 4
Submitted By: Michael Gilfix (mgilfix)
Assigned to: Guido van Rossum (gvanrossum)
Summary: timeout socket implementation

Initial Comment:
This addresses bug #457114 by implementing timed socket
operations. If a timeout is set and the timeout period
elapses before the socket operation has finished, a
socket.error exception is raised.

This patch integrates the functionality at two levels:
the timeout capability is integrated at the C level in
socketmodule.c, and socket.py was modified so that file
object creation on Windows handles the case of the
underlying socket raising an exception. The LaTeX
documentation was also updated, and a new regression
test was provided as test_timeout.py.
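The mechanism can be sketched at the Python level: put the socket in non-blocking mode, wait for readiness with `select()` for at most the timeout, and raise if nothing arrives in time. `timed_recv` is a hypothetical helper; the patch does the equivalent in C inside socketmodule.c, and released Pythons expose the result directly through `settimeout()`.

```python
import select
import socket


def timed_recv(sock: socket.socket, nbytes: int, timeout: float) -> bytes:
    """Receive up to nbytes, or raise socket.timeout after `timeout` seconds."""
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise socket.timeout("timed out")
    return sock.recv(nbytes)


a, b = socket.socketpair()
try:
    try:
        timed_recv(a, 1, 0.05)       # nothing has been sent yet
    except socket.timeout:
        print("timed out as expected")
    b.sendall(b"x")
    print(timed_recv(a, 1, 1.0))     # b'x'
finally:
    a.close()
    b.close()
```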

----------------------------------------------------------------------

>Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-30 10:25

Message:
Logged In: YES 
user_id=116038

If Guido is busy (And I'm sure he is), I'd be willing to
take a hack at the problem if you could email me privately
and provide a testing environment (No OS/2 EMX in my apt ;) ).

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 22:28

Message:
Logged In: YES 
user_id=250749

In private mail to/from Guido, it appears that the FreeBSD 
issues were in test_socket.py, and have been addressed.

I still have outstanding issues on OS/2 EMX, which I sent to 
Guido privately but will add here as soon as I can.

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-23 16:43

Message:
Logged In: YES 
user_id=116038

Now that I'm back :)

I checked the archive and this seems to have been handled by
you. Please let me know if it isn't resolved and I can give
it a closer look.

Also, perhaps I should contact Bernie and ask him if there's
anything he hasn't gotten around to in the test_timeout that
I can off-load from him.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 13:11

Message:
Logged In: YES 
user_id=6380

The default timeout is now implemented in CVS.

There's a bug report from Andrew Macintyre (unfortunately on
python-dev) about test_socket.py failures on FreeBSD. I'll
try to keep an eye on that, so this patch *still* stays
open. Also, Bernie has promised some changes that I haven't
received yet and the details of which I don't recall (sorry
:-( ).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 21:47

Message:
Logged In: YES 
user_id=6380

Keeping this open as a reminder of things still to finish.

Most is in the python-dev discussion; Michael Gilfix and
Bernard Yue have offered to produce more patches.

One feature we definitely want is a way to specify a timeout
to be applied to all new sockets.
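A module-wide default of this kind is what `socket.setdefaulttimeout()` provides in released Pythons: it applies to sockets created afterwards, not to ones that already exist. A short demonstration (the 2.5-second value is arbitrary):

```python
import socket

previous = socket.getdefaulttimeout()
try:
    socket.setdefaulttimeout(None)      # start from "block forever"
    existing = socket.socket()
    socket.setdefaulttimeout(2.5)       # applies to sockets created from now on
    fresh = socket.socket()
    timeouts = (existing.gettimeout(), fresh.gettimeout())
    print(timeouts)                     # (None, 2.5)
    existing.close()
    fresh.close()
finally:
    socket.setdefaulttimeout(previous)  # restore the process-wide setting
```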

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-06 17:11

Message:
Logged In: YES 
user_id=6380

Thanks for the new version! I've checked this in.  I made
considerable changes; the following is feedback but you
don't need to respond because I've addressed all these in
the checked-in code!

- Thanks for the cleanup of some non-standard formatting.
However, it's better not to do this so the diffs don't show
changes that are unrelated to the timeout patch.

- You are still importing the select module instead of
calling select() directly. I really think you should do the
latter -- the select module has an enormous overhead (it
allocates several large lists on the heap).

- Instead of explicitly testing the argument to settimeout
for being a float, int or long, you should simply call
PyFloat_AsDouble and handle the error; if someone passes
another object that implements __float__ that should be
acceptable.

- gettimeout() returns sock_timeout without checking if it
is NULL. It can be NULL when a socket object is never
initialized. E.g. I can do this:

>>> from socket import *
>>> s = socket.__new__(socket)
>>> s.gettimeout()

which gives me a segfault. There are probably other places
where this is assumed.

- I addressed the latter two issues by making sock_timeout a
double, whose value is < 0.0 when no timeout is set.
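The design settled on above can be modelled in a few lines: keep the timeout as a plain float that is negative when no timeout is set (so there is no NULL to forget to check), and convert the argument the way PyFloat_AsDouble would, accepting any object that implements `__float__`. `TimeoutSetting` is a hypothetical stand-in for the C-level socket field, not the socket module's code.

```python
class TimeoutSetting:
    """Model of a socket's timeout field: a float, < 0.0 meaning 'no timeout'."""

    def __init__(self):
        self._timeout = -1.0          # always initialised, so never "NULL"

    def settimeout(self, value):
        if value is None:
            self._timeout = -1.0
            return
        # PyFloat_AsDouble-style conversion: numbers and objects with
        # __float__ are accepted; strings are rejected rather than parsed.
        if isinstance(value, (str, bytes, bytearray)):
            raise TypeError("timeout must be a number or None")
        seconds = float(value)        # raises TypeError for other non-numbers
        if seconds < 0.0:
            raise ValueError("timeout value out of range")
        self._timeout = seconds

    def gettimeout(self):
        return None if self._timeout < 0.0 else self._timeout


class Seconds:
    """Any object with __float__ is accepted."""

    def __init__(self, n):
        self.n = n

    def __float__(self):
        return float(self.n)


t = TimeoutSetting()
print(t.gettimeout())     # None: safe even before settimeout() is called
t.settimeout(Seconds(3))
print(t.gettimeout())     # 3.0
```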

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-06-05 18:23

Message:
Logged In: YES 
user_id=116038

I've addressed all the issues brought up by Guido. The 2nd
version of the patch is attached here. In this version, I've
modified test_socket.py to include tests for the _fileobject
class in socket.py that was modified by this patch.
_fileobject needed to be modified so that data would not be
lost when the underlying socket threw an exception (data was
no longer accumulated in local variables). The tests for the
_fileobject class succeed on older versions of python
(tested 2.1.3) and pass on the newer version of python.
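The _fileobject fix hinges on where partially read data lives. If it only sits in a local variable, an exception from recv() (such as a timeout) discards it; keeping it in an instance buffer lets the next call continue where the last one stopped. `LineReader` is a hypothetical, simplified illustration of that idea, not the patched socket.py.

```python
class LineReader:
    """Line-oriented reader over a recv-like callable (illustrative sketch).

    Partial data is kept in self._buf, so if recv() raises, nothing already
    received is lost; the next readline() continues from the buffer.
    """

    def __init__(self, recv, bufsize=8192):
        self._recv = recv
        self._bufsize = bufsize
        self._buf = b""

    def readline(self):
        while b"\n" not in self._buf:
            chunk = self._recv(self._bufsize)   # may raise; self._buf survives
            if not chunk:                        # EOF: return whatever is left
                line, self._buf = self._buf, b""
                return line
            self._buf += chunk
        end = self._buf.index(b"\n") + 1
        line, self._buf = self._buf[:end], self._buf[end:]
        return line


# A fake recv that delivers data in pieces and times out once in between.
script = iter([b"hel", TimeoutError("timed out"), b"lo\nwor", b"ld\n", b""])


def fake_recv(n):
    item = next(script)
    if isinstance(item, Exception):
        raise item
    return item


reader = LineReader(fake_recv)
try:
    reader.readline()
except TimeoutError:
    pass                      # b"hel" is still buffered
print(reader.readline())      # b'hello\n'
print(reader.readline())      # b'world\n'
```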

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-23 16:18

Message:
Logged In: YES 
user_id=6380

For a detailed review, see

http://mail.python.org/pipermail/python-dev/2002-May/024340.html

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 16:19:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 08:19:36 -0700
Subject: [Patches] [ python-Patches-557946 ] Ebcdic compliancy in stringobject source
Message-ID: <E17ZYme-0003xM-00@usw-sf-web4.sourceforge.net>

Patches item #557946, was opened at 2002-05-19 14:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=557946&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Jean-Yves MENGANT (jymen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Ebcdic compliancy in stringobject source

Initial Comment:
The printable-character test in stringobject.c is not
compliant with EBCDIC systems (OS/390 or OS/400).

----------------------------------------------------------------------

Comment By: coleman corrigan (coli)
Date: 2002-07-30 15:19

Message:
Logged In: YES 
user_id=586691

This is an ugly patch.
The pcre module elegantly avoids this issue by using isprint(); 
why not do the same thing here?
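To make the portability problem concrete: a hard-wired test such as `0x20 <= c < 0x7f` encodes ASCII's layout, while in EBCDIC (code page 037, used here as an example) the byte 0x81 is the letter 'a'. The sketch contrasts that test with one that asks the character set, in the spirit of using isprint(). Both function names are hypothetical, and Python's `cp037` codec stands in for the platform's C library.

```python
def printable_by_ascii_range(byte: int) -> bool:
    # Hard-wired assumption: printable bytes are exactly 0x20..0x7E.
    return 0x20 <= byte < 0x7F


def printable_in_charset(byte: int, encoding: str) -> bool:
    # Ask the character set instead (what isprint() does via the C library).
    # Assumes a single-byte encoding that maps every byte value.
    return bytes([byte]).decode(encoding).isprintable()


for b in (0x81, 0x40, 0x00):
    ch = bytes([b]).decode('cp037')
    print(hex(b), repr(ch), printable_by_ascii_range(b), printable_in_charset(b, 'cp037'))
```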


----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-06-02 18:32

Message:
Logged In: YES 
user_id=513881

I looked at the approach taken in patch #479898; it looks fine,
so I made a quick test on an OS/390 EBCDIC platform, extracting
just the SINGLE_BYTE isprint-based change, which works fine on
OS/390 too.

It works well and is definitely the best approach to the
problem.

I also looked at the PRINT_MULTIBYTE_STRING approach based
on iswprint.  Judging from IBM's documentation it should also
work for OS/390 EBCDIC, although I am not able to test it on my
OS/390 box.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-28 09:58

Message:
Logged In: YES 
user_id=21627

Modifying pyconfig.h.in (alone) is a mistake: this is a
generated file, edit configure.in instead.

When producing patches, please produce a single file
containing all changes (e.g. with diff -r); this makes
processing the patch simpler.

I'm still opposed to singling-out a specific encoding;
instead, I believe that the approach taken in patch #479898
is more general and ought to solve your problem as well. Can
you please study this patch, and see whether you can make it
work on your system?

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-26 16:38

Message:
Logged In: YES 
user_id=513881

The last attached diff file contains a more robust patch,
which defines HAVE_EBCDIC in pyconfig.h and uses it in
stringobject.c.

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-23 11:47

Message:
Logged In: YES 
user_id=513881

I am still 100% with you on that; my only remark here is that 
those are mainly modules or Python library code that are not part 
of the basic Python kernel. And the idea here is to be able to get 
a running minimal Python kernel on an EBCDIC machine.

After that, once the basic kernel is up in EBCDIC mode, you'll 
need to deal with the EBCDIC portability of some modules/libraries 
and decide whether or not to address them if you need to use 
them. But the important idea here is to have the Python 
kernel running, so that you are not obliged to use REXX if 
you prefer Python :-) 


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-23 09:54

Message:
Logged In: YES 
user_id=21627

I believe there are a number of places where the code
assumes that 'a' .. 'z' covers all Latin letters, and only
those, e.g. pypcre.c, regexpr.c, sre.py.

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-23 08:38

Message:
Logged In: YES 
user_id=513881

When porting to OS/390 (an EBCDIC OS), the only place I found 
a bad ASCII assumption that leads to further trouble at 
interpreter startup is the one pointed out here. Once I 
fixed it, I was able to use the Python interpreter kernel 
without trouble. Some modules like xmllib may make some 
ASCII assumptions, but module portability is a different story, 
since those modules may be declared non-EBCDIC-compliant.

On the second topic, using a C library function, I am 100% OK; 
my only concern is that I am persuaded that using, for 
instance, the isascii XPG C function will generate more 
complex and slower code when trying to keep it 
compliant with both EBCDIC and ASCII targets. Having a more 
generic #define like
#define EBCDIC inside config.h, set by ./configure when the 
platform is EBCDIC, is IMO the best compromise here.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-22 17:09

Message:
Logged In: YES 
user_id=21627

Is it really worth fixing this? Python assumes that the
character set of byte strings is an ASCII superset in many
places. If there is any change made here, it should be based
on C library functions, rather than on static knowledge of
the operating system.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=557946&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 16:41:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 08:41:38 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17ZZ7y-0004MJ-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.
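The general idea behind an adaptive, stable mergesort can be shown with a much simpler natural mergesort: find the runs already present in the data (reversing strictly descending ones), then merge neighbouring runs until one remains, preferring the left run on ties so the sort stays stable. This is only a sketch of that general technique under those assumptions; it makes no attempt to reproduce the patch's own merge strategy or optimizations, and `natural_mergesort` is a hypothetical name.

```python
def natural_mergesort(items, key=None):
    """Stable sort that exploits existing order (illustrative sketch)."""
    k = key if key is not None else (lambda x: x)
    n = len(items)
    runs = []
    i = 0
    while i < n:
        j = i + 1
        if j < n and k(items[j]) < k(items[i]):
            # Strictly descending run: reversing it cannot reorder equal items.
            while j < n and k(items[j]) < k(items[j - 1]):
                j += 1
            runs.append(items[i:j][::-1])
        else:
            # Non-descending run: equal neighbours stay in original order.
            while j < n and not k(items[j]) < k(items[j - 1]):
                j += 1
            runs.append(items[i:j])
        i = j
    # Merge neighbouring runs pairwise until a single run is left.
    while len(runs) > 1:
        merged = [_merge(runs[m], runs[m + 1], k) for m in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []


def _merge(left, right, k):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if k(right[j]) < k(left[i]):   # strict: ties go to `left`, keeping stability
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


print(natural_mergesort([5, 6, 7, 3, 2, 1, 9]))   # [1, 2, 3, 5, 6, 7, 9]
```

Already-sorted or reversed input forms a single run, so it costs one linear scan and no merging, which is the kind of input where an adaptive sort pays off.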


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4Ghz, 256MB PC2100 RAM, OS2 v4 FixPack 12, EMX 0.9d
Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPUs), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
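Seen from the Python side, the requirement is that a comparison function that answers inconsistently may produce a meaningless order, but must never make the sort lose, duplicate, or corrupt elements. A minimal sketch of a check for that, assuming a modern Python 3 (where comparison functions go through `functools.cmp_to_key`); `goofy_cmp` and `sort_with_goofy_cmp` are hypothetical names:

```python
import functools
import random

def goofy_cmp(a, b):
    # Deliberately inconsistent: the answer ignores the arguments, so
    # cmp(a, b) and cmp(b, a) need not agree from one call to the next.
    return random.choice((-1, 0, 1))

def sort_with_goofy_cmp(data, seed=0):
    random.seed(seed)  # make the "goofy" answers reproducible
    return sorted(data, key=functools.cmp_to_key(goofy_cmp))

data = list(range(1000))
random.shuffle(data)
result = sort_with_goofy_cmp(data)

# The order is arbitrary, but every element must still be present exactly once.
assert sorted(result) == sorted(data)
```

This only guards against lost or duplicated elements; with an inconsistent comparison the resulting order is unspecified.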

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
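The pre-merge "find the endpoints" idea can be sketched in Python, with the standard `bisect` module standing in for the galloping searches (an assumption made for brevity; the patch does its own searching in C). Before merging adjacent sorted runs `a` and `b`, the elements of `a` that are <= `b[0]`, and the elements of `b` that are >= `a[-1]`, are already in their final positions, so only the middle needs merging. `merge_runs` is a hypothetical helper name:

```python
from bisect import bisect_left, bisect_right

def merge_runs(a, b):
    """Stably merge two sorted lists, skipping elements already in place."""
    if not a or not b:
        return a + b
    # Elements of a that are <= b[0] precede everything in b.
    # bisect_right keeps elements of a equal to b[0] ahead of b (stability).
    lo = bisect_right(a, b[0])
    # Elements of b that are >= a[-1] follow everything in a.
    # bisect_left keeps elements of b equal to a[-1] after a (stability).
    hi = bisect_left(b, a[-1])
    head, a_mid = a[:lo], a[lo:]
    b_mid, tail = b[:hi], b[hi:]
    # Ordinary stable merge of the remaining middle parts.
    merged = []
    i = j = 0
    while i < len(a_mid) and j < len(b_mid):
        if b_mid[j] < a_mid[i]:  # take from b only when strictly smaller
            merged.append(b_mid[j])
            j += 1
        else:
            merged.append(a_mid[i])
            i += 1
    merged.extend(a_mid[i:])
    merged.extend(b_mid[j:])
    return head + merged + tail
```

When two runs barely overlap, the trimmed prefix and suffix cover almost everything and the merge loop does very little, which is the kind of odd case where the extra searching pays off.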

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
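The "pre-existing order" a run-based mergesort can exploit is, roughly, natural runs: maximal stretches that are already non-descending or strictly descending. A minimal sketch of run detection in Python, assuming it mirrors the patch's run-counting rule (timsort.txt is the authoritative description); `natural_runs` is a hypothetical name. Strictly descending runs are reversed, which cannot break stability because no two of their elements compare equal:

```python
def natural_runs(seq):
    """Split seq into maximal sorted runs, each returned as an ascending list.

    A run is either non-descending (seq[i] <= seq[i+1]) or strictly
    descending (seq[i] > seq[i+1]); descending runs are reversed.
    """
    runs = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        if j < n and seq[j] < seq[i]:
            # Strictly descending run: equal neighbours end it.
            while j < n and seq[j] < seq[j - 1]:
                j += 1
            runs.append(list(reversed(seq[i:j])))
        else:
            # Non-descending run (a lone trailing element also counts).
            while j < n and not seq[j] < seq[j - 1]:
                j += 1
            runs.append(list(seq[i:j]))
        i = j
    return runs
```

Already-sorted input and strictly reversed input each come out as a single run, so they need no merging at all; text with local order produces fewer, longer runs than shuffled data.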

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 16:42:57 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 08:42:57 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17ZZ9F-0004Nf-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.
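The point of the procedure is to time two sorts on exactly the same inputs. A hypothetical harness in the spirit of sortperf, written for a modern Python 3: `make_patterns` and `time_sort` are made-up names, and the generators are guesses based only on the column labels (`~sort` and `!sort` are left out because their inputs are not described in this thread):

```python
import random
import time

def make_patterns(n, seed=1):
    """Build inputs loosely modeled on sortperf's column labels (guessed, not sortperf's code)."""
    rng = random.Random(seed)            # private generator: same seed, same data
    rand = [rng.random() for _ in range(n)]
    asc = sorted(rand)
    three = asc[:]
    for _ in range(3):                   # ascending, then 3 random exchanges
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]
    k = min(10, n)                       # ascending, then up to 10 random values at the end
    plus = asc[:n - k] + [rng.random() for _ in range(k)]
    return {
        "*sort": rand,                   # random data
        "\\sort": asc[::-1],             # descending
        "/sort": asc,                    # ascending
        "3sort": three,
        "+sort": plus,
        "=sort": [0.5] * n,              # all equal
    }

def time_sort(sort_fn, patterns):
    """Time an in-place sort_fn on a fresh copy of each pattern."""
    timings = {}
    for name, data in patterns.items():
        work = data[:]                   # copy so the stored pattern is never sorted in place
        start = time.perf_counter()
        sort_fn(work)
        timings[name] = time.perf_counter() - start
    return timings
```

Usage: build the patterns once with `patterns = make_patterns(2 ** 15)`, then call `time_sort(list.sort, patterns)` and `time_sort(other_sort, patterns)`, where `other_sort` is any in-place sort you want to compare against.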

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
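The caution above amounts to a cache that silently overrides the seed. A minimal sketch of that behavior with a hypothetical `load_or_generate` helper; `pickle` is used as the file format purely for illustration, since sortperf's actual file format and naming are not described here:

```python
import os
import pickle
import random
import tempfile

def load_or_generate(path, n, seed):
    """Return n random floats, reusing the cached copy at path if it exists.

    Once the cache file exists, seed is never consulted, so the data is
    whatever the *first* run produced.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data

# Demonstration: the second call's seed is silently ignored.
cache = os.path.join(tempfile.mkdtemp(), "floats.pkl")
first = load_or_generate(cache, 8, seed=1)
second = load_or_generate(cache, 8, seed=99)
assert first == second
```

This is why the seed only matters on the first run: whatever that run writes is what every later run measures.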

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X 10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.
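The "win" is easier to judge with the ratios spelled out. A small Python sketch that divides the i=20 `L.msort()` times by the i=20 `L.sort()` times from the iBook tables above; the variable and function names are made up for illustration:

```python
# i = 20 rows from the iBook tables above (seconds).
COLUMNS = ["*sort", "\\sort", "/sort", "3sort", "+sort", "~sort", "=sort", "!sort"]
SORT_20 = [10.65, 0.79, 0.79, 11.83, 0.89, 3.33, 0.78, 11.88]
MSORT_20 = [10.22, 0.83, 0.76, 0.79, 0.78, 3.04, 0.79, 1.66]

def ratios(new, old):
    """Return new/old time per column; values below 1.0 mean the new sort is faster."""
    return {c: n / o for c, n, o in zip(COLUMNS, new, old)}

for col, r in ratios(MSORT_20, SORT_20).items():
    print(f"{col:>6}  {r:.2f}")
```

The large gains are in the 3sort and !sort columns; the \sort and =sort columns, where msort is slightly slower, differ by only 0.04 and 0.01 seconds respectively.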


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 16:40:04 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 08:40:04 -0700
Subject: [Patches] [ python-Patches-557946 ] Ebcdic compliancy in stringobject source
Message-ID: <E17ZZ6S-0005nU-00@usw-sf-web2.sourceforge.net>

Patches item #557946, was opened at 2002-05-19 14:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=557946&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Jean-Yves MENGANT (jymen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Ebcdic compliancy in stringobject source

Initial Comment:
The printable-character test in stringobject.c is not
compliant with EBCDIC systems (OS390 or OS400).
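To see why a hard-coded ASCII range test breaks on EBCDIC, here is a small Python analogue (not the C code in stringobject.c). The range test `0x20 <= b < 0x7f` is an illustrative stand-in for the kind of check this report is about, `str.isprintable` plays the role of asking the character set (as C's `isprint()` would), and code page 037 (Python's `cp037` codec) is assumed as a representative EBCDIC encoding.

```python
def ascii_range_printable(byte: int) -> bool:
    """Hard-coded ASCII assumption: printable iff 0x20 <= byte < 0x7f."""
    return 0x20 <= byte < 0x7F


def charset_printable(byte: int, encoding: str) -> bool:
    """Ask the character set instead, loosely mirroring isprint() in C."""
    return bytes([byte]).decode(encoding).isprintable()


text = "Hello"
ebcdic = text.encode("cp037")  # EBCDIC bytes: C8 85 93 93 96

ascii_view = [ascii_range_printable(b) for b in ebcdic]
charset_view = [charset_printable(b, "cp037") for b in ebcdic]
```

Every EBCDIC letter here lies above 0x7f, so the range test calls all five bytes unprintable while the charset-aware check accepts all five.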

----------------------------------------------------------------------

>Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-07-30 15:40

Message:
Logged In: YES 
user_id=513881

>The pcre module elegantly avoids this issue by using isprint(),
>why not do the same thing here?

If you look at my answer dated 2002-06-02, I indicated that 
isprint is definitely the best approach to the problem; I 
made a test with it on OS390 and it works fine.

 
----------------------------------------------------------------------

Comment By: coleman corrigan (coli)
Date: 2002-07-30 15:19

Message:
Logged In: YES 
user_id=586691

This is an ugly patch.
The pcre module elegantly avoids this issue by using isprint(); 
why not do the same thing here?


----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-06-02 18:32

Message:
Logged In: YES 
user_id=513881

I looked at the approach taken in patch #479898 and it looks fine, so I
made a quick test on an OS390 EBCDIC platform, extracting just the
SINGLE_BYTE isprint-based change, which works fine on OS390 too.

It works well and is definitely the best approach to the
problem.

I also looked at the PRINT_MULTIBYTE_STRING approach based
on iswprint.  According to IBM's documentation it should work for
OS390 EBCDIC too, although I am not able to test it on my
OS390 box.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-28 09:58

Message:
Logged In: YES 
user_id=21627

Modifying pyconfig.h.in (alone) is a mistake: this is a
generated file, edit configure.in instead.

When producing patches, please produce a single file
containing all changes (e.g. with diff -r); this makes
processing the patch simpler.

I'm still opposed to singling-out a specific encoding;
instead, I believe that the approach taken in patch #479898
is more general and ought to solve your problem as well. Can
you please study this patch, and see whether you can make it
work on your system?

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-26 16:38

Message:
Logged In: YES 
user_id=513881

The last attached diff file contains a more robust patch that
defines HAVE_EBCDIC in pyconfig.h and uses it in
stringobject.c.

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-23 11:47

Message:
Logged In: YES 
user_id=513881

I am still 100% with you on that; my only remark here is that 
those are mainly either modules or Python library code, which are 
not part of the basic Python kernel. And the idea here is to get 
a minimal Python kernel running on an EBCDIC machine.

Once the basic kernel is up in EBCDIC mode, you'll 
need to deal with the EBCDIC portability of some modules/libs and 
decide whether or not to address them if you need to use 
them.... But the important idea here is to have the Python 
kernel running, so that you're not obliged to use REXX if 
you prefer Python :=) 


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-23 09:54

Message:
Logged In: YES 
user_id=21627

I believe there are a number of places where the code
assumes that 'a' .. 'z' covers all Latin letters, and only
those, e.g. pypcre.c, regexpr.c, sre.py.
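The `'a'` .. `'z'` assumption is easy to check against an EBCDIC code page, where the lowercase letters are not contiguous. A short Python check, again assuming `cp037` as a representative EBCDIC encoding:

```python
import string

lo = "a".encode("cp037")[0]  # 0x81 in code page 037
hi = "z".encode("cp037")[0]  # 0xA9 in code page 037

# Every character whose EBCDIC code lies between 'a' and 'z' inclusive.
span = bytes(range(lo, hi + 1)).decode("cp037")

letters_in_span = sum(ch in string.ascii_lowercase for ch in span)
others_in_span = [ch for ch in span if ch not in string.ascii_lowercase]
```

The span holds 41 code points but only 26 of them are a..z, so a byte test like `c >= 'a' && c <= 'z'` on EBCDIC data also accepts 15 other characters.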

----------------------------------------------------------------------

Comment By: Jean-Yves MENGANT (jymen)
Date: 2002-05-23 08:38

Message:
Logged In: YES 
user_id=513881

When porting to OS390 (an EBCDIC OS), the only place I found 
a bad ASCII assumption that leads to trouble at interpreter 
startup is the one pointed out here. Once I 
fixed it, I was able to use the Python interpreter kernel 
without trouble. Some modules like xmllib may make some 
ASCII assumptions, but module portability is a different story, 
since those modules may be declared non-EBCDIC-compliant.

On the second topic, using a C library function, I am 100% OK; 
the only question is that I am convinced that using, for 
instance, the XPG isascii C function will generate more 
complex and slower code when trying to keep it 
compliant with both EBCDIC and ASCII targets. Having a more 
generic #define like:
#define EBCDIC inside config.h, set by ./configure when the 
platform is EBCDIC, is IMO the best compromise here.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-22 17:09

Message:
Logged In: YES 
user_id=21627

Is it really worth fixing this? Python assumes that the
character set of byte strings is an ASCII superset in many
places. If there is any change made here, it should be based
on C library functions, rather than on static knowledge of
the operating system.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=557946&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 17:14:58 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 09:14:58 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17ZZeE-0006Dv-00@usw-sf-web5.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.
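For readers who want to reproduce the shape of these tables today, here is a minimal sortperf-style harness in modern Python. The column meanings follow sortperf.py's docstring as I understand it: `*` random, `\` descending, `/` ascending, `3` ascending with 3 random exchanges, `+` ascending with the last 10 values random, `~` many duplicates, `=` all equal, `!` a descending-then-ascending "worst case" for older quicksorts. Treat those descriptions, and the generators `make_inputs` and `time_sorts` below, as assumptions rather than sortperf.py's actual code.

```python
import random
import time
from typing import Callable, Dict, List


def make_inputs(n: int, rng: random.Random) -> Dict[str, List[float]]:
    """Build sortperf-style inputs of length n (n >= 10, ideally a power of 2)."""
    base = sorted(rng.random() for _ in range(n))  # ascending floats
    three = base[:]
    for _ in range(3):  # ascending, then 3 random exchanges
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]
    plus = base[:]
    plus[-10:] = [rng.random() for _ in range(10)]  # last 10 replaced at random
    half = n // 2
    return {
        "*sort": [rng.random() for _ in range(n)],  # random
        "\\sort": base[::-1],  # descending
        "/sort": base[:],  # ascending
        "3sort": three,
        "+sort": plus,
        "~sort": [rng.choice(base[:4]) for _ in range(n)],  # only 4 distinct values
        "=sort": [base[0]] * n,  # all equal
        "!sort": [float(k) for k in range(half - 1, -1, -1)]
        + [float(k) for k in range(half)],  # descending, then ascending
    }


def time_sorts(n: int, seed: int = 1,
               sorter: Callable[[List[float]], object] = list.sort) -> Dict[str, float]:
    """Time an in-place `sorter` on each input pattern; seconds per pattern."""
    rng = random.Random(seed)
    timings: Dict[str, float] = {}
    for name, data in make_inputs(n, rng).items():
        work = data[:]  # sort a copy so the generated input stays intact
        start = time.perf_counter()
        sorter(work)
        timings[name] = time.perf_counter() - start
    return timings


for i in range(15, 18):
    row = time_sorts(2 ** i)
    print(f"{i:2d} {2 ** i:7d} " + " ".join(f"{t:6.2f}" for t in row.values()))
```

Since the adaptive mergesort from this patch later became `list.sort` itself, there is no samplesort left to time against; pass a different in-place sorter to compare alternatives.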

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
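The caution above comes down to this: once cached data exists, the seed no longer matters. A minimal sketch of that caching pattern, with a hypothetical `cached_floats` helper and file naming (sortperf.py's real file names and format may differ):

```python
import array
import os
import random
from typing import List


def cached_floats(n: int, cache_dir: str, seed: int) -> List[float]:
    """Return n random floats, reusing a cache file if one already exists.

    The seed is consulted only when the cache is missing, which is why the
    first run must use the intended seed.
    """
    path = os.path.join(cache_dir, f"floats-{n}.bin")  # hypothetical naming scheme
    if os.path.exists(path):
        data = array.array("d")
        with open(path, "rb") as f:
            data.fromfile(f, n)
        return data.tolist()
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)
    return values
```

Calling it a second time with a different seed returns the first run's numbers unchanged, because the cache file wins.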

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-30 12:14

Message:
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.
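Why presorted fields win so big: on input that is already in order, a natural-run mergesort sees one long run and is done after about n-1 comparisons, while random input needs on the order of n log2 n. A quick way to see this with today's `list.sort` (the descendant of msort) is to count calls to `__lt__`; the `Counted` wrapper is a hypothetical helper for illustration.

```python
import random
from typing import List


class Counted:
    """Wraps a value and counts every < comparison made on it."""

    calls = 0

    def __init__(self, value: float) -> None:
        self.value = value

    def __lt__(self, other: "Counted") -> bool:
        Counted.calls += 1
        return self.value < other.value


def count_compares(values: List[float]) -> int:
    """Sort wrapped copies of values and report how many < calls were made."""
    Counted.calls = 0
    sorted(Counted(v) for v in values)
    return Counted.calls


n = 10_000
rng = random.Random(42)
ordered = sorted(rng.random() for _ in range(n))
shuffled = ordered[:]
rng.shuffle(shuffled)

presorted_compares = count_compares(ordered)
random_compares = count_compares(shuffled)
```

Weakly correlated fields fall in between: they contain longer runs than random data, so fewer comparisons and merges are needed.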

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed 
distribution.  msort was better than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think) non-
heroically possible, quickly picking a minimum run length in 
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.
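A sketch of such a minimum-run-length rule, as CPython's later listsort.txt describes it: keep the six most significant bits of n and add 1 if any of the remaining bits are set, so that n/minrun is a power of 2 or a little less. Whether merge4.patch computed it in exactly this form is an assumption; `compute_minrun` is a hypothetical Python rendering.

```python
def compute_minrun(n: int) -> int:
    """Minimum run length for an array of n elements (timsort-style sketch).

    Arrays shorter than 64 are treated as a single run. Otherwise keep the
    6 most significant bits of n, adding 1 if any lower bit is set.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    r = 0  # becomes 1 if any 1 bits are shifted off
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

Every power of 2 from 64 up gets 32, consistent with the note below that power-of-2 timings are unaffected. For 2112 elements the rule gives 33, so random data splits into 2112 / 33 = 64 equal runs and every merge is balanced; a fixed minimum of 32 would instead give 66 runs, leaving roughly a 2048-element run to be merged with a 64-element leftover.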

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.
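The data-movement point can be made concrete with a plain two-way merge that counts its work: every output element is copied once, however few comparisons were needed. `merge_counting` is a hypothetical illustration, not the patch's merge code (which also gallops).

```python
from typing import List, Tuple


def merge_counting(a: List[float], b: List[float]) -> Tuple[List[float], int, int]:
    """Stable merge of two sorted lists; returns (merged, compares, copies)."""
    out: List[float] = []
    i = j = compares = copies = 0
    while i < len(a) and j < len(b):
        compares += 1
        if b[j] < a[i]:  # take from b only if strictly smaller: keeps stability
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
        copies += 1
    # One input is exhausted; the rest of the other is copied with no compares.
    rest = a[i:] + b[j:]
    out.extend(rest)
    copies += len(rest)
    return out, compares, copies


big = [float(k) for k in range(1024)]        # a large run
small = [k + 0.5 for k in range(0, 1024, 64)]  # a 16-element leftover run
merged, compares, copies = merge_counting(big, small)
```

Merging the 16-element leftover into the 1024-element run still costs 1040 element copies, which is the O(N) overhead described above for an unbalanced final merge.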

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with power-of-
2 sizes, so sortperf.py timings shouldn't change (and don't 
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
  patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
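To see why the sort cannot trust the comparison function, here is a deliberately inconsistent one run through today's `list.sort` via `functools.cmp_to_key`. No ordering is promised for such input, but the sort must still finish and hand back exactly the same elements. The coin-flip `goofy_compare` is a contrived, hypothetical example.

```python
import functools
import random
from collections import Counter

rng = random.Random(2002)


def goofy_compare(x: int, y: int) -> int:
    """An inconsistent comparison: the answer is a coin flip every time."""
    return rng.choice((-1, 0, 1))


data = list(range(1000))
result = sorted(data, key=functools.cmp_to_key(goofy_compare))

# No ordering guarantee, but nothing may be lost or duplicated.
assert Counter(result) == Counter(data)
```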

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
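The "find the endpoints" searches work roughly like this: before merging adjacent runs A and B, find where B[0] belongs in A and where A[-1] belongs in B. Everything outside those points is already in its final place. The sketch below uses the standard `bisect` module in place of the patch's own galloping searches, and does not model the extra information merge2.patch extracts; `trim_for_merge` and `merge_trimmed` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def trim_for_merge(a: List[int], b: List[int]) -> Tuple[int, int]:
    """Return (start, end): only a[start:] and b[:end] actually need merging.

    Ties go to A so the merge stays stable: A's elements <= b[0] stay put,
    and B's elements >= a[-1] stay put.
    """
    start = bisect_right(a, b[0])
    end = bisect_left(b, a[-1])
    return start, end


def merge_trimmed(a: List[int], b: List[int]) -> List[int]:
    """Stable merge of sorted a and b that skips the already-placed ends."""
    if not a or not b:
        return a + b
    start, end = trim_for_merge(a, b)
    left, right = a[start:], b[:end]
    middle: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            middle.append(right[j])
            j += 1
        else:
            middle.append(left[i])
            i += 1
    middle += left[i:] + right[j:]
    return a[:start] + middle + b[end:]
```

For A = [1, 2, 5, 9] and B = [3, 4, 10, 11], only [5, 9] and [3, 4] take part in the merge; [1, 2] and [10, 11] never move.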

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments), msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 18:52:27 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 10:52:27 -0700
Subject: [Patches] [ python-Patches-581414 ] info reader bug
Message-ID: <E17ZbAZ-0007J2-00@usw-sf-web1.sourceforge.net>

Patches item #581414, was opened at 2002-07-14 16:13
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581414&group_id=5470

Category: Documentation
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: info reader bug

Initial Comment:
Not really a bug in the Python documentation, but
somewhat annoying:

The "Matching vs. Searching" Info node is unreachable
from the Info program (but is fine in Emacs's Info
mode).  This patch seems to fix it.
This is the only place where the info reader fails, so
it could probably be worked around in the Python docs.
I have forwarded the report to the info maintainer.

Affects 2.1.3, 2.2.1 and HEAD.

--- Doc/lib/libre.tex.orig      Wed Jul  3 16:09:07 2002
+++ Doc/lib/libre.tex   Wed Jul  3 16:09:15 2002
@@ -389,7 +389,7 @@
 refer to numbered groups.
 
 
-\subsection{Matching vs. Searching \label{matching-searching}}
+\subsection{Matching vs Searching \label{matching-searching}}
 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
 
 Python offers two different primitive operations based on regular

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-30 13:52

Message:
Logged In: YES 
user_id=3066

Checked in patch with a comment in Doc/lib/libre.tex
revisions 1.85, 1.73.6.8.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=581414&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 19:41:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 11:41:19 -0700
Subject: [Patches] [ python-Patches-555085 ] timeout socket implementation
Message-ID: <E17Zbvr-0001IU-00@usw-sf-web2.sourceforge.net>

Patches item #555085, was opened at 2002-05-12 08:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 4
Submitted By: Michael Gilfix (mgilfix)
Assigned to: Guido van Rossum (gvanrossum)
Summary: timeout socket implementation

Initial Comment:
This implements the request in bug #457114: timed socket
operations. If a timeout is set and the timeout period
elapses before the socket operation has finished, a
socket.error exception is raised.

This patch integrates the functionality at two levels:
the timeout capability is implemented at the C level in
socketmodule.c, and socket.py was modified so that
file-object creation on Windows handles the underlying
socket raising an exception. The LaTeX documentation was
also updated, and a new regression test, test_timeout.py,
was added.
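For illustration, here is how a caller uses per-socket timeouts in today's Python. One difference from the description above: in the API that eventually shipped (Python 2.3 and later), a timeout raises socket.timeout, a subclass of socket.error, rather than plain socket.error. The helper names recv_with_timeout and demo_timeouts are made up for this sketch.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout):
    """Return up to nbytes from sock, or None if nothing arrives within timeout seconds."""
    sock.settimeout(timeout)            # each blocking call now gives up after `timeout` s
    try:
        return sock.recv(nbytes)
    except socket.timeout:              # subclass of socket.error (an alias of OSError in Python 3)
        return None


def demo_timeouts():
    a, b = socket.socketpair()          # a connected pair, so no network is needed
    try:
        first = recv_with_timeout(a, 1024, 0.05)   # nothing sent yet: times out
        b.sendall(b"ping")
        second = recv_with_timeout(a, 1024, 1.0)   # data is already waiting
        return first, second
    finally:
        a.close()
        b.close()
```

Returning None on timeout is just one choice; letting socket.timeout propagate is equally valid when the caller wants to distinguish it from other errors.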

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-30 14:41

Message:
Logged In: YES 
user_id=6380

Michael and Andrew, if you can deal with this without my
involvement I would greatly appreciate it. ;-)

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-30 10:25

Message:
Logged In: YES 
user_id=116038

If Guido is busy (And I'm sure he is), I'd be willing to
take a hack at the problem if you could email me privately
and provide a testing environment (No OS/2 EMX in my apt ;) ).

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 22:28

Message:
Logged In: YES 
user_id=250749

In private mail to/from Guido, it appears that the FreeBSD 
issues were in test_socket.py, and have been addressed.

I still have outstanding issues on OS/2 EMX, which I sent to 
Guido privately but will add here as soon as I can.

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-07-23 16:43

Message:
Logged In: YES 
user_id=116038

Now that I'm back :)

I checked the archive and this seems to have been handled by
you. Please let me know if it isn't resolved and I can give
it a closer look.

Also, perhaps I should contact Bernie and ask him if there's
anything he hasn't gotten around to in the test_timeout that
I can off-load from him.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-18 13:11

Message:
Logged In: YES 
user_id=6380

The default timeout is now implemented in CVS.

There's a bug report from Andrew Macintyre (unfortunately on
python-dev) about test_socket.py failures on FreeBSD. I'll
try to keep an eye on that, so this patch *still* stays
open. Also, Bernie has promised some changes that I haven't
received yet and the details of which I don't recall (sorry
:-( ).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-07 21:47

Message:
Logged In: YES 
user_id=6380

Keeping this open as a reminder of things still to finish.

Most is in the python-dev discussion; Michael Gilfix and
Bernard Yue have offered to produce more patches.

One feature we definitely want is a way to specify a timeout
to be applied to all new sockets.
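A timeout "applied to all new sockets" is what Python 2.3 later exposed as socket.setdefaulttimeout(), paired with socket.getdefaulttimeout(). A small sketch of using it while restoring the previous default afterwards; demo_default_timeout is a hypothetical name.

```python
import socket


def demo_default_timeout():
    old = socket.getdefaulttimeout()        # None means "no timeout" (fully blocking)
    socket.setdefaulttimeout(2.5)           # affects sockets created from now on
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            return s.gettimeout()           # the new socket inherited the default
        finally:
            s.close()
    finally:
        socket.setdefaulttimeout(old)       # the default is process-wide, so put it back
```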

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-06 17:11

Message:
Logged In: YES 
user_id=6380

Thanks for the new version! I've checked this in.  I made
considerable changes; the following is feedback but you
don't need to respond because I've addressed all these in
the checked-in code!

- Thanks for the cleanup of some non-standard formatting.
However, it's better not to do this so the diffs don't show
changes that are unrelated to the timeout patch.

- You are still importing the select module instead of
calling select() directly. I really think you should do the
latter -- the select module has an enormous overhead (it
allocates several large lists on the heap).

- Instead of explicitly testing the argument to settimeout
for being a float, int or long, you should simply call
PyFloat_AsDouble and handle the error; if someone passes
another object that implements __float__ that should be
acceptable.

- gettimeout() returns sock_timeout without checking if it
is NULL. It can be NULL when a socket object is never
initialized. E.g. I can do this:

>>> from socket import *
>>> s = socket.__new__(socket)
>>> s.gettimeout()

which gives me a segfault. There are probably other places
where this is assumed.

- I addressed the latter two issues by making sock_timeout a
double, whose value is < 0.0 when no timeout is set.
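The last point (store the timeout as a double and let a negative value mean "no timeout", so there is never a NULL to check) can be modelled in pure Python. TimeoutSetting is a hypothetical class, not socketmodule.c's code. float() stands in for PyFloat_AsDouble and is a little more permissive, since it also accepts numeric strings.

```python
class TimeoutSetting:
    """Hypothetical model: a float timeout where any negative value means 'none'."""

    # Class-level default, so even an object created with
    # TimeoutSetting.__new__ (skipping __init__) has a valid value.
    _timeout = -1.0

    def settimeout(self, value):
        if value is None:
            self._timeout = -1.0            # back to "no timeout"
            return
        seconds = float(value)              # any object with __float__ is accepted
        if seconds < 0.0:
            raise ValueError("timeout value must be non-negative")
        self._timeout = seconds

    def gettimeout(self):
        # No NULL check needed: the sentinel is always a real float.
        return None if self._timeout < 0.0 else self._timeout
```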

----------------------------------------------------------------------

Comment By: Michael Gilfix (mgilfix)
Date: 2002-06-05 18:23

Message:
Logged In: YES 
user_id=116038

I've addressed all the issues Guido brought up. The second
version of the patch is attached here. In this version, I've
modified test_socket.py to include tests for the _fileobject
class in socket.py, which this patch also changes.
_fileobject needed to be modified so that data would not be
lost when the underlying socket raised an exception (data is
no longer accumulated in local variables). The _fileobject
tests pass both on older versions of Python (tested with
2.1.3) and on the patched version.
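The _fileobject fix described above amounts to keeping partially read data in the object rather than in local variables, so an exception in the middle of a read does not discard it. A minimal sketch under that assumption; LineReader and its recv callable are hypothetical stand-ins, not socket.py's actual code.

```python
class LineReader:
    """Read lines from a recv-like callable, keeping partial data in self._buf."""

    def __init__(self, recv):
        self._recv = recv        # callable returning bytes; b"" means EOF
        self._buf = b""

    def readline(self):
        while b"\n" not in self._buf:
            chunk = self._recv()             # may raise; data read so far stays in self._buf
            if not chunk:                    # EOF: return whatever is left
                break
            self._buf += chunk
        line, sep, rest = self._buf.partition(b"\n")
        self._buf = rest
        return line + sep
```

If readline() is interrupted by, say, a timeout, calling it again resumes with the bytes already received instead of losing them.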

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-05-23 16:18

Message:
Logged In: YES 
user_id=6380

For a detailed review, see

http://mail.python.org/pipermail/python-dev/2002-May/024340.html

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=555085&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 20:33:45 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 12:33:45 -0700
Subject: [Patches] [ python-Patches-588728 ] __delete__ method-wraper
Message-ID: <E17Zckb-00013K-00@usw-sf-web4.sourceforge.net>

Patches item #588728, was opened at 2002-07-30 19:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588728&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nathan Srebro (nati)
Assigned to: Nobody/Anonymous (nobody)
Summary: __delete__ method-wraper

Initial Comment:
In Python 2.2.1 (as well as in the current CVS), one
cannot access the __delete__ method of built-in
descriptors. This is particularly a problem when trying
to cooperatively subclass a built-in descriptor. Also,
defining a __delete__ method in a descriptor class that
subclasses 'object' has no effect unless __set__ is also
defined (it works only for old-style classes).

This patch adds a method-wrapper for __delete__. This
solves both issues: property().__delete__ is
now properly defined, and defining a __delete__ method
now works even if __set__ is not defined:

>>> class C(object):
...     def delx(self):
...         print "deled"
...     x = property(None, None, delx)
...
>>> a = C()
>>> C.__dict__['x'].__delete__(a)
deled
>>>
>>> class prop(object):
...     def __delete__(self, obj):
...         print "deled"
...
>>> class D(object):
...     x = prop()
...
>>> a = D()
>>> del a.x
deled
>>>
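In current Python 3, both behaviours requested here work as described: property exposes __delete__, and a descriptor that defines only __delete__ intercepts del. A sketch that records deletions in a list instead of printing; the class names are illustrative only.

```python
deleted = []


class C:
    def _delx(self):
        deleted.append(("property", self))

    x = property(None, None, _delx)     # fget=None, fset=None, fdel=_delx


class DeleteOnly:
    """A descriptor defining __delete__ but neither __get__ nor __set__."""

    def __delete__(self, obj):
        deleted.append(("descriptor", obj))


class D:
    x = DeleteOnly()


a = C()
C.__dict__["x"].__delete__(a)   # call the property's __delete__ wrapper directly

b = D()
del b.x                         # dispatched to DeleteOnly.__delete__
```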


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588728&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 22:36:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 14:36:10 -0700
Subject: [Patches] [ python-Patches-588809 ] LDFLAGS support for build_ext.py
Message-ID: <E17Zef4-0000Z5-00@usw-sf-web3.sourceforge.net>

Patches item #588809, was opened at 2002-07-30 21:36
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470

Category: Distutils and setup.py
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Robert Weber (chipsforbrains)
Assigned to: Nobody/Anonymous (nobody)
Summary: LDFLAGS support for build_ext.py

Initial Comment:
a hack at best

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588809&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 22:54:44 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 14:54:44 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17Zex2-0000zi-00@usw-sf-web3.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 07:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Guido van Rossum (gvanrossum)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO opcode (as opposed to Vladimir's
ancient patch).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.
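To make the c_lnotab lookup concrete: in the classic (Python 2.x) format, co_lnotab is a string of byte pairs, each an address increment followed by a line increment, starting from co_firstlineno (increments over 255 are split across several pairs). The sketch below transcribes the search PyCode_Addr2Line performs into Python. Later Python versions changed this encoding, and the sample table is invented.

```python
def addr2line(lnotab, firstlineno, addrq):
    """Return the source line for bytecode offset addrq.

    lnotab holds (address increment, line increment) byte pairs, both
    unsigned, as in Python 2.x code objects."""
    line = firstlineno
    addr = 0
    for k in range(0, len(lnotab), 2):
        addr += lnotab[k]              # offset where the next entry takes effect
        if addr > addrq:               # entry starts beyond addrq, so stop
            break
        line += lnotab[k + 1]          # entry applies: move to its line
    return line


# Invented table for a function starting on line 10:
# offsets 0-5 -> line 10, 6-13 -> line 11, 14-17 -> line 13, 18+ -> line 14.
LNOTAB = bytes([6, 1, 8, 2, 4, 1])
```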

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-30 17:54

Message:
Logged In: YES 
user_id=6380

Cool idea, but I get "unknown opcode" errors...  Keep me
posted though! (I will now see any changes to this patch
item.)

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-30 05:59

Message:
Logged In: YES 
user_id=6656

Guido should see this, assuming he still isn't subscribed to
patches@python.org.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-30 05:58

Message:
Logged In: YES 
user_id=6656

I worked out why some of the code in ceval.c wasn't making
sense to me -- it didn't make sense, period.

I've also fixed a number of silly and not so silly bugs in
my patch.  I'm now 99% certain this idea can fly.  The patch
isn't *finished* but the hard bit is done, IMHO.

There are some other points to make, but I think I'll raise
them on python-dev.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-29 16:18

Message:
Logged In: YES 
user_id=31435

Dropping out of "vacation mode" long enough to say "mondo 
cool!" and encourage this.  Guido may not agree, but I also 
encourage you to redefine c_lnotab if it can make life easier 
and quicker here.  That subtle compression scheme has 
been the source of several nasty bugs, both in the core C 
code and in Jeremy's compiler pkg (cut 'n paste bugs 
abound here, because few people understand what's really 
needed, so flawed code gets copied with little thought).

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 11:34

Message:
Logged In: YES 
user_id=6656

Uhh, the instr_[lu]b variables need to keep their values
around the loop; otherwise we might just as well call
PyCode_Addr2Line each time around.
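The point about instr_lb and instr_ub can be sketched like this: look up the bytecode range [lb, ub) that maps to the current line once, cache it, and consult c_lnotab again only when the instruction offset leaves that range. This is a Python model assuming the classic lnotab format, with hypothetical function names; it is not the patch's ceval.c code, and it reports range changes only.

```python
def line_bounds(lnotab, firstlineno, addrq):
    """Return (line, lb, ub) such that every offset in [lb, ub) maps to line.

    ub is None when the range runs to the end of the code object."""
    line, lb, addr = firstlineno, 0, 0
    for k in range(0, len(lnotab), 2):
        addr += lnotab[k]              # start offset of the next table entry
        if addr > addrq:               # that entry begins after addrq: range ends here
            return line, lb, addr
        line += lnotab[k + 1]
        lb = addr                      # entry applies, so the range starts at least here
    return line, lb, None


def trace_lines(lnotab, firstlineno, offsets):
    """Return (offset, line) events, one each time execution leaves the cached range."""
    events = []
    lb, ub = 0, -1                     # empty range forces a lookup, like instr_ub = -1
    for off in offsets:                # offsets of the instructions actually executed
        if off < lb or (ub is not None and off >= ub):
            line, lb, ub = line_bounds(lnotab, firstlineno, off)
            events.append((off, line))
    return events


LNOTAB = bytes([6, 1, 8, 2, 4, 1])     # invented table: lines 10, 11, 13, 14
```

Between lookups the per-instruction cost is two comparisons, which is why keeping lb and ub alive across loop iterations matters.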

I have another version of the patch that does that, but I
assumed the overhead of doing so was deemed too high, or
someone else would have done this by now.  It's certainly
easier.

Glad to hear it doesn't affect python -O too much.  I was
doing this away from the internet and forgot to keep a clean
copy of the source around for doing comparisons with...

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-29 11:18

Message:
Logged In: YES 
user_id=35752

Moving the "int io, instr_ub = -1, instr_lb = 0;"
declaration and the "io = INSTR_OFFSET();" statement
below the "if (tstate->c_tracefunc ...)" test gives a
small speedup on my machine and is a little neater, IMHO.

I was worried that this would slow down "python -O".  That
doesn't seem to be the case (at least I can't measure it).
Well done.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 23:09:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 15:09:36 -0700
Subject: [Patches] [ python-Patches-587993 ] alternative SET_LINENO killer
Message-ID: <E17ZfBQ-0005fl-00@usw-sf-web2.sourceforge.net>

Patches item #587993, was opened at 2002-07-29 07:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: Guido van Rossum (gvanrossum)
Summary: alternative SET_LINENO killer

Initial Comment:
This patch is a proof-of-concept of another way to
remove the SET_LINENO patch (as opposed to Vladimir's
ancient one).

Instead of rewriting bytecode (ick!) we poke into the
c_lnotab to see if we've moved onto a different line.

The c_lnotab is not the most transparent of data
structures, it has to be said.

I'm not sure this patch is 100% correct -- but I think
the idea can definitely fly.  There will be some more
overhead to tracing than before, but I hope not too
much.  I haven't tested these aspects.

Comments welcome!

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-30 18:09

Message:
Logged In: YES 
user_id=6380

Never mind the errors, I hadn't done a cvs update in weeks
on the machine where I tested it. :-(

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587993&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 23:45:22 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 15:45:22 -0700
Subject: [Patches] [ python-Patches-588834 ] build_ext.py interprets LDFLAGS
Message-ID: <E17Zfk2-0006Dc-00@usw-sf-web2.sourceforge.net>

Patches item #588834, was opened at 2002-07-30 15:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588834&group_id=5470

Category: Distutils and setup.py
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Ben Escoto (bescoto)
Assigned to: Nobody/Anonymous (nobody)
Summary: build_ext.py interprets LDFLAGS

Initial Comment:
The attached patch was given to me by Robert Weber and
I am passing it upstream.  It adds support for LDFLAGS
the way CFLAGS are currently supported.
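A minimal sketch of what "LDFLAGS support" could look like in a build script: split the environment variable using shell quoting rules and hand the result to the linker. link_args_from_env is a hypothetical helper, not the patch's code or a distutils API.

```python
import os
import shlex


def link_args_from_env(environ=None):
    """Turn $LDFLAGS into a list of extra linker arguments (empty if unset)."""
    if environ is None:
        environ = os.environ
    # shlex.split honours shell quoting, so "-L'/path with space'" stays one argument.
    return shlex.split(environ.get("LDFLAGS", ""))
```

The resulting list could be passed as extra_link_args when constructing an Extension.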


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588834&group_id=5470


From noreply@sourceforge.net  Tue Jul 30 23:47:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 15:47:21 -0700
Subject: [Patches] [ python-Patches-588834 ] build_ext.py interprets LDFLAGS
Message-ID: <E17Zflx-0006GU-00@usw-sf-web2.sourceforge.net>

Patches item #588834, was opened at 2002-07-30 15:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588834&group_id=5470

Category: Distutils and setup.py
Group: Python 2.2.x
>Status: Deleted
Resolution: None
Priority: 5
Submitted By: Ben Escoto (bescoto)
Assigned to: Nobody/Anonymous (nobody)
Summary: build_ext.py interprets LDFLAGS

Initial Comment:
The attached patch was given to me by Robert Weber and
I am passing it upstream.  It adds support for LDFLAGS
the way CFLAGS are currently supported.


----------------------------------------------------------------------

>Comment By: Ben Escoto (bescoto)
Date: 2002-07-30 15:47

Message:
Logged In: YES 
user_id=218965

sorry, my bad, I see he submitted it himself...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588834&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 02:12:00 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 18:12:00 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17Zi1w-0000Xe-00@usw-sf-web2.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).
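For readers new to the idea, a stable, adaptive ("natural") mergesort finds the runs already present in the data and merges them, so nearly sorted input costs close to linear time. The sketch below is a deliberately simplified illustration, not the patch's algorithm: it has no minimum run length, no galloping mode and no merge-stack invariants.

```python
def natural_mergesort(a, key=lambda x: x):
    """Stable, adaptive merge sort: find natural runs, then merge neighbours."""
    n = len(a)
    if n < 2:
        return list(a)
    # 1. Split into maximal runs. Strictly descending runs are reversed;
    #    strictness guarantees no equal elements get reordered.
    runs = []
    i = 0
    while i < n:
        j = i + 1
        if j < n and key(a[j]) < key(a[i]):           # strictly descending run
            while j < n and key(a[j]) < key(a[j - 1]):
                j += 1
            runs.append(list(reversed(a[i:j])))
        else:                                          # non-descending run
            while j < n and key(a[j]) >= key(a[j - 1]):
                j += 1
            runs.append(list(a[i:j]))
        i = j
    # 2. Merge adjacent runs pairwise until a single run remains.
    while len(runs) > 1:
        merged = [_merge(runs[k], runs[k + 1], key) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    return runs[0]


def _merge(left, right, key):
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):    # take from the right only if strictly smaller
            out.append(right[j])
            j += 1
        else:                               # ties go to the left run: keeps it stable
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

On already sorted input the first phase finds a single run and no merging happens at all, which is the kind of structure the timings in this thread exploit.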

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
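The column labels in the timing tables throughout this thread come from sortperf.py. The sketch below approximates those input patterns in Python 3 with a fixed seed, which plays the role of the saved temp files. It is my reading of sortperf's legend, not its code, and the !sort pattern in particular is a guess.

```python
import random
import time


def make_inputs(n, seed=1):
    """Approximate sortperf's input patterns for a power-of-2 size n (not its actual code)."""
    assert n >= 16 and n & (n - 1) == 0
    rng = random.Random(seed)               # fixed seed: every run sees the same data
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)

    three = asc[:]                          # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]                           # ascending, then 10 random values at the end
    plus[-10:] = [rng.random() for _ in range(10)]

    half = n // 2                           # assumed !sort: descending, then ascending
    worst = [float(x) for x in list(range(half - 1, -1, -1)) + list(range(half))]

    return {
        "*sort": base,                      # random data
        "\\sort": asc[::-1],                # descending
        "/sort": asc,                       # ascending
        "3sort": three,
        "+sort": plus,
        "~sort": base[:4] * (n // 4),       # many duplicates (only 4 distinct values)
        "=sort": [base[0]] * n,             # all equal
        "!sort": worst,
    }


def time_sorts(n):
    """Time list.sort() on a fresh copy of each pattern; returns {column: seconds}."""
    timings = {}
    for name, data in make_inputs(n).items():
        copy = list(data)
        t0 = time.perf_counter()
        copy.sort()
        timings[name] = time.perf_counter() - t0
    return timings
```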

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-30 21:12

Message:
Logged In: YES 
user_id=31435

New doc file, with an intro at the start and a program at the 
end.  Turns out that merge4.patch actually reversed the 
random-array #-of-comparisons advantage samplesort had 
enjoyed:  it's now timsort that does 1-2% fewer 
comparisons on random arrays of random lengths.

See the end of the file for why samplesort does 50% more 
comparisons on average for random arrays of length two 
<wink>.

Near the end of the new Intro section at the start, I suggest 
a couple experiments people might try on boxes where 
~sort is much slower under timsort.  That remains baffling, 
but the algorithm doesn't *do* much in that case, so 
someone on a box where it flounders could surely figure out 
why.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 12:14

Message:
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed
distribution.  msort was more than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think) non-
heroically possible, quickly picking a minimum run length in 
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.
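Picking a minimum run length in advance, so that N/minrun is a power of 2 or slightly less, is the rule CPython's list sort has long used (merge_compute_minrun in Objects/listobject.c). A Python transcription follows. I am assuming the merge4.patch version was equivalent; the comment above confirms only that power-of-2 sizes keep their old value.

```python
def compute_minrun(n):
    """Minimum run length for a timsort-style merge of n elements.

    Keep the 6 most significant bits of n, adding 1 if any lower bit is set.
    For n < 64 the whole array is one run; otherwise the result is in
    [32, 64], and n / minrun is a power of 2 or slightly less."""
    assert n >= 0
    r = 0                     # becomes 1 if any bit shifted off is set
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

For a power-of-2 size nothing is shifted off with a 1 bit, so every run has length 32 and the merges stay perfectly balanced; for other sizes the extra 1 keeps the last run from being a tiny leftover.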

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.
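A small Python sketch of the data-movement point, using a plain textbook merge rather than the patch's merge code (merge_counting is a hypothetical helper). Every compare is followed by copying the winner to the output, and whatever is left over is copied without further compares, so copies are always at least as many as compares.

```python
def merge_counting(a, b):
    """Stably merge sorted lists a and b; count compares and element copies."""
    out = []
    i = j = compares = copies = 0
    while i < len(a) and j < len(b):
        compares += 1
        if b[j] < a[i]:  # take from b only when strictly smaller (stable)
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
        copies += 1  # one copy per compare
    rest = a[i:] + b[j:]  # at most one of these slices is non-empty
    out.extend(rest)  # leftovers are copied without any compares
    copies += len(rest)
    return out, compares, copies


merged, compares, copies = merge_counting([1, 3, 5, 7], [2, 4, 6, 8])
print(merged, compares, copies)  # [1, 2, 3, 4, 5, 6, 7, 8] 7 8
```

A quicksort partition, by contrast, can compare an element against the pivot and leave it exactly where it is.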

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with
power-of-2 sizes, so sortperf.py timings shouldn't change (and don't
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
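To see why that matters, here is a small Python 3 sketch (goofy_cmp and sort_with_goofy_cmp are hypothetical names). Python 3 dropped the compare argument to sort, so functools.cmp_to_key adapts a comparison function. The sketch relies on CPython's observed behavior, not a documented language guarantee, that its list sort tolerates an inconsistent comparison: the resulting order is meaningless, but no element is lost or duplicated.

```python
import functools
import random


def goofy_cmp(x, y):
    """An arbitrarily inconsistent comparison: the answer is a coin flip."""
    return random.choice((-1, 0, 1))


def sort_with_goofy_cmp(data, seed=0):
    random.seed(seed)  # make the nonsense reproducible
    return sorted(data, key=functools.cmp_to_key(goofy_cmp))


result = sort_with_goofy_cmp(list(range(100)))
# The order is garbage, but the output should still be a permutation of the input.
print(sorted(result) == list(range(100)))
```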

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
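The basic endpoint searches can be sketched in Python as follows. This is a simplified illustration, not the patch's code: bisect stands in for its galloping searches, the helper names are hypothetical, and the extra information merge2.patch extracts is not modeled. Before merging adjacent runs a and b, everything in a that is <= b[0] is already in place, and so is everything in b that is >= a[-1], so only the middle needs a real merge.

```python
from bisect import bisect_left, bisect_right


def trim_before_merge(a, b):
    """Split adjacent non-empty sorted runs a and b around the part to merge.

    Returns (head, a_mid, b_mid, tail); head and tail are already in
    their final positions.
    """
    i = bisect_right(a, b[0])  # a[:i] <= b[0]; equal items stay first (stable)
    k = bisect_left(b, a[-1])  # b[k:] >= a[-1]; equal items stay last (stable)
    return a[:i], a[i:], b[:k], b[k:]


def merge_runs(a, b):
    head, a_mid, b_mid, tail = trim_before_merge(a, b)
    # sorted() is stable, so a_mid items still precede equal b_mid items.
    return head + sorted(a_mid + b_mid) + tail


print(trim_before_merge([1, 2, 5, 7], [3, 4, 9]))  # ([1, 2], [5, 7], [3, 4], [9])
print(merge_runs([1, 2, 5, 7], [3, 4, 9]))  # [1, 2, 3, 4, 5, 7, 9]
```

When the runs are already in order (a[-1] <= b[0]), both middle pieces come back empty and no merging work is left at all.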

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04

(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments).  msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
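One rough way to look for that kind of pre-existing order is to count natural runs, the way a run-detecting mergesort would see them before any minimum-run padding. The Python sketch below is a hypothetical helper (count_runs is not from the patch). It also counts strictly descending stretches as runs, on the assumption that they get reversed in place; the strictness is what keeps such a reversal stable. Fewer, longer runs suggest more order for msort to exploit.

```python
def count_runs(seq):
    """Count maximal natural runs in seq.

    A run is either non-descending (each item >= the previous one) or
    strictly descending (each item < the previous one).
    """
    n = len(seq)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n and seq[j] < seq[i]:
            # Strictly descending run.
            while j < n and seq[j] < seq[j - 1]:
                j += 1
        else:
            # Non-descending run (also covers a final single item).
            while j < n and seq[j] >= seq[j - 1]:
                j += 1
        i = j
    return runs


lines = ["b\n", "c\n", "a\n", "a\n", "z\n"]
print(count_runs(lines))  # 2
```

For a real file, passing its list of lines (for example from readlines()) gives the count; this sketch says nothing about how a given count translates into msort's measured speedup.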

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 07:02:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 30 Jul 2002 23:02:48 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17ZmZM-0003Gd-00@usw-sf-web4.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
Assigned to: Nobody/Anonymous (nobody)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
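Two notes for anyone repeating this today. In Python 2.3 and later the adaptive mergesort became list.sort() itself, so there is no msort() or samplesort left to compare against. And a seeded generator is another way to get identical inputs on every run without temp files. The Python sketch below times list.sort() on a few patterns built from one seeded generator; make_data and time_sorts are hypothetical names, and the patterns are only rough stand-ins for some sortperf.py columns (*sort random, \sort descending, /sort ascending, =sort all equal), not its exact construction.

```python
import random
import time


def make_data(n, seed=1):
    """Build several input patterns from one seeded generator.

    The same seed yields the same floats on every run, so timings from
    different runs sort identical data.
    """
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)
    return {
        "*sort": base,           # random order
        "\\sort": asc[::-1],     # descending
        "/sort": asc,            # ascending
        "=sort": [base[0]] * n,  # all equal (n references to one float)
    }


def time_sorts(n, seed=1):
    """Time list.sort() on a fresh copy of each pattern; returns seconds."""
    timings = {}
    for name, data in make_data(n, seed).items():
        work = list(data)  # never sort the shared original in place
        start = time.perf_counter()
        work.sort()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    for i in range(15, 21):  # same i range as "sortperf.py 15 20 1"
        row = time_sorts(2**i)
        print(i, "  ".join(f"{name} {secs:.2f}" for name, secs in row.items()))
```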

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-31 02:02

Message:
Logged In: YES 
user_id=31435

~sort gets more mysterious all the time:  the mystery now 
is why it's *not* much slower everywhere!  Here are the 
exact # of compares ~sort does:

i        n     sort    msort    %ch    lg(n!)
--  ------  -------  -------  -----  --------
15   32768   130484   188720  44.63    444255
16   65536   260019   377634  45.23    954037
17  131072   555035   755476  36.11   2039137
18  262144  1107826  1511174  36.41   4340409
19  524288  2218562  3022584  36.24   9205096
20 1048576  4430616  6045418  36.45  19458756

The last column is the information-theoretic lower bound for 
sorting random arrays of this size (no comparison-based 
algorithm can do better than that on average), showing 
that sort() and msort() are both getting a lot of good out of 
the duplicates.  But sort()'s special case for equal 
elements is extremely effective on ~sort's specific data 
pattern, and msort just isn't going to get close to that (it 
does better than sort() on skewed distributions with lots of 
duplicates, though).
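The two derived columns are easy to recompute. lg(n!) is the base-2 log of the number of possible orderings of n distinct items; each compare yields at most one bit, so at least that many compares are needed on average for distinct random data. Input with many duplicates has far fewer distinct outcomes, which is how both sorts can come in under it here. A Python sketch (the function names are hypothetical; math.lgamma avoids building the huge integer n!):

```python
import math


def lg_factorial(n: int) -> float:
    """log2(n!), computed as lgamma(n + 1) / ln(2)."""
    return math.lgamma(n + 1) / math.log(2)


def pct_change(old: int, new: int) -> float:
    """Percent increase from old to new, as in the %ch column."""
    return 100.0 * (new - old) / old


print(round(lg_factorial(32768)))            # 444255
print(round(pct_change(130484, 188720), 2))  # 44.63
```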

The only thing I can think of that could transform 
what "should be" highly significant slowdowns into highly 
significant speedups on some boxes are catastrophic 
cache effects in samplesort.  But knowing something about 
how both algorithms work <wink>, that's not screaming "oh, 
of course".

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 21:12

Message:
Logged In: YES 
user_id=31435

New doc file, with an intro at the start and a program at the 
end.  Turns out that merge4.patch actually reversed the 
random-array #-of-comparisons advantage samplesort had 
enjoyed:  it's now timsort that does 1-2% fewer 
comparisons on random arrays of random lengths.

See the end of the file for why samplesort does 50% more 
comparisons on average for random arrays of length two 
<wink>.

Near the end of the new Intro section at the start, I suggest 
a couple experiments people might try on boxes where 
~sort is much slower under timsort.  That remains baffling, 
but the algorithm doesn't *do* much in that case, so 
someone on a box where it flounders could surely figure out 
why.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 12:14

Message:
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed 
distribution.  msort was better than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think)
non-heroically possible, quickly picking a minimum run length in
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with
power-of-2 sizes, so sortperf.py timings shouldn't change (and don't
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
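Tim's constraint can be illustrated with a deliberately inconsistent comparison. The sketch below is written for Python 3, where a comparison function has to be wrapped with `functools.cmp_to_key`. `make_goofy_cmp` and `sort_with_goofy_cmp` are hypothetical names. The order that comes out is meaningless. What matters is that the sort still finishes without crashing and returns a permutation of its input.

```python
import functools
import random

def make_goofy_cmp(seed=0):
    """Build a comparison that answers at random (reproducibly per seed)."""
    rng = random.Random(seed)

    def goofy_cmp(a, b):
        # Ignores a and b entirely, so it can claim a < b now and b < a later.
        return rng.choice((-1, 0, 1))

    return goofy_cmp

def sort_with_goofy_cmp(data, seed=0):
    result = sorted(data, key=functools.cmp_to_key(make_goofy_cmp(seed)))
    # The order is meaningless, but no element may be lost or duplicated.
    assert sorted(result) == sorted(data)
    return result

if __name__ == "__main__":
    print(sort_with_goofy_cmp(list(range(20))))
```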

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400MHz G4 PowerPC running Mac OS X 10.1.5, original patch.
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
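Here is a sketch of the pre-merge searches. It assumes two adjacent sorted runs, `a` on the left and `b` on the right, merged stably so that `a`'s elements win ties. Everything in `a` that is <= `b[0]`, and everything in `b` that is >= `a[-1]`, is already in its final place and can be skipped by the merge. `gallop_right` probes offsets 1, 3, 7, 15, ... before bisecting, in the spirit of timsort's galloping. For brevity the search into `b` uses a plain `bisect_left`, whereas a real implementation would gallop from the right end. All names are hypothetical, and this is not the code in merge2.patch.

```python
from bisect import bisect_left, bisect_right

def gallop_right(key, a, lo=0):
    """Index in sorted list a just past every element <= key, found by
    exponential probes a[lo+1], a[lo+3], a[lo+7], ... and then bisection."""
    n = len(a)
    if lo >= n or key < a[lo]:
        return lo
    last, offset = 0, 1
    # Invariant: a[lo + last] <= key.
    while offset < n - lo and not key < a[lo + offset]:
        last, offset = offset, 2 * offset + 1
    hi = min(lo + offset, n)
    # The answer lies in (lo + last, hi]; finish with a binary search.
    return bisect_right(a, key, lo + last + 1, hi)

def trim_for_merge(a, b):
    """Return (k, j) such that a[:k] and b[j:] are already in final position."""
    k = gallop_right(b[0], a)      # a's elements <= b[0] stay put
    j = bisect_left(b, a[-1])      # b's elements >= a[-1] stay put
    return k, j

def merge_runs(a, b):
    """Stable merge of sorted runs a and b that skips the in-place ends."""
    if not a or not b:
        return a + b
    k, j = trim_for_merge(a, b)
    head, tail = a[:k], b[j:]
    a, b = a[k:], b[:j]
    out, i, m = [], 0, 0
    while i < len(a) and m < len(b):
        if b[m] < a[i]:            # strict test: ties go to a, keeping stability
            out.append(b[m])
            m += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])
    out.extend(b[m:])
    return head + out + tail
```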

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
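For readers who want to reproduce tables like these on a modern Python, the sketch below builds an input for each column and times `list.sort()` on it. The column meanings used here are an approximation of Lib/test/sortperf.py, not a copy of it: random, descending, ascending, ascending with 3 random exchanges, ascending with 10 random values at the end, many duplicates, all equal, and a case assumed to be bad for samplesort. In particular, the `!sort` construction (a descending half followed by an ascending half) is an assumption. `make_inputs` and `time_cases` are hypothetical helpers, and `n` must be at least 1.

```python
import random
import time

def make_inputs(n, seed=0):
    """Return {column name: list of floats} approximating the sortperf cases."""
    rng = random.Random(seed)
    base = [rng.random() for _ in range(n)]
    asc = sorted(base)

    three = asc[:]                              # ascending, then 3 random exchanges
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        three[i], three[j] = three[j], three[i]

    plus = asc[:]                               # ascending, then random values at the end
    k = min(10, n)
    plus[n - k:] = [rng.random() for _ in range(k)]

    dups = [float(rng.randrange(4)) for _ in range(n)]  # many duplicates

    half = n // 2                               # assumed bad case for samplesort:
    bang = [float(x) for x in range(half - 1, -1, -1)]  # descending half...
    bang += [float(x) for x in range(half)]             # ...then ascending half

    return {
        "*sort": base[:],
        "\\sort": asc[::-1],
        "/sort": asc[:],
        "3sort": three,
        "+sort": plus,
        "~sort": dups,
        "=sort": [0.5] * n,
        "!sort": bang,
    }

def time_cases(n, seed=0):
    """Seconds taken by list.sort() on a fresh copy of each case."""
    timings = {}
    for name, data in make_inputs(n, seed).items():
        work = data[:]                          # never sort the shared input itself
        start = time.perf_counter()
        work.sort()
        timings[name] = time.perf_counter() - start
    return timings

if __name__ == "__main__":
    names = list(make_inputs(1))
    print(" i    2**i" + "".join(f"{c:>7}" for c in names))
    for i in range(15, 18):
        t = time_cases(2 ** i)
        print(f"{i:2d} {2 ** i:7d}" + "".join(f"{t[c]:7.2f}" for c in names))
```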
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.
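Timsort's first step, finding the natural runs already present in the data, can be sketched as follows. This is a simplified illustration, not the C code, and `count_run` and `natural_runs` are hypothetical names. Descending runs must be strictly descending. That way, reversing them cannot reorder equal elements, which keeps the sort stable.

```python
def count_run(a, lo):
    """Return (length, descending) for the maximal run starting at a[lo].

    A run is either non-decreasing (a[i] <= a[i+1]) or strictly
    decreasing (a[i] > a[i+1])."""
    n = len(a)
    if lo + 1 >= n:
        return n - lo, False
    i = lo + 1
    if a[i] < a[i - 1]:
        # Strictly descending run.
        while i + 1 < n and a[i + 1] < a[i]:
            i += 1
        return i - lo + 1, True
    # Non-decreasing run.
    while i + 1 < n and not a[i + 1] < a[i]:
        i += 1
    return i - lo + 1, False

def natural_runs(a):
    """Split a into maximal runs, returning each as an ascending list."""
    runs, lo = [], 0
    while lo < len(a):
        length, descending = count_run(a, lo)
        run = a[lo:lo + length]
        if descending:
            run.reverse()          # safe: strictness means no equal neighbours
        runs.append(run)
        lo += length
    return runs
```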

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments), msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
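One crude way to gauge how much pre-existing order a file holds is to count its maximal ascending runs of lines. A random arrangement of n distinct lines averages about (n + 1) / 2 runs, while sorted input has exactly 1. The sketch below is an assumed heuristic for illustration only. Timsort can also exploit descending runs and long stretches taken from one side during a merge, and this count ignores both. `count_ascending_runs`, `order_report` and `report_file` are hypothetical names.

```python
def count_ascending_runs(items):
    """Number of maximal non-decreasing runs (1 means already sorted)."""
    if not items:
        return 0
    return 1 + sum(1 for x, y in zip(items, items[1:]) if y < x)

def order_report(lines):
    """Compare the run count with the ~(n + 1) / 2 expected for random data."""
    n = len(lines)
    return {
        "n": n,
        "runs": count_ascending_runs(lines),
        "expected_if_random": (n + 1) / 2,
    }

def report_file(path, encoding="latin-1"):
    """Convenience wrapper: run order_report on the lines of a text file."""
    with open(path, encoding=encoding) as f:
        return order_report(f.read().splitlines())
```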

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-cache,
256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70
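Ratios are easier to compare across machines than raw seconds. The hypothetical helper below parses two rows in this table format and divides the samplesort time by the timsort time for each column, so a value above 1 means timsort was faster. With Neil's i=20 rows it gives about 13.4x for `3sort` and about 6.6x for `!sort`. The timings are rounded to 0.01 seconds, so ratios for the small-i rows are mostly noise. A zero time is reported as infinity.

```python
COLUMNS = ["*sort", "\\sort", "/sort", "3sort", "+sort", "~sort", "=sort", "!sort"]

def parse_row(line):
    """Parse 'i 2**i t1 ... t8' into (i, {column: seconds})."""
    fields = line.split()
    return int(fields[0]), dict(zip(COLUMNS, map(float, fields[2:])))

def speedups(old_row, new_row):
    """old time / new time per column (inf when the new time rounds to 0)."""
    i_old, old = parse_row(old_row)
    i_new, new = parse_row(new_row)
    if i_old != i_new:
        raise ValueError("rows are for different sizes")
    return {c: (old[c] / new[c] if new[c] else float("inf")) for c in COLUMNS}

# Neil Schemenauer's i=20 rows, copied from the tables above.
SAMPLESORT_20 = "20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61"
TIMSORT_20 = "20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70"

if __name__ == "__main__":
    for col, ratio in speedups(SAMPLESORT_20, TIMSORT_20).items():
        print(f"{col:>6}  {ratio:5.2f}x")
```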


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 08:47:38 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 00:47:38 -0700
Subject: [Patches] [ python-Patches-588982 ] Mindless editing, DL_EXPORT/IMPORT
Message-ID: <E17ZoCo-0006Ox-00@usw-sf-web2.sourceforge.net>

Patches item #588982, was opened at 2002-07-31 09:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588982&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Kalle Svensson (krftkndl)
Assigned to: Nobody/Anonymous (nobody)
Summary: Mindless editing, DL_EXPORT/IMPORT

Initial Comment:
In
http://mail.python.org/pipermail/python-dev/2002-July/027136.html,
Mark Hammond asked for a patch substituting
PyMODINIT_FUNC for DL_EXPORT(void) in Modules/*.c and
PyAPI_FUNC/DATA for DL_IMPORT in Include/*.h.  Since
I'm a sucker for easy fame and fortune, here it is.
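The "mindless editing" can be approximated with a few regular expressions. This is a hypothetical sketch, not the script behind this patch. It rewrites `DL_EXPORT(void)` to `PyMODINIT_FUNC`. It rewrites `DL_IMPORT(type)` to `PyAPI_FUNC(type)` when a function name and opening parenthesis follow, and treats every other `DL_IMPORT(type)` as data (`PyAPI_DATA`). That heuristic is an assumption. Other forms such as `DL_EXPORT(int)` are deliberately left alone for a human to review.

```python
import re

# DL_EXPORT(void) initfoo(void)  ->  PyMODINIT_FUNC initfoo(void)
_MODINIT = re.compile(r"\bDL_EXPORT\(void\)")
# DL_IMPORT(type) name(  ->  PyAPI_FUNC(type) name(   (assumed: a function)
_FUNC = re.compile(r"\bDL_IMPORT\(([^()]*)\)(\s+\**\s*\w+\s*\()")
# Any remaining DL_IMPORT(type) is assumed to declare data.
_DATA = re.compile(r"\bDL_IMPORT\(([^()]*)\)")

def modernize_c_source(text):
    """Apply the heuristic substitutions to the text of one C file."""
    text = _MODINIT.sub("PyMODINIT_FUNC", text)
    text = _FUNC.sub(r"PyAPI_FUNC(\1)\2", text)
    return _DATA.sub(r"PyAPI_DATA(\1)", text)
```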


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588982&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 08:49:08 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 00:49:08 -0700
Subject: [Patches] [ python-Patches-588982 ] Mindless editing, DL_EXPORT/IMPORT
Message-ID: <E17ZoEG-0006bR-00@usw-sf-web1.sourceforge.net>

Patches item #588982, was opened at 2002-07-31 09:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588982&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Kalle Svensson (krftkndl)
>Assigned to: Mark Hammond (mhammond)
Summary: Mindless editing, DL_EXPORT/IMPORT

Initial Comment:
In
http://mail.python.org/pipermail/python-dev/2002-July/027136.html,
Mark Hammond asked for a patch substituting
PyMODINIT_FUNC for DL_EXPORT(void) in Modules/*.c and
PyAPI_FUNC/DATA for DL_IMPORT in Include/*.h.  Since
I'm a sucker for easy fame and fortune, here it is.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588982&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 09:28:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 01:28:05 -0700
Subject: [Patches] [ python-Patches-588728 ] __delete__ method-wraper
Message-ID: <E17Zopx-00073j-00@usw-sf-web2.sourceforge.net>

Patches item #588728, was opened at 2002-07-30 19:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588728&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nathan Srebro (nati)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: __delete__ method-wraper

Initial Comment:
In Python 2.2.1 (as well as in the current CVS), one
cannot access the __delete__ method of built-in
descriptors. This is particularly a problem when trying
to cooperatively subclass a built-in descriptor. Also,
defining a __delete__ method for a class (which is a
subclass of 'object'), does not have any effect unless
__set__ is also defined (it does only for old-style
classes).

This patch adds a method-wrapper for delete. This
solves the above two issues: property().__delete__ is
now properly defined, and defining a __delete__ method
now works even if __set__ is not defined:

>>> class C(object):
...    def delx(self):
...       print "deled"
...    x = property(None,None,delx)
...
>>> a=C()
>>> C.__dict__['x'].__delete__(a)
deled
>>>
>>> 
>>> class prop(object):
... 	def __delete__(self,obj):
... 		print "deled"
... 
>>> class D(object):
... 	x = prop()
... 
>>> a = D()
>>> del a.x
deled
>>>
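For comparison, here is the same idea written for Python 3, where both behaviors described above hold. A descriptor that defines only `__delete__` counts as a data descriptor and intercepts `del`, and `property` objects expose a `__delete__` method. The class names are hypothetical, and the `print` statements are replaced by a log list so that the result can be checked.

```python
class DeleteOnly:
    """A descriptor that defines __delete__ but neither __get__ nor __set__."""

    def __init__(self):
        self.log = []

    def __delete__(self, obj):
        self.log.append("deleted from " + type(obj).__name__)

class D:
    x = DeleteOnly()

class C:
    deleted = []

    def _delx(self):
        C.deleted.append(self)

    x = property(None, None, _delx)

if __name__ == "__main__":
    a = D()
    del a.x                          # routed to DeleteOnly.__delete__
    print(D.__dict__["x"].log)       # ['deleted from D']

    c = C()
    C.__dict__["x"].__delete__(c)    # property exposes __delete__ directly
    print(C.deleted[-1] is c)        # True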


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=588728&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 10:41:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 02:41:29 -0700
Subject: [Patches] [ python-Patches-577875 ] Merge xrange() into slice()
Message-ID: <E17Zpyz-000071-00@usw-sf-web2.sourceforge.net>

Patches item #577875, was opened at 2002-07-05 18:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Oren Tirosh (orenti)
Assigned to: Nobody/Anonymous (nobody)
Summary: Merge xrange() into slice()

Initial Comment:
Changes from Raymond Hettinger's last version of this 
patch:

1. Removed #include "rangeobject.h" from Python.h

2. Changed repr to suppress None arguments so it now 
looks like the old xrange repr.

3. Added .slice(len) method that exposes the functionality 
of PySlice_GetIndicesEx.

Comment in PySlice_GetIndicesEx:
/* this is harder to get right than you might think */

:-)
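Later Python versions expose an equivalent normalization on slice objects as `slice.indices(length)`, which returns a clamped `(start, stop, step)` triple. Passing that triple to `range` gives xrange-like behavior, as the short sketch below shows. The helper name `indices_of` is hypothetical.

```python
def indices_of(s, length):
    """Concrete indices that slice s selects from a sequence of `length` items."""
    return list(range(*s.indices(length)))

if __name__ == "__main__":
    s = slice(None, None, -2)
    print(s.indices(10))          # (9, -1, -2)
    print(indices_of(s, 10))      # [9, 7, 5, 3, 1]
```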


----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-31 09:41

Message:
Logged In: YES 
user_id=6656

Is this patch now dead?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 10:55:48 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 02:55:48 -0700
Subject: [Patches] [ python-Patches-561724 ] README additions for Cray T3E
Message-ID: <E17ZqCq-0002Ix-00@usw-sf-web5.sourceforge.net>

Patches item #561724, was opened at 2002-05-28 23:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561724&group_id=5470

Category: Documentation
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hadfield (hadfield)
Assigned to: Michael Hudson (mwh)
Summary: README additions for Cray T3E

Initial Comment:
As a result of my experience building Python 2.2.1 on
NIWA's Cray T3E (see comp.lang.python thread entitled
"Building Python on Cray T3E") I have added some info
to the appropriate section in README.

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-31 09:55

Message:
Logged In: YES 
user_id=6656

Checked in as revision 1.150 of README.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-05-29 10:34

Message:
Logged In: YES 
user_id=6656

I think I'll wait until the sre patch gets resolved before
checking this in.  I just checked in the md5 patch, so
that's one paragraph that can get deleted (don't worry about
supplying a new patch, though).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561724&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 10:56:34 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 02:56:34 -0700
Subject: [Patches] [ python-Patches-561724 ] README additions for Cray T3E
Message-ID: <E17ZqDa-0002Kr-00@usw-sf-web5.sourceforge.net>

Patches item #561724, was opened at 2002-05-28 23:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561724&group_id=5470

Category: Documentation
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Mark Hadfield (hadfield)
Assigned to: Michael Hudson (mwh)
Summary: README additions for Cray T3E

Initial Comment:
As a result of my experience building Python 2.2.1 on
NIWA's Cray T3E (see comp.lang.python thread entitled
"Building Python on Cray T3E") I have added some info
to the appropriate section in README.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-31 09:55

Message:
Logged In: YES 
user_id=6656

Checked in as revision 1.150 of README.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-05-29 10:34

Message:
Logged In: YES 
user_id=6656

I think I'll wait until the sre patch gets resolved before
checking this in.  I just checked in the md5 patch, so
that's one paragraph that can get deleted (don't worry about
supplying a new patch, though).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=561724&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 12:14:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 04:14:19 -0700
Subject: [Patches] [ python-Patches-577875 ] Merge xrange() into slice()
Message-ID: <E17ZrQp-0002DL-00@usw-sf-web2.sourceforge.net>

Patches item #577875, was opened at 2002-07-05 18:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Oren Tirosh (orenti)
>Assigned to: Michael Hudson (mwh)
Summary: Merge xrange() into slice()

Initial Comment:
Changes from Raymond Hettinger's last version of this 
patch:

1. Removed #include "rangeobject.h" from Python.h

2. Changed repr to suppress None arguments so it now 
looks like the old xrange repr.

3. Added .slice(len) method that exposes the functionality 
of PySlice_GetIndicesEx.

Comment in PySlice_GetIndicesEx:
/* this is harder to get right than you might think */

:-)


----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2002-07-31 11:14

Message:
Logged In: YES 
user_id=6656

Oren says "yes".

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-31 09:41

Message:
Logged In: YES 
user_id=6656

Is this patch now dead?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=577875&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 12:24:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 04:24:19 -0700
Subject: [Patches] [ python-Patches-554192 ] mimetypes: all extensions for a type
Message-ID: <E17ZraV-0006GB-00@usw-sf-web3.sourceforge.net>

Patches item #554192, was opened at 2002-05-09 19:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: mimetypes: all extensions for a type

Initial Comment:
This patch adds a function guess_all_extensions to 
mimetypes.py. This function returns all known 
extensions for a given type, not just the first one 
found in the types_map dictionary. guess_extension is 
still present and returns the first from the list.
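In current Python 3 both functions exist at module level, and `guess_extension` returns the first entry of `guess_all_extensions`. Below is a minimal usage sketch. The exact list depends on the platform's MIME files, so only `.txt` is assumed to be present. `extensions_for` is a hypothetical wrapper.

```python
import mimetypes

def extensions_for(mime_type):
    """All known extensions for mime_type, plus the single 'first' guess."""
    return (mimetypes.guess_all_extensions(mime_type),
            mimetypes.guess_extension(mime_type))

if __name__ == "__main__":
    all_exts, first = extensions_for("text/plain")
    print(all_exts)
    print(first)
```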

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-31 13:24

Message:
Logged In: YES 
user_id=89016

OK, I'll change the patch and post the question to python-dev 
next week (I'm on vacation right now).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-30 14:34

Message:
Logged In: YES 
user_id=21627

I'm in favour of exposing it on the module level. If you are
uncertain, you might want to ask on python-dev.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-30 13:00

Message:
Logged In: YES 
user_id=89016

It *is* used in two spots: The constructor and the readfp 
method. But exposing it at the module level could make 
sense, because it is the atomic method of adding mime type 
information. So should I change the patch to expose it at the 
module level and change the LaTeX documentation 
accordingly?
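Module-level exposure is what current Python provides. `mimetypes.add_type(type, ext, strict=True)` updates the shared database, so both `guess_type` and `guess_all_extensions` see the new mapping. The type and extension below are made up for illustration. Note that the call changes global state for the whole process.

```python
import mimetypes

# Hypothetical type and extension, registered in the module-level database.
mimetypes.add_type("application/x-example", ".exmpl")

print(mimetypes.guess_type("report.exmpl"))   # ('application/x-example', None)
print(mimetypes.guess_all_extensions("application/x-example"))
```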

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-29 10:44

Message:
Logged In: YES 
user_id=21627

I can't see the point of making it private, since it is not
used inside the module. If you plan to use it, that usage
certainly is outside of the module, so the method would be
public.

If it is public, it needs to be exposed on the module level,
and it needs to be documented.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-07-29 10:23

Message:
Logged In: YES 
user_id=89016

The patch adds an inverted mapping (i.e. mapping from type
to a list of extensions). add_type simplifies adding a
type<->ext mapping to both dictionaries. If this method
should not be exposed we could make the name private.
(_add_type)
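The two-dictionary design Walter describes can be modeled in a few lines. `TypeMap` is a hypothetical class, not the patch's code. Its `add_type` updates the extension-to-type map and the inverted type-to-extensions map together. If an extension is re-registered under a new type, it is removed from the old type's list so that the two maps stay consistent.

```python
from collections import defaultdict

class TypeMap:
    """Hypothetical model: an ext -> type map plus its inverse."""

    def __init__(self):
        self.ext_to_type = {}                  # '.txt' -> 'text/plain'
        self.type_to_exts = defaultdict(list)  # 'text/plain' -> ['.txt', ...]

    def add_type(self, mime_type, ext):
        # Update both directions at once so they can never disagree.
        old = self.ext_to_type.get(ext)
        if old is not None and old != mime_type:
            self.type_to_exts[old].remove(ext)  # ext moves to its new type
        self.ext_to_type[ext] = mime_type
        exts = self.type_to_exts[mime_type]
        if ext not in exts:
            exts.append(ext)

    def guess_all_extensions(self, mime_type):
        return list(self.type_to_exts.get(mime_type, ()))

    def guess_extension(self, mime_type):
        exts = self.guess_all_extensions(mime_type)
        return exts[0] if exts else None
```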

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-07-28 12:30

Message:
Logged In: YES 
user_id=21627

What is the role of add_type in this patch?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=554192&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 16:54:29 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 08:54:29 -0700
Subject: [Patches] [ python-Patches-566100 ] Rationalize DL_IMPORT and DL_EXPORT
Message-ID: <E17Zvnx-0008Oj-00@usw-sf-web2.sourceforge.net>

Patches item #566100, was opened at 2002-06-08 00:14
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470

Category: Core (C code)
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Rationalize DL_IMPORT and DL_EXPORT

Initial Comment:
Tim and I agreed that DL_IMPORT/DL_EXPORT is both sucky
and broken.  We have come up with purpose oriented
macros to replace them.

PyAPI_FUNC: public Python functions
PyAPI_DATA: public Python data
PyMODINIT_FUNC: extension module init functions.

These cover all existing cases of DL_IMPORT and
DL_EXPORT in the core.

This patch simply introduces the new macros (keeping
the old ones), and changes a small amount of code to
actually use these macros.  The vast majority of the
existing Python code using DL_IMPORT/DL_EXPORT has not
been touched.

I have a patch that changes the following:

* PC/pyconfig.h - creates the new PyAPI/MODINIT macros,
but also rationalizes this header file considerably. 
All common macros between the various compilers have
been moved to a common section.  This simplifies the
header significantly.

* Include/pyport.h - creates the new PyAPI/MODINIT
macros for non windows platforms.

* Include/import.h - move to the new macros.  I picked
this header file at random, mainly to prove that the
new macros do indeed work.

* PC/_winreg.c, Modules/_sre.c, Modules/pyexpat.c -
move to the PyMODINIT_FUNC macro.

Patch tested on Windows and Linux.

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-07-31 10:54

Message:
Logged In: YES 
user_id=44345

Here's a cut at the mods to Modules/*.c and Include/*.h that Mark 
requested.  Potential issues:

* cPickle.c and cStringIO.c had some #ifndef DL_* stuff.  That probably 
needs to be checked.

* There's a DL_EXPORT in Modules/python.c.  The straightforward 
replacement didn't work, so I left it alone.  (It clearly doesn't fall into 
Mark's description of "mindless editing".  I tried to keep my mind turned 
off during this exercise. ;-)

Skip


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-19 02:23

Message:
Logged In: YES 
user_id=31435

Au contraire, thank you!

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-19 01:57

Message:
Logged In: YES 
user_id=14198

Thanks all!

Checking in configure;
/cvsroot/python/python/dist/src/configure,v  <--  configure
new revision: 1.322; previous revision: 1.321
Checking in pyconfig.h.in;
/cvsroot/python/python/dist/src/pyconfig.h.in,v  <--  pyconfig.h.in
new revision: 1.43; previous revision: 1.42
Checking in configure.in;
/cvsroot/python/python/dist/src/configure.in,v  <--  configure.in
new revision: 1.333; previous revision: 1.332
Checking in Makefile.pre.in;
/cvsroot/python/python/dist/src/Makefile.pre.in,v  <--  Makefile.pre.in
new revision: 1.88; previous revision: 1.87
Checking in Include/pyport.h;
/cvsroot/python/python/dist/src/Include/pyport.h,v  <--  pyport.h
new revision: 2.52; previous revision: 2.51
Checking in Include/import.h;
/cvsroot/python/python/dist/src/Include/import.h,v  <--  import.h
new revision: 2.28; previous revision: 2.27
Checking in PC/pyconfig.h;
/cvsroot/python/python/dist/src/PC/pyconfig.h,v  <--  pyconfig.h
new revision: 1.14; previous revision: 1.13
Checking in PC/_winreg.c;
/cvsroot/python/python/dist/src/PC/_winreg.c,v  <--  _winreg.c
new revision: 1.11; previous revision: 1.10
Checking in Modules/_sre.c;
/cvsroot/python/python/dist/src/Modules/_sre.c,v  <--  _sre.c
new revision: 2.82; previous revision: 2.81
Checking in Modules/pyexpat.c;
/cvsroot/python/python/dist/src/Modules/pyexpat.c,v  <--  pyexpat.c
new revision: 2.70; previous revision: 2.69
Checking in Python/thread.c;
/cvsroot/python/python/dist/src/Python/thread.c,v  <--  thread.c
new revision: 2.45; previous revision: 2.44
Checking in Doc/ext/extending.tex;
/cvsroot/python/python/dist/src/Doc/ext/extending.tex,v  <--  extending.tex
new revision: 1.22; previous revision: 1.21


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-18 20:03

Message:
Logged In: YES 
user_id=33168

Add patch for configure.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-15 17:33

Message:
Logged In: YES 
user_id=33168

Sorry, I forgot about this patch.
I just tested on Linux (RedHat 7.2).
No problems, all expected tests successful.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 19:41

Message:
Logged In: YES 
user_id=14198

My patch is after Martin's so hopefully I have the macros
correct (or at least haven't regressed anything of his!)

DL_*PORT still exists, but is deprecated.  Eventually every
header will change, but for now DL_*PORT still works as before.

And yes, finding autoconf 2.53 for my cygwin and linux
platforms is what took 1/2 the time of getting this patch
together :)

Another report of success on Linux would be great!  To date,
I have not heard of a single person trying this patch on any
platform.

Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-07-05 13:45

Message:
Logged In: YES 
user_id=33168

I think Martin checked in the change to drop support for win16,
so some of the macros may have changed (MS_WINDOWS, MS_WIN32).
Won't all the files that use DL_*PORT (most headers in
Include) have to change?
Michael's explanation of autoconf is what I do.  Make sure
you have version 2.53 though.
Let me know if you want me to test on linux.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-05 01:45

Message:
Logged In: YES 
user_id=14198

ok - thanks!  Attaching a new patch that works correctly
with autoheader.  I'm gunna need help checking this in tho,
but one step at a time <0.1 wink>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-04 07:28

Message:
Logged In: YES 
user_id=6656

pyconfig.h.in is a bit like configure.  when you edit
configure.in, you're expected to run autoconf to make the
configure script and check that in too.  same with
pyconfig.h.in, except that it is made by autoheader.

try running autoheader and see what happens.

(I hope someone -- Martin? -- will correct me if I have this
wrong).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-03 20:35

Message:
Logged In: YES 
user_id=14198

I'm a little confused by pyconfig.h.in.  Can someone please
explain the process to me?  What I see is:

* reverting my pyconfig.h.in change prevents the new symbol
from appearing in pyconfig.h

* A CVS log of pyconfig.h.in shows heavy editing, with at
least 6 well-commented checkins in June alone.

So, all the evidence suggests that pyconfig.h.in does need
modification.  Can someone please clarify?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-02 05:16

Message:
Logged In: YES 
user_id=6656

Um, you are aware that pyconfig.h.in is auto-generated (by
autoheader)?

But if you've made edits to configure.in, you're probably ok.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-07-01 20:47

Message:
Logged In: YES 
user_id=14198

OK - here is a new ambitious patch ;)  It attempts to
rationalize all platforms, not just the PC.

* pyport.h now sets up most of the import/export magic.  It
looks for Py_ENABLE_SHARED and Py_BUILD_CORE (both new
macros) that control the behaviour.

* Py_ENABLE_SHARED has been added to pyconfig.h.in and
configure.in, so that this macro is created in pyconfig.h
whenever '--enable-shared' is passed to configure. 
Py_BUILD_CORE is passed via a "/D" option only when the core
itself is built (i.e., not when building extensions, etc.)

* PC/pyconfig.h has been rationalized heavily.

* A couple of places in the core have been changed to use
the new macros - more to test that it actually works.

This has been tested on Windows using MSVC, Windows using
cygwin/gcc, and RH7 linux.  I consider it basically "done"
so please comment away.
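The way Py_ENABLE_SHARED and Py_BUILD_CORE steer the linkage macros amounts to a small decision table. The sketch below models it in Python purely for illustration. linkage_macros and its have_declspec flag (standing in for a Windows-style HAVE_DECLSPEC_DLL check) are hypothetical names, and the Cygwin and C++ special cases are ignored. The expansions follow the pyport.h that later shipped with Python 2.3; whether this particular patch revision matched it exactly is an assumption.

```python
def linkage_macros(enable_shared, build_core, have_declspec=True):
    """Simplified, hypothetical model of the pyport.h linkage logic.

    Returns what PyAPI_FUNC(int), PyAPI_DATA(int) and PyMODINIT_FUNC
    would expand to.  Cygwin and C++ special cases are deliberately
    left out.
    """
    if enable_shared and have_declspec:
        if build_core:
            # Compiling the core DLL itself: export the public API.
            return {
                "PyAPI_FUNC(int)": "__declspec(dllexport) int",
                "PyAPI_DATA(int)": "extern __declspec(dllexport) int",
                # Init functions of modules linked into the core need
                # no special linkage.
                "PyMODINIT_FUNC": "void",
            }
        # Compiling an extension against the shared core: import the
        # public API, but export the module's own init function.
        return {
            "PyAPI_FUNC(int)": "__declspec(dllimport) int",
            "PyAPI_DATA(int)": "extern __declspec(dllimport) int",
            "PyMODINIT_FUNC": "__declspec(dllexport) void",
        }
    # Static build, or a platform without __declspec: plain declarations.
    return {
        "PyAPI_FUNC(int)": "int",
        "PyAPI_DATA(int)": "extern int",
        "PyMODINIT_FUNC": "void",
    }
```

The point of the design is that extension authors only ever write PyMODINIT_FUNC or PyAPI_FUNC; which of the three expansions they get is decided by how the build is configured, not by the source file.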

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2002-07-01 13:03

Message:
Logged In: YES 
user_id=38376

+1 (possibly except for the MODINIT_FUNC name...)

and yes, _sre.c is supposed to compile under earlier versions 
as well, but I can fix that later on.

</F>

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-23 22:18

Message:
Logged In: YES 
user_id=33168

I like the idea, but haven't looked at the patch.
I hope to look soon and give better feedback.
But I'll wait until after you upload the new version. :-)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2002-06-21 00:20

Message:
Logged In: YES 
user_id=14198

Just in case anyone was going to have a look at this <wink>,
I am working on a better version by integrating some of the
cygwin autoconf work.  Just want to avoid wasting others' time.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=566100&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 21:37:05 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 13:37:05 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17a0DR-0005Ua-00@usw-sf-web2.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them back in instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was in when I used it.
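The pitfall is that the seed only matters when no cache file exists yet. Here is a minimal sketch of that behaviour, assuming a marshal-based cache keyed only on n. The function cached_floats and the rr%06d file name are illustrative, not sortperf's actual code.

```python
import marshal
import os
import random


def cached_floats(n, seed, cache_dir):
    """Return n random floats in [0, 1), caching them in cache_dir.

    Hypothetical helper.  If a cache file for n already exists it is
    read back and `seed` is ignored, which is why the seed has to be
    supplied on the very first run.
    """
    path = os.path.join(cache_dir, "rr%06d" % n)  # illustrative file name
    if os.path.exists(path):
        with open(path, "rb") as fp:
            return marshal.load(fp)
    rng = random.Random(seed)  # seeded generator => reproducible data
    result = [rng.random() for _ in range(n)]
    with open(path, "wb") as fp:
        marshal.dump(result, fp)
    return result
```

Calling it a second time with a different seed against the same cache directory returns the first run's numbers unchanged, so only a fresh cache directory (or deleting the files) lets a new seed take effect.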

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-07-31 16:37

Message:
Logged In: YES 
user_id=31435

Replaced the doc file.  The new one contains more info 
comparing msort to sort.  There's nothing more I want to do 
here, and it looks like everyone who might time this already 
did.

Assigned to Guido for pronouncement.  I recommend 
replacing list.sort() with this.  The only real downside is the 
potential for requiring 2*N temp bytes; that (and everything 
else <wink>) is discussed in the doc file.

If this is accepted, another issue is whether to *advertise* 
that this sort is stable.  Some people really want that, but 
requiring stability constrains implementations.  Another 
possibility is to give lists two sort methods, one 
guaranteed stable and the other not, where in 2.3 CPython 
both map to this code.
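To make "stable" concrete: a stable sort never reorders elements that compare equal. This short example uses current Python's sorted(), which is documented as stable; the records are made up for illustration.

```python
# Records sharing a sort key; the second field records original order.
records = [("smith", 1), ("jones", 2), ("smith", 3), ("jones", 4)]

# Sort on the name only.  A stable sort keeps records with equal keys in
# their original relative order, so each name's numbers stay ascending.
by_name = sorted(records, key=lambda r: r[0])
print(by_name)
# [('jones', 2), ('jones', 4), ('smith', 1), ('smith', 3)]

# Stability is what makes multi-pass sorting work: sort by the secondary
# key first (number, descending), then by the primary key (name).
by_num_then_name = sorted(sorted(records, key=lambda r: -r[1]),
                          key=lambda r: r[0])
print(by_num_then_name)
# [('jones', 4), ('jones', 2), ('smith', 3), ('smith', 1)]
```

An unstable sort could legally return the "smith" records in either order on the name-only pass, which is the freedom a non-guaranteed method would leave to other implementations.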

In no case do I want to keep both the samplesort and 
timsort implementations in the core -- one brain-busting 
sort implementation is quite enough.  This one has many 
wonderful properties the samplesort hybrid lacks.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-31 02:02

Message:
Logged In: YES 
user_id=31435

~sort gets more mysterious all the time:  the mystery now 
is why it's *not* much slower everywhere!  Here are the 
exact # of compares ~sort does:

i        n     sort    msort    %ch    lg(n!)
--  ------  -------  -------  -----  --------
15   32768   130484   188720  44.63    444255
16   65536   260019   377634  45.23    954037
17  131072   555035   755476  36.11   2039137
18  262144  1107826  1511174  36.41   4340409
19  524288  2218562  3022584  36.24   9205096
20 1048576  4430616  6045418  36.45  19458756

The last column is the information-theoretic lower bound for 
sorting random arrays of this size (no comparison-based 
algorithm can do better than that on average), showing 
that sort() and msort() are both getting a lot of good out of 
the duplicates.  But sort()'s special case for equal 
elements is extremely effective on ~sort's specific data 
pattern, and msort just isn't going to get close to that (it 
does better than sort() on skewed distributions with lots of 
duplicates, though).
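The two derived columns in that table can be recomputed directly. lg(n!) is log2 of n factorial, computed here via math.lgamma to avoid building a huge integer, and %ch is the percentage increase of msort's compare count over sort's. The helper names are illustrative.

```python
import math


def lg_factorial(n):
    """log2(n!): the average-case lower bound on comparisons needed to
    sort n distinct elements with any comparison-based algorithm."""
    return math.lgamma(n + 1) / math.log(2)


def pct_change(old, new):
    """Percentage increase from old to new, as in the %ch column."""
    return 100.0 * (new - old) / old


# First row of the table above: i=15, n=2**15.
print(round(lg_factorial(2 ** 15)))          # 444255
print(round(pct_change(130484, 188720), 2))  # 44.63
```

Both sort and msort come in far below lg(n!) on ~sort, which is only possible because the data is full of duplicates rather than distinct keys.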

The only thing I can think of that could transform 
what "should be" highly significant slowdowns into highly 
significant speedups on some boxes are catastrophic 
cache effects in samplesort.  But knowing something about 
how both algorithms work <wink>, that's not screaming "oh, 
of course".

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 21:12

Message:
Logged In: YES 
user_id=31435

New doc file, with an intro at the start and a program at the 
end.  Turns out that merge4.patch actually reversed the 
random-array #-of-comparisons advantage samplesort had 
enjoyed:  it's now timsort that does 1-2% fewer 
comparisons on random arrays of random lengths.

See the end of the file for why samplesort does 50% more 
comparisons on average for random arrays of length two 
<wink>.

Near the end of the new Intro section at the start, I suggest 
a couple experiments people might try on boxes where 
~sort is much slower under timsort.  That remains baffling, 
but the algorithm doesn't *do* much in that case, so 
someone on a box where it flounders could surely figure out 
why.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 12:14

Message:
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed 
distribution.  msort was more than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think) non-
heroically possible, quickly picking a minimum run length in 
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.
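The data-movement point can be seen with a deliberately plain textbook merge of two sorted runs that counts compares and element copies. This is not timsort's actual merge, which has further optimizations described in the doc file; merge_counting is a hypothetical name.

```python
def merge_counting(a, b):
    """Stably merge sorted lists a and b into a new list.

    Returns (merged, compares, copies).  Every comparison is followed by
    copying one element into the output; a quicksort partition step, by
    contrast, swaps at most once per comparison and often not at all.
    """
    out = []
    i = j = compares = copies = 0
    while i < len(a) and j < len(b):
        compares += 1
        if b[j] < a[i]:  # take from b only if strictly smaller: stable
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
        copies += 1
    # One run is exhausted; the rest of the other is copied, no compares.
    tail = a[i:] + b[j:]
    out.extend(tail)
    copies += len(tail)
    return out, compares, copies
```

For example, merging [1, 3, 5] with [2, 4, 6] takes 5 compares but 6 copies: every element moves at least once, however few compares are needed.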

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with power-of-
2 sizes, so sortperf.py timings shouldn't change (and don't 
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.
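A minimal sketch of such a minimum-run computation, modeled on merge_compute_minrun() in later CPython releases' Objects/listobject.c. That merge4.patch computes it exactly this way is an assumption.

```python
def compute_minrun(n):
    """Pick a minimum run length for an array of n elements.

    Keep the 6 most significant bits of n, and add 1 if any of the
    remaining bits are set.  Then n/minrun is either an exact power of 2
    or slightly less than one, so on random data the final merges stay
    balanced instead of pairing one huge run with a small leftover.
    """
    r = 0  # becomes 1 if any 1 bits are shifted off
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

For any power of 2 of at least 64 this returns 32, which matches the note that power-of-2 timings are unaffected. For n = 1000 it returns 63, giving just under 16 runs. For n < 64 it returns n itself, so the whole array becomes a single run.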

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4GHz, 256MB PC2100 RAM, OS/2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum: 400MHz G4 PowerPC running Mac OS X 10.1.5,
original patch.
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
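The basic idea behind those pre-merge endpoint searches fits in a few lines. This sketch assumes plain binary search via Python's bisect module; later CPython versions use galloping searches for this step, and the extra information merge2.patch extracts is not modeled. trim_for_merge is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def trim_for_merge(a, b):
    """Given adjacent non-empty sorted runs a and b (a comes first),
    return (prefix, a_rest, b_rest, suffix); only a_rest and b_rest
    actually need merging.

    Elements of a that are <= b[0] are already in their final place, and
    so are elements of b that are >= a[-1].  Ties stay on the side they
    started on, which keeps the merge stable.
    """
    i = bisect_right(a, b[0])   # a[:i] <= b[0]: already in place
    k = bisect_left(b, a[-1])   # b[k:] >= a[-1]: already in place
    return a[:i], a[i:], b[:k], b[k:]
```

With a = [1, 2, 3, 10, 11] and b = [4, 5, 12, 13], only [10, 11] and [4, 5] need merging; if a[-1] <= b[0], both middle pieces are empty and no merge is needed at all.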

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments).  msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 14:54

Message:
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   0.35   0.73


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 13:52

Message:
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   0.41   0.84

----------------------------------------------------------------------

Comment By: Kevin Jacobs (jacobs99)
Date: 2002-07-26 12:54

Message:
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   0.66


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 12:30

Message:
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-07-26 12:23

Message:
Logged In: YES 
user_id=35752

AMD 1.4 GHz Athlon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01   0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02   0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04   0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07   0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16   2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34   4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00   0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02   0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04   0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09   0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17   0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34   0.70


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470


From noreply@sourceforge.net  Wed Jul 31 21:45:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 31 Jul 2002 13:45:39 -0700
Subject: [Patches] [ python-Patches-587076 ] Adaptive stable mergesort
Message-ID: <E17a0Lj-0005bs-00@usw-sf-web2.sourceforge.net>

Patches item #587076, was opened at 2002-07-26 11:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Tim Peters (tim_one)
Summary: Adaptive stable mergesort

Initial Comment:
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.
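The CAUTION above boils down to a "read the cache first" pattern. Below is a minimal Python sketch of that pattern, not sortperf's actual code: the helper name load_or_make_floats, the cache file naming, and the use of array('d') for storage are all assumptions for illustration. It shows why the seed only matters on the very first run: once the cache file exists, a different seed is silently ignored.

```python
import os
import random
import tempfile
from array import array


def load_or_make_floats(n, seed, cache_dir):
    """Return n random floats, reusing a cached file if one exists.

    Hypothetical helper illustrating the caching behaviour described above;
    it is not the code in Lib/test/sortperf.py.
    """
    path = os.path.join(cache_dir, "floats-%d.bin" % n)
    if os.path.exists(path):
        # Cache hit: the seed argument is ignored entirely.
        data = array("d")
        with open(path, "rb") as f:
            data.fromfile(f, n)
        return data.tolist()
    # Cache miss: generate from a seeded generator, then save for next time.
    rng = random.Random(seed)
    data = array("d", (rng.random() for _ in range(n)))
    with open(path, "wb") as f:
        data.tofile(f)
    return data.tolist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = load_or_make_floats(1000, seed=1, cache_dir=d)
        second = load_or_make_floats(1000, seed=99, cache_dir=d)  # reads the cache
        print(first == second)  # True: the second seed never took effect
```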

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-31 16:45

Message:
Logged In: YES 
user_id=6380

1. Go for it.

2. Advertise it as an implementation feature.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-31 16:37

Message:
Logged In: YES 
user_id=31435

Replaced the doc file.  The new one contains more info 
comparing msort to sort.  There's nothing more I want to do 
here, and it looks like everyone who might time this already 
did.

Assigned to Guido for pronouncement.  I recommend 
replacing list.sort() with this.  The only real downside is the 
potential for requiring 2*N temp bytes; that (and everything 
else <wink>) is discussed in the doc file.

If this is accepted, another issue is whether to *advertise* 
that this sort is stable.  Some people really want that, but 
requiring stability constrains implementations.  Another 
possibility is to give lists two sort methods, one 
guaranteed stable and the other not, where in 2.3 CPython 
both map to this code.
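For readers unsure what "stable" buys: a stable sort leaves records with equal keys in their original relative order, so sorting by one field preserves any order already established by another. A small Python illustration (it uses the key= argument and operator.itemgetter from later Python versions, and the records are made up):

```python
from operator import itemgetter

# Made-up (name, department) records, already in alphabetical order by name.
records = [
    ("alice", "eng"),
    ("bob", "ops"),
    ("carol", "eng"),
    ("dave", "ops"),
    ("erin", "eng"),
]

# Sort by department only.  Because the sort is stable, names within each
# department stay in their original (alphabetical) order.
by_dept = sorted(records, key=itemgetter(1))
print(by_dept)
# [('alice', 'eng'), ('carol', 'eng'), ('erin', 'eng'), ('bob', 'ops'), ('dave', 'ops')]
```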

In no case do I want to keep both the samplesort and 
timsort implementations in the core -- one brain-busting 
sort implementation is quite enough.  This one has many 
wonderful properties the samplesort hybrid lacks.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-31 02:02

Message:
Logged In: YES 
user_id=31435

~sort gets more mysterious all the time:  the mystery now 
is why it's *not* much slower everywhere!  Here are the 
exact # of compares ~sort does:

i        n     sort    msort    %ch    lg(n!)
--  ------  -------  -------  -----  --------
15   32768   130484   188720  44.63    444255
16   65536   260019   377634  45.23    954037
17  131072   555035   755476  36.11   2039137
18  262144  1107826  1511174  36.41   4340409
19  524288  2218562  3022584  36.24   9205096
20 1048576  4430616  6045418  36.45  19458756

The last column is the information-theoretic lower bound for 
sorting random arrays of this size (no comparison-based 
algorithm can do better than that on average), showing
that sort() and msort() are both getting a lot of good out of 
the duplicates.  But sort()'s special case for equal 
elements is extremely effective on ~sort's specific data 
pattern, and msort just isn't going to get close to that (it 
does better than sort() on skewed distributions with lots of 
duplicates, though).
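The two derived columns can be recomputed from the raw compare counts. The Python sketch below takes lg(n!) to be log2 of n factorial, computed through math.lgamma, and rounds it up; the rounding direction is an assumption, chosen because it reproduces the six lg(n!) entries above. %ch is the percentage by which msort's compare count exceeds sort's.

```python
import math

# (i, n, compares by sort(), compares by msort()) for ~sort, from the table above.
ROWS = [
    (15, 32768, 130484, 188720),
    (16, 65536, 260019, 377634),
    (17, 131072, 555035, 755476),
    (18, 262144, 1107826, 1511174),
    (19, 524288, 2218562, 3022584),
    (20, 1048576, 4430616, 6045418),
]


def lg_factorial(n):
    """log2(n!) via the log-gamma function: ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


def pct_change(old, new):
    """Percentage by which new exceeds old."""
    return 100.0 * (new - old) / old


for i, n, sort_cmp, msort_cmp in ROWS:
    print("%2d %7d %8d %8d %6.2f %9d" % (
        i, n, sort_cmp, msort_cmp,
        pct_change(sort_cmp, msort_cmp),
        math.ceil(lg_factorial(n))))
```

Both compare counts fall far below the bound, which only holds for inputs with distinct elements; that gap is the benefit the duplicates in ~sort provide.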

The only thing I can think of that could transform 
what "should be" highly significant slowdowns into highly 
significant speedups on some boxes are catastrophic 
cache effects in samplesort.  But knowing something about 
how both algorithms work <wink>, that's not screaming "oh, 
of course".

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 21:12

Message:
Logged In: YES 
user_id=31435

New doc file, with an intro at the start and a program at the 
end.  Turns out that merge4.patch actually reversed the 
random-array #-of-comparisons advantage samplesort had 
enjoyed:  it's now timsort that does 1-2% fewer 
comparisons on random arrays of random lengths.

See the end of the file for why samplesort does 50% more 
comparisons on average for random arrays of length two 
<wink>.

Near the end of the new Intro section at the start, I suggest 
a couple experiments people might try on boxes where 
~sort is much slower under timsort.  That remains baffling, 
but the algorithm doesn't *do* much in that case, so 
someone on a box where it flounders could surely figure out 
why.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 12:14

Message:
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed 
distribution.  msort was better than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think) non-
heroically possible, quickly picking a minimum run length in 
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.
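The data-movement point is easy to see in a plain two-way merge: every compare is followed by copying one element to the output, and whatever is left over is copied with no compares at all. A minimal Python sketch that counts both (a textbook merge, not the patch's galloping merge):

```python
def merge_counting(a, b):
    """Stable merge of two sorted lists; returns (merged, compares, copies)."""
    out = []
    i = j = compares = copies = 0
    while i < len(a) and j < len(b):
        compares += 1
        # Take from b only if strictly smaller, so equal elements from a
        # come first (this is what keeps the merge stable).
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
        copies += 1
    # One run is exhausted; the rest of the other is copied with no compares.
    rest = a[i:] + b[j:]
    out.extend(rest)
    copies += len(rest)
    return out, compares, copies


print(merge_counting([1, 3, 5], [2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 5, 6)
```

Copies never fall below compares here, whereas a quicksort partition swaps at most once per compare and often not at all.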

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with power-of-
2 sizes, so sortperf.py timings shouldn't change (and don't 
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.
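The comment does not spell out the rule for picking the minimum run length. The Python sketch below uses the rule later documented in CPython's listsort.txt (keep the six most significant bits of n, and add 1 if any of the remaining bits are set). Treat it as an illustration of the idea, since it may not match merge4.patch line for line. The aim is for n / minrun to be a power of 2, or slightly less, so the final merges stay balanced. For power-of-2 n of at least 64 it yields 32.

```python
def compute_minrun(n):
    """Pick a minimum run length in [32, 64] for an array of n >= 64 items.

    Sketch of the rule described in CPython's listsort.txt: take the six
    most significant bits of n, and add 1 if any lower bit is set.
    Arrays shorter than 64 are returned as-is (one binary-insertion run).
    """
    r = 0  # becomes 1 if any 1 bit is shifted off
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r


for n in (63, 64, 2112, 2**15, 2**15 + 1, 2**20):
    minrun = compute_minrun(n)
    runs = -(-n // minrun)  # ceiling division: number of minrun-sized chunks
    print(n, minrun, runs)
```

For example, n = 2112 gets minrun 33, giving exactly 64 runs, instead of minrun 32, which would leave 66 runs and a lopsided final merge.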

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:42

Message:
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-30 11:41

Message:
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-29 09:12

Message:
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.


----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-29 07:53

Message:
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4Ghz, 256MB PC2100 RAM, OS2 v4 FixPack 12, EMX 0.9d Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a
patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:28

Message:
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
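To make "arbitrarily goofy" concrete: a comparison function can return random answers, and the sort must still finish and leave the list a permutation of its input, even though the resulting order is meaningless. A small Python property check (modern Python needs functools.cmp_to_key to pass an old-style compare function). It says nothing about how merge3.patch achieves this. It only shows the property the repair protects.

```python
import random
from functools import cmp_to_key


def goofy_compare(a, b):
    """An inconsistent comparison: ignores its arguments entirely."""
    return random.choice((-1, 0, 1))


data = list(range(1000))
random.shuffle(data)
result = sorted(data, key=cmp_to_key(goofy_compare))

# The order is garbage, but no element may be lost or duplicated.
print(sorted(result) == list(range(1000)))  # True
```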

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 15:00

Message:
Logged In: YES 
user_id=31435

Just van Rossum
400Mhz G4 PowerPC running MacOSX 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:14

Message:
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
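The pre-merge "find the endpoints" searches trim work before merging adjacent runs A and B: elements at the front of A that are <= B[0] are already in place, as are elements at the back of B that are >= A[-1]. A Python sketch using bisect for those searches (CPython's listobject.c uses galloping, i.e. exponential, searches here instead). This shows only the basic trimming, not the extra information merge2.patch extracts.

```python
import bisect


def merge_adjacent_runs(a, b):
    """Stably merge sorted runs a and b, skipping elements already in place."""
    if not a or not b:
        return a + b
    # a[:start] are all <= b[0]; being from the left run, they stay put.
    # bisect_right keeps a's elements equal to b[0] ahead of b[0] (stability).
    start = bisect.bisect_right(a, b[0])
    # b[end:] are all >= a[-1]; bisect_left keeps b's elements equal to
    # a[-1] after it (stability).
    end = bisect.bisect_left(b, a[-1])
    head, a_mid = a[:start], a[start:]
    b_mid, tail = b[:end], b[end:]
    # Only a_mid and b_mid actually need merging.
    merged = []
    i = j = 0
    while i < len(a_mid) and j < len(b_mid):
        if b_mid[j] < a_mid[i]:
            merged.append(b_mid[j])
            j += 1
        else:
            merged.append(a_mid[i])
            i += 1
    merged.extend(a_mid[i:])
    merged.extend(b_mid[j:])
    return head + merged + tail


print(merge_adjacent_runs([1, 2, 3, 7, 8], [4, 5, 9, 10]))
# [1, 2, 3, 4, 5, 7, 8, 9, 10]
```

In that example only [7, 8] and [4, 5] take part in the merge loop; [1, 2, 3] and [9, 10] never move.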

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:09

Message:
Logged In: YES 
user_id=31435

Adding new doc file.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-28 14:08

Message:
Logged In: YES 
user_id=31435

Deleting old doc file.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 07:23

Message:
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01   0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04   0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08   0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17   2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33   4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03   0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07   0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17   0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33   0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01   0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02   0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04   0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07   0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16   2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32   4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01   0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02   0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03   0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08   0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16   0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32   0.67


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2002-07-27 04:20

Message:
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02   0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05   0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11   1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22   3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43   7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90  16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02   0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05   0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10   0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20   0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41   0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87   1.82
 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 21:24

Message:
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I die, it's the only clue that will remain 
<wink>.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-26 16:38

Message:
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.
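One way to see the pre-existing order a mergesort can exploit is to count the natural runs in the input. The Python sketch below uses the run definition from CPython's listsort.txt: a run is either non-descending or strictly descending, and the strictness lets a descending run be reversed in place without breaking stability. The helper name is hypothetical, and the real sort also extends short runs up to a minimum length, which this ignores. Fewer, longer runs mean less merging.

```python
def count_natural_runs(seq):
    """Count maximal runs: non-descending, or strictly descending."""
    n = len(seq)
    runs = 0
    i = 0
    while i < n:
        runs += 1
        j = i + 1
        if j < n and seq[j] < seq[i]:
            # Strictly descending run.
            while j < n and seq[j] < seq[j - 1]:
                j += 1
        else:
            # Non-descending run (equal neighbours allowed).
            while j < n and seq[j] >= seq[j - 1]:
                j += 1
        i = j
    return runs


print(count_natural_runs(list(range(10))))         # 1
print(count_natural_runs(list(range(10, 0, -1))))  # 1
print(count_natural_runs([3, 1, 2]))               # 2
```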

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachments). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-07-26 15:50

Message:
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=587076&group_id=5470