From noreply@sourceforge.net Fri Mar 1 07:21:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:21:00 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. 
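[Editor's note: the first change described above stuck, and array.array has been a real type ever since. The sketch below, written for today's Python, shows the subclassing this patch enabled; the NamedArray class and its label attribute are purely illustrative inventions, not anything from the patch itself.]

```python
from array import array

# Because array.array is a genuine type, it can be subclassed in
# Python, as the patch description promises.
class NamedArray(array):
    """Hypothetical subclass carrying a label alongside its items."""
    def __new__(cls, typecode, initializer=(), label=""):
        # array allocates its storage in __new__, so forward the
        # typecode and initializer there.
        return super().__new__(cls, typecode, initializer)

    def __init__(self, typecode, initializer=(), label=""):
        self.label = label  # subclass instances get a __dict__

a = NamedArray('i', [1, 2, 3], label="counts")
assert isinstance(a, array)
assert a.tolist() == [1, 2, 3]
assert a.label == "counts"
```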
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. 
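[Editor's note: the tounicode()/fromunicode() methods proposed here did land and still exist in today's array module, where the 'u' typecode survives in deprecated form. A sketch of the round trip, assuming a Python version that still ships 'u':]

```python
from array import array

# The round trip Martin asked for: a 'u' array can give back its
# Unicode string directly, instead of via u"".join(arr.tolist()).
a = array('u', 'xyz')
assert a.tounicode() == 'xyz'

# The old workaround from the thread still works too:
assert ''.join(a.tolist()) == 'xyz'

# fromunicode() appends characters, and repr() uses the compact
# string form rather than a list of one-character strings.
a.fromunicode('w')
assert a.tounicode() == 'xyzw'
assert repr(a) == "array('u', 'xyzw')"
```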
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. 
Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php? 
func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 
'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 07:25:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:25:04 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. 
Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TEX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. 
Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. 
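[Editor's note: Martin's expectation that *= (and +=) on an array be in-place matches how the methods behave in today's array module; as with lists, augmented assignment mutates the object rather than rebinding the name. A small sketch:]

```python
from array import array

a = array('i', [1, 2])
alias = a                      # second reference to the same object

a += array('i', [3])           # __iadd__: extends in place
a *= 2                         # __imul__: repeats in place

assert alias is a              # still the same object, as with lists
assert a.tolist() == [1, 2, 3, 1, 2, 3]
```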
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. 
Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php? 
func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 
'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 07:59:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:59:03 -0800 Subject: [Patches] [ python-Patches-523268 ] pwd.getpw* returns enhanced tuple. Message-ID: Patches item #523268, was opened at 2002-02-27 05:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 Category: Modules Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Nobody/Anonymous (nobody) Summary: pwd.getpw* returns enhanced tuple. Initial Comment: This patch against the current CVS implements the enhanced tuple return types for pwd.getpw*(). This makes the return similar to time.localtime() and os.stat(). Includes changes to the documents as well. 
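[Editor's note: the enhanced-tuple return described here is what the modern pwd module ships: a struct sequence whose fields can be read by index or by name, just like os.stat() and time.localtime(). A sketch, assuming a Unix system:]

```python
import os
import pwd

entry = pwd.getpwuid(os.getuid())

# The result is still usable as a plain tuple...
assert entry[0] == entry.pw_name
assert entry[2] == entry.pw_uid
assert len(entry) == 7          # pw_name ... pw_shell

# ...but the named fields make call sites self-documenting.
assert entry.pw_uid == os.getuid()
```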
---------------------------------------------------------------------- Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-01 07:59 Message: Logged In: YES user_id=429749 Looks good to me. I'll go zap mine now. ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 09:20 Message: Logged In: YES user_id=81797 I've taken a look at Quinn's patch, and have created a new version which I believe is the combination of the two. It also includes doc strings for the structs themselves, documentation of the grp module, and removes a case where a failure can cause a memory leak. I'll ask Quinn to review this patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 22:21 Message: Logged In: YES user_id=21627 Please coordinate with Quinn Dunkan (patch #522027). It seems his patch fills out some character strings where you use NULL. Ideally, you'd both come up with a revised version of the patch, and withdraw the other one. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:01:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:01:44 -0800 Subject: [Patches] [ python-Patches-522027 ] pwdmodule and grpmodule use structs Message-ID: Patches item #522027, was opened at 2002-02-24 11:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522027&group_id=5470 Category: Modules Group: None >Status: Deleted >Resolution: Duplicate Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: pwdmodule and grpmodule use structs Initial Comment: Here are a few patches to make pwd and grp use structs, like time.struct_time ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 09:22 Message: Logged In: YES user_id=81797 I have created a new patch which is the union of our two patches, plus a bit. Please review it and if you have any comments let me know. Thanks, Sean ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 22:20 Message: Logged In: YES user_id=21627 For the pwd part, please coordinate with Sean Reifschneider and patch #523268. I like the documentation changes in that patch. 
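[Editor's note: the grp half of Quinn's patch got the same struct treatment as pwd. On a Unix system the struct-style access looks like this (a sketch against today's grp module):]

```python
import os
import grp

g = grp.getgrgid(os.getgid())

# grp entries gained the same index-or-name duality as pwd entries:
assert g[0] == g.gr_name
assert g[2] == g.gr_gid == os.getgid()
assert isinstance(g.gr_mem, list)   # member names; may be empty
```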
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522027&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:21:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:21:39 -0800 Subject: [Patches] [ python-Patches-520483 ] Make IDLE OutputWindow handle Unicode Message-ID: Patches item #520483, was opened at 2002-02-20 15:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520483&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 7 Submitted By: Jason Orendorff (jorend) Assigned to: Guido van Rossum (gvanrossum) Summary: Make IDLE OutputWindow handle Unicode Initial Comment: This one-line patch makes OutputWindow handle Unicode correctly. For example, >>> print u'\xbfQu\xe9 pas\xf3?' In 2.2 this throws a UnicodeError, not because of any problem with Unicode handling in either Python or Tk, but because IDLE does str(s) on the Unicode string. I just took out the call to str(). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 09:21 Message: Logged In: YES user_id=21627 Thanks, your comments are indeed helpful. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:59 Message: Logged In: YES user_id=6380 I think when I first wrote that code, Tkinter didn't yet support Unicode. I think I felt that write() shouldn't be called with anything besides a string, but I didn't want to put in an explicit type check, and yet I didn't want to pass non-strings to Tcl because it treats certain types special. For example, in the patched IDLE, try sys.stdout.write((1,2,(3,4))), or try sys.stdout.write(None). But I think it's no big deal, and I approve of the change. 
Consequently, I'm closing this bug report again. I've merged this into 2.2.1. Should I do anything else? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 01:43 Message: Logged In: YES user_id=31435 We're never too busy for people we love, Jason. Reopened, changed category to IDLE, changed group to Python 2.2.x, boosted priority, and assigned to Guido. The str() call has been there since Guido first checked this in, and its purpose isn't apparent to me either. Maybe Guido remembers. Guido? I'm not worried at all that someone might be calling it with a non-stringish argument -- it's supplying a "file- like object" interface, and .write() requires a stringish argument. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-24 01:19 Message: Logged In: YES user_id=18139 Submitted to idlefork. I'm too shy to bother "one of the major IDLE authors." It would be nice to have in 2.2.1, but I know the folks at PythonLabs are busy... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-23 23:42 Message: Logged In: YES user_id=21627 Ok, committed as OutputWindow 1.6. I strongly recommend submitting this to idlefork as well. If you want this patch to appear in Python 2.2.1, you should get a comment from one of the major IDLE authors or contributors. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 00:27 Message: Logged In: YES user_id=18139 > Isn't this too simplistic? I guess there was a reason for > the str call: could it ever happen that somebody passes > something else (beyond byte and Unicode strings)? I searched for write() in the idle directory and got 48 hits in 7 files. Then I checked them all. 
In every case, either write() is called with a string, or the argument is passed unchanged from another function that contains the word "write()". As for code outside IDLE, I'd be extra surprised if anyone calls obj.write(x) with x being something other than a string. Ordinary file objects don't accept it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:56 Message: Logged In: YES user_id=21627 Isn't this too simplistic? I guess there was a reason for the str call: could it ever happen that somebody passes something else (beyond byte and Unicode strings)? Also, I wonder whether IDLE patches need to go to idlefork (sf.net/projects/idlefork) first. Apart from these comments, I think your patch is quite right. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520483&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:32:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:32:35 -0800 Subject: [Patches] [ python-Patches-520062 ] Support IPv6 with VC.NET Message-ID: Patches item #520062, was opened at 2002-02-19 18:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520062&group_id=5470 Category: Windows Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Support IPv6 with VC.NET Initial Comment: This patch enables IPv6 support based on Winsock2 on Microsoft C 13 and later. Due to the implementation strategy used in the SDK headers, the resulting _socket.pyd will not require additional shared libraries, but it will instead locate the symbols dynamically, and fall back to a default implementation if none are found. ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-01 09:32 Message: Logged In: YES user_id=21627 Committed as socketmodule.c 1.209; socketmodule.h 1.5; PC/pyconfig.h .7 (after changing the comment). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:39 Message: Logged In: YES user_id=31435 Back to Martin. No problems compiling or running on my Win98SE + VC6 box, incl. test_socketserver.py. The only thing I object to is the "//" comment. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:20 Message: Logged In: YES user_id=31435 The "//" style comment in pyconfig.h should change to /**/ style (I don't care that MSVC accepts either -- not everyone looking at this file uses MSVC). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:17 Message: Logged In: YES user_id=31435 Since Martin submitted the patch, I think we can assume he already agrees with the basic premise. Reassigned to me. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:08 Message: Logged In: YES user_id=6380 We'll have a hard time testing this, since I don't think anyone I know with a Windows build environment is set up for IPv6 yet. I'm assigning to Martin since he's the IPv6 master, to see if he agrees with the basic premises (and that it doesn't break anything on Unix -- it's a pretty small patch so that seems unlikely). Then Martin should probably assign it to Tim, so Tim can see if at least it doesn't break anything on various flavors of Windows we have lying around. Then it can be alpha and beta tested to see if it doesn't break anything else, and the original author can test if the installer we distribute actually does the right thing for him. 
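[Editor's note: whether a given interpreter build actually ended up with IPv6 support can be checked at runtime. A hedged sketch using today's socket module, with the IPv6 loopback address as a network-free example:]

```python
import socket

# Build-time IPv6 support is exposed as a simple flag.
assert isinstance(socket.has_ipv6, bool)

if socket.has_ipv6:
    # AF_INET6 addresses resolve through the same getaddrinfo() API;
    # "::1" is numeric, so no DNS lookup is involved.
    infos = socket.getaddrinfo("::1", 80, socket.AF_INET6,
                               socket.SOCK_STREAM)
    family, _, _, _, sockaddr = infos[0]
    assert family == socket.AF_INET6
    assert sockaddr[0] == "::1"
```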
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520062&group_id=5470 From noreply@sourceforge.net Fri Mar 1 10:30:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 02:30:13 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 23:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays on the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print? Is that necessary?
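For reference, the 'u'-array features committed here behave roughly as sketched below, written in modern Python syntax (where bare string literals are Unicode; note the 'u' typecode is deprecated in recent releases). tounicode() and fromunicode() are the accessors this patch added:

```python
from array import array

# A 'u' array is an array of Unicode characters (Py_UNICODE in the
# 2.x sources), mutable in place like a 'c' or 'b' array.
a = array('u', 'xyz')
assert len(a) == 3 and a[0] == 'x'
a[0] = 'X'              # in-place character mutation

s = a.tounicode()       # get the Unicode string back
print(s)                # Xyz

b = array('u')
b.fromunicode('abc')    # the inverse operation
print(repr(b))          # the improved repr: array('u', 'abc')
```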
My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TEX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 04:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-02-25 13:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 01:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 22:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-02-24 16:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 14:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and add arrays as standard Python types (ie. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 14:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 09:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections.
The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 03:03 Message: Logged In: YES user_id=18139

> What is the rationale for expanding PyObject_VAR_HEAD?
> It doesn't seem to achieve anything.

It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:15 Message: Logged In: YES user_id=18139

> I don't like the Unicode part of it at all.

Well, I'm not attached to it. It's very easy to subtract it from the patch.

> What can you do with this feature?

The same sort of thing you might do with an array of type 'c'.
For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it.

> It seems to unfairly prefer a specific Unicode encoding,
> without explaining what that encoding is, and without a
> clear use case why this encoding is desirable.

Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then:

    len(A) == len(ustr)
    A[0] == ustr[0]
    A[1] == ustr[1]
    ...

That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays:

    >>> x = u'\U00012345'    # One Unicode codepoint...
    >>> len(x)
    2                        # hmm.
    >>> x[0]
    u'\ud808'                # aha. UTF-16.
    >>> x[1]
    u'\udf45'
    >>> str(buffer(u'abc'))  # Example two.
    'a\x00b\x00c\x00'

> It also seems to overlap with the Unicode object's
> .encode method, which is much more general.

Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-02-21 00:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 10:48:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 02:48:12 -0800 Subject: [Patches] [ python-Patches-523268 ] pwd.getpw* returns enhanced tuple. Message-ID: Patches item #523268, was opened at 2002-02-27 06:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 Category: Modules Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Nobody/Anonymous (nobody) Summary: pwd.getpw* returns enhanced tuple. Initial Comment: This patch against the current CVS implements the enhanced tuple return types for pwd.getpw*(). This makes the return similar to time.localtime() and os.stat(). Includes changes to the documents as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:48 Message: Logged In: YES user_id=21627 Thanks, committed as libgrp.tex 1.16, grpmodule.c 2.17, pwdmodule.c 1.27, libpwd.tex 1.14, NEWS 1.359. ---------------------------------------------------------------------- Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-01 08:59 Message: Logged In: YES user_id=429749 Looks good to me.
I'll go zap mine now. ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 10:20 Message: Logged In: YES user_id=81797 I've taken a look at Quinn's patch, and have created a new version which I believe is the combination of the two. It also includes doc strings for the structs themselves, documentation of the grp module, and removes a case where a failure can cause a memory leak. I'll ask Quinn to review this patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 23:21 Message: Logged In: YES user_id=21627 Please coordinate with Quinn Dunkan (patch #522027). It seems his patch fills out some character strings where you use NULL. Ideally, you'd both come up with a revised version of the patch, and withdraw the other one. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 From noreply@sourceforge.net Fri Mar 1 11:34:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 03:34:23 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line.
---------------------------------------------------------------------- >Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected.
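To make the two points concrete — tolerating the trailing numeric timezone, and the implicit anchoring of re.match() — here is a sketch; the pattern below is illustrative, not the one mailbox.py actually uses:

```python
import re

# A "From " matcher that tolerates the POSIX numeric timezone Pine
# appends after the year.  re.match() only matches at the start of
# the string, so no leading ^ is needed -- Guido's point above.
fromline = re.compile(
    r"From \S+ +\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}"
    r"( [+-]\d{4})?$")      # optional timezone suffix

line = "From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200"
print(bool(fromline.match(line)))           # timezone tolerated
print(bool(fromline.match(" " + line)))     # not at start: no match
```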
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Fri Mar 1 13:18:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 05:18:23 -0800 Subject: [Patches] [ python-Patches-524008 ] pysvr portability bug on new POSIX hosts Message-ID: Patches item #524008, was opened at 2002-02-28 20:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524008&group_id=5470 Category: Demos and tools Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Paul Eggert (eggert) Assigned to: Nobody/Anonymous (nobody) Summary: pysvr portability bug on new POSIX hosts Initial Comment: The new POSIX standard is now official (IEEE Std 1003.1-2001), and it has removed support for the obsolescent syntax "tail +2l". You are now supposed to use "tail -n +2" instead. As a result of this change, the pysvr demo fails on my Solaris 8 host if I am using GNU textutils 2.0.21 and have defined _POSIX2_VERSION=200112 and POSIXLY_CORRECT=true in my environment. Here is a patch, relative to Python 2.2. 2002-02-28 Paul Eggert * Demo/pysvr/pysvr.c (ps): Don't use "tail +2l", as POSIX 1003.1-2001 no longer allows this. Use "sed 1d" instead, as it's more portable. =================================================================== RCS file: Demo/pysvr/pysvr.c,v retrieving revision 2.2 retrieving revision 2.2.0.1 diff -pu -r2.2 -r2.2.0.1 --- Demo/pysvr/pysvr.c 2001/11/28 20:27:42 2.2 +++ Demo/pysvr/pysvr.c 2002/02/28 19:02:54 2.2.0.1 @@ -365,6 +365,6 @@ ps(void) { char buffer[100]; PyOS_snprintf(buffer, sizeof(buffer), - "ps -l -p %d Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 14:18 Message: Logged In: YES user_id=21627 Thanks for the patch. Fixed in pysvr.c 1.11.
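As an aside, the header-stripping that pysvr delegates to tail/sed is trivial at the Python level, which sidesteps the portability question entirely; drop_header below is just an illustrative helper, not code from the demo:

```python
def drop_header(text):
    # The Python equivalent of "sed 1d" / "tail -n +2": delete the
    # first line of the command output and keep the rest.
    return "\n".join(text.splitlines()[1:])

ps_output = "  PID TTY  TIME CMD\n  123 pts/0  0:00 python\n"
print(drop_header(ps_output))   # ->   123 pts/0  0:00 python
```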
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524008&group_id=5470 From noreply@sourceforge.net Fri Mar 1 13:46:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 05:46:02 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Nobody/Anonymous (nobody) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? 
Best regards Tino Lange ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:05:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:05:30 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 13:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) >Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? 
Best regards Tino Lange ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:34:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:34:28 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-14 00:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) >Group: Python 2.1.2 Status: Open >Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376 looks fine to me.
I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:40:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:40:36 -0800 Subject: [Patches] [ python-Patches-514641 ] Negative ob_size of LongObjects Message-ID: Patches item #514641, was opened at 2002-02-07 22:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 Category: Core (C code) >Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Nobody/Anonymous (nobody) Summary: Negative ob_size of LongObjects Initial Comment: I found the following bugs due to the negative ob_size of LongObjects representing the negative values. 1) The access of attribute "__dict__" causes panic. class A(long): pass x = A(-1) x.__dict__ ==> core dump! 2) pickle neglects the sign of LongObjects import pickle class A(long): pass x = A(-1) pickle.dumps(x) ==> a string containing 1L (not -1L) !!! The patch will resolve the above problems. Naofumi Honda ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-08 08:21 Message: Logged In: YES user_id=33168 There is a bug report for this item: #506679. 
https://sourceforge.net/tracker/index.php?func=detail&aid=506679&group_id=5470&atid=105470 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 From noreply@sourceforge.net Fri Mar 1 16:14:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 08:14:21 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 18:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip.
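The difference at issue is reading through a socket.makefile() wrapper versus draining the socket with recv() directly. A minimal sketch of the recv() side (recv_all is a hypothetical helper, not the name used in the patch; a local socket pair stands in for the HTTP connection xmlrpclib reads its response from):

```python
import socket

def recv_all(sock, bufsize=8192):
    # Drain the socket with recv() until the peer signals EOF,
    # avoiding the per-call overhead of a makefile() file object.
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

a, b = socket.socketpair()
a.sendall(b"<methodResponse>...</methodResponse>")
a.shutdown(socket.SHUT_WR)          # sender signals end-of-data
print(recv_all(b))
a.close()
b.close()
```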
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 11:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 09:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:31:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:31:06 -0800 Subject: [Patches] [ python-Patches-517245 ] fix for mpzmodule.c Message-ID: Patches item #517245, was opened at 2002-02-13 18:18 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517245&group_id=5470 Category: Modules Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Marc Recht (marc) >Assigned to: Guido van Rossum (gvanrossum) Summary: fix for mpzmodule.c Initial Comment: This a one line to get mpzmodule compiled with GMP version >= 2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 16:31 Message: Logged In: YES user_id=6380 Thanks, Fixed. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517245&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:35:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:35:25 -0800 Subject: [Patches] [ python-Patches-523241 ] MimeWriter must use CRLF instead of LF Message-ID: Patches item #523241, was opened at 2002-02-26 21:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Clarence Gardner (cgardner) Assigned to: Nobody/Anonymous (nobody) Summary: MimeWriter must use CRLF instead of LF Initial Comment: In all of the output that MimeWriter does (headers and boundaries), a CRLF must be written rather than just LF. (CRLF at the end of the header, and at the beginning and end of the boundaries.) Here's hoping I'm doing this right :) ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:35 Message: Logged In: YES user_id=12800 Guido is correct, and while I personally consider MIMEWriter obsolete , I have taken the same approach with the email package. IMO, both modules should read and write native line endings. It is the responsibility of smtplib (in the case of sending the msg over the wire) or the MTA's program/file filter (in the case of receiving the msg from the wire) to translate from RFC 2822 line endings to native line endings, and vice versa. I recommend this patch be rejected. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:03 Message: Logged In: YES user_id=6380 Thanks for bearing with us. SF may be the worst possible tool, but I don't know anything better. 
:-( Having seen the patch, I disagree with your intent. This issue has come up before. While the MIME standard stipulates that newlines are represented as CRLF on the wire, we're not writing files on the wire. We're using the local line ending convention consistently whenever we read or write email, and some other entity is responsible for translating these to the proper CRLF. Maybe you can come up with a fix to the documentation that explains this policy instead? ---------------------------------------------------------------------- Comment By: Clarence Gardner (cgardner) Date: 2002-02-28 17:41 Message: Logged In: YES user_id=409146 Actually, it's a SourceForge bug :( I did check the box and attach the file, but it gave me a "Bad Filename" error. I did it again, removing the quotes that my browser put around the filename, and it said "You already submitted this! Don't doubleclick!" So maybe not *everybody's* submissions that lack a file came from an idiot :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:32 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:42:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:42:16 -0800 Subject: [Patches] [ python-Patches-523241 ] MimeWriter must use CRLF instead of LF Message-ID: Patches item #523241, was opened at 2002-02-26 21:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Clarence Gardner (cgardner) >Assigned to: Barry Warsaw (bwarsaw) Summary: MimeWriter must use CRLF instead of LF Initial Comment: In all of the output that MimeWriter does (headers and boundaries), a CRLF must be written rather than just LF. (CRLF at the end of the header, and at the beginning and end of the boundaries.) Here's hoping I'm doing this right :) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:35 Message: Logged In: YES user_id=12800 Guido is correct, and while I personally consider MIMEWriter obsolete , I have taken the same approach with the email package. IMO, both modules should read and write native line endings. It is the responsibility of smtplib (in the case of sending the msg over the wire) or the MTA's program/file filter (in the case of receiving the msg from the wire) to translate from RFC 2822 line endings to native line endings, and vice versa. I recommend this patch be rejected. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:03 Message: Logged In: YES user_id=6380 Thanks for bearing with us. SF may be the worst possible tool, but I don't know anything better. 
:-( Having seen the patch, I disagree with your intent. This issue has come up before. While the MIME standard stipulates that newlines are represented as CRLF on the wire, we're not writing files on the wire. We're using the local line ending convention consistently whenever we read or write email, and some other entity is responsible for translating these to the proper CRLF. Maybe you can come up with a fix to the documentation that explains this policy instead? ---------------------------------------------------------------------- Comment By: Clarence Gardner (cgardner) Date: 2002-02-28 17:41 Message: Logged In: YES user_id=409146 Actually, it's a SourceForge bug :( I did check the box and attach the file, but it gave me a "Bad Filename" error. I did it again, removing the quotes that my browser put around the filename, and it said "You already submitted this! Don't doubleclick!" So maybe not *everybody's* submissions that lack a file came from an idiot :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:32 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:42:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:42:23 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 09:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. 
Personally, if I were trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of the month is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected.
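The timezone question above comes down to one regular expression. A minimal sketch of a From_-line matcher that also accepts the optional numeric timezone Pine appends after the year; the pattern here is illustrative only, not the one mailbox.py actually uses:

```python
# Hypothetical From_-line pattern: ctime()-style date, optionally
# followed by a POSIX numeric timezone such as "+0200".
import re

fromline = re.compile(
    r"From \S+ +\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}"
    r"( [+-]\d{4})?\s*$"
)

# The line from the report, with and without the trailing timezone:
assert fromline.match("From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200\n")
assert fromline.match("From camield@sentia.nl Mon Apr 23 18:22:28 2001\n")
```

In the spirit of Guido's suggestion, a pattern like this could be supplied by overriding _isrealfromline in a UnixMailbox subclass rather than by changing the shipped regex.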
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:25:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:25:23 -0800 Subject: [Patches] [ python-Patches-514641 ] Negative ob_size of LongObjects Message-ID: Patches item #514641, was opened at 2002-02-07 22:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Naofumi Honda (naofumi-h) >Assigned to: Guido van Rossum (gvanrossum) Summary: Negative ob_size of LongObjects Initial Comment: I found the following bugs due to the negative ob_size of LongObjects representing negative values. 1) Access of the attribute "__dict__" causes a panic:

class A(long): pass
x = A(-1)
x.__dict__ ==> core dump!

2) pickle neglects the sign of LongObjects:

import pickle
class A(long): pass
x = A(-1)
pickle.dumps(x) ==> a string containing 1L (not -1L) !!!

The patch will resolve the above problems. Naofumi Honda ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:25 Message: Logged In: YES user_id=6380 Thanks, good catch! I've applied roughly your patch, and added a test. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-08 08:21 Message: Logged In: YES user_id=33168 There is a bug report for this item: #506679.
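The two failures above boil down to a round-trip check. A minimal sketch, transposed to a modern Python where int plays the role of 2.2's long; this is an analogy to the fixed behavior, not the original reproducer:

```python
# Subclass the built-in integer type with a negative value, as in the
# report, and verify both symptoms are absent: __dict__ access works
# and pickling preserves the sign and the subclass.
import pickle

class A(int):
    pass

x = A(-1)
assert x.__dict__ == {}             # 1) accessing __dict__ must not crash
restored = pickle.loads(pickle.dumps(x))
assert restored == -1               # 2) the sign survives the round trip
assert isinstance(restored, A)      #    ...and so does the subclass
```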
https://sourceforge.net/tracker/index.php?func=detail&aid=506679&group_id=5470&atid=105470 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:36:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:36:51 -0800 Subject: [Patches] [ python-Patches-515015 ] inspect.py raise exception if code not found Message-ID: Patches item #515015, was opened at 2002-02-08 17:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: inspect.py raise exception if code not found Initial Comment: There is a comment which says the suffixes should be sorted by length, but there is no comparison function. This patch adds a comparison (lambda). Also, there are two functions which are documented to raise IOError if there are problems, but if the function reaches the end, there were no raises. This patch adds raise IOErrors. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:36 Message: Logged In: YES user_id=6380 Neal, can you check this in and mark as bugfix? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-09 09:16 Message: Logged In: YES user_id=33168 Sorry, I saw the map/lambda above, but misread the code. Attached is a new file (just contains the 2 raises). I really need to add a test for this as well.
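The sorting question at issue here is a decorate-and-sort idiom: storing -len(suffix) as the first element of each tuple makes a plain ascending sort visit the longest suffix first, so no comparison function (or lambda) is needed. A minimal sketch with an invented suffix list:

```python
# Decorate each suffix with its negated length; sorting the tuples
# ascending then puts the longest suffixes first, guaranteeing the
# longest match is tried first in case of overlap.
suffixes = [".py", ".pyw", ".pyc", ".so"]
decorated = sorted((-len(suf), suf) for suf in suffixes)
longest_first = [suf for _, suf in decorated]
```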
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-08 18:10 Message: Logged In: YES user_id=31435 Please remove the lambda trick from the patch. The comment is explaining why the negation of the length is the first element of the tuples being sorted (that's what guarantees the longest suffix is checked first in case of overlap). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:40:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:40:02 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this. It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:42:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:42:02 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None >Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotspot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled...
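The peephole idea behind the patch can be sketched in a language-neutral way: when two line-number markers appear back to back, only the later one can matter, so the earlier one may be dropped. The instruction tuples below are invented for illustration; the real patch operates on CPython's SET_LINENO bytecodes in the compiler:

```python
# Collapse runs of consecutive line-number markers, keeping only the
# last marker of each run (the one that actually takes effect).
def dedup_linenos(instrs):
    out = []
    for op, arg in instrs:
        if op == "SET_LINENO" and out and out[-1][0] == "SET_LINENO":
            out[-1] = (op, arg)   # the later marker supersedes the earlier one
        else:
            out.append((op, arg))
    return out
```

Dropping markers this way changes the line-number events a tracer or introspection tool observes, which may be related to the inspect failure mentioned above.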
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:43:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:43:47 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) >Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: Inherited method __getitem__ of list type in the new subclass is unexpectedly slow. For example,

x = list([1,2,3])
r = xrange(1, 1000000)
for i in r: x[1] = 2
==> execution time: real 0m2.390s

class nlist(list): pass
x = nlist([1,2,3])
r = xrange(1, 1000000)
for i in r: x[1] = 2
==> execution time: real 0m7.040s

about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...)
is invoked, and turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows. Only in the case where:
*) two or more slotdefs have the same attribute name, where at most one corresponding slot has a non-null pointer
*) the descriptor type of the attribute is WrapperType
will these functions allocate the single function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user-overridden methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:45:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:45:41 -0800 Subject: [Patches] [ python-Patches-514628 ] bug in pydoc on python 2.2 release Message-ID: Patches item #514628, was opened at 2002-02-07 21:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Raj Kunjithapadam (mmaster25) >Assigned to: Tim Peters (tim_one) Summary: bug in pydoc on python 2.2 release Initial Comment: pydoc has a bug when trying to generate HTML docs; more importantly, it has a bug in the method writedoc(). Attached is my fix.
Here is the diff between my fix and the regular dist:

1338c1338
< def writedoc(thing, forceload=0):
---
> def writedoc(key, forceload=0):
1340,1346c1340,1343
<     object = thing
<     if type(thing) is type(''):
<         try:
<             object = locate(thing, forceload)
<         except ErrorDuringImport, value:
<             print value
<             return
---
>     try:
>         object = locate(key, forceload)
>     except ErrorDuringImport, value:
>         print value
1351c1348
<         file = open(thing.__name__ + '.html', 'w')
---
>         file = open(key + '.html', 'w')
1354c1351
<         print 'wrote', thing.__name__ + '.html'
---
>         print 'wrote', key + '.html'
1356c1353
<         print 'no Python documentation found for %s' % repr(thing)
---
>         print 'no Python documentation found for %s' % repr(key)

---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:45 Message: Logged In: YES user_id=6380 assigned to Tim; this may be Ping's terrain but Ping is typically not responsive. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:58:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:58:11 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this.
It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Fri Mar 1 23:00:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 15:00:39 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-05 19:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source.
It does so with the following modifications:
- PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line
- this attribute is set for all SyntaxError instances
- PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class
- compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 10:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Fri Mar 1 23:12:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 15:12:30 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this.
It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:12 Message: Logged In: YES user_id=6380 OK, thanks; I'll wait! ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:34:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:34:26 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests.
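The boundary rule proposed in that new diff can be sketched directly: treat "From " as a message delimiter only at the very start of the mailbox or right after a blank line. This is illustrative only, not the submitted diff:

```python
# Match "From " at the start of the data or after a blank line, so
# ">From "-style quoted lines inside a message body are not treated
# as message boundaries.  The mbox content below is invented.
import re

delim = re.compile(r"(?:\A|\n\n)From ")

mbox = (
    "From a@example.org Mon Apr 23 18:22:28 2001\n"
    "Body line\n"
    "\n"
    "From b@example.org Tue Apr 24 09:00:00 2001\n"
    ">From a quoted line inside the body\n"
)
starts = [m.start() for m in delim.finditer(mbox)]
assert len(starts) == 2   # two real messages, the quoted line is skipped
```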
---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments) wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I were trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of the month is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:38:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:38:02 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess then From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. 
It is not that intrusive as the comment in the mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May 7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. 
""" While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:38:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:38:42 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. 
From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". 
ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 16:47:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 08:47:33 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 09:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Open Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) >Assigned to: Barry Warsaw (bwarsaw) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-02 11:47 Message: Logged In: YES user_id=12800 Re-opening and assigning to myself. 
I'll take a look at your patches asap. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 09:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. 
In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 20:24:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 12:24:57 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. 
Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-02 20:24 Message: Logged In: YES user_id=18139 Removing array's tp_print sounds good to me. (I did not notice this behavior because on Windows, type(sys.stdout) is not file so array_print wasn't being invoked.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 10:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays on the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print? Is that necessary? My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:25 Message: Logged In: YES user_id=18139 Documentation patch. 
Please check my TeX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". 
Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. 
Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules; instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! 
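[Editorial note: the in-place semantics Martin asks for above, that += and *= on an array mutate the object rather than create a new one, are what was eventually adopted. A quick sketch in modern Python, illustrative rather than the patch itself:]

```python
from array import array

a = array('i', [1, 2])
alias = a                  # a second reference to the same array object
a += array('i', [3])       # __iadd__: extends in place
a *= 2                     # __imul__: repeats in place

# No new object was created, so the alias observes both changes.
print(a is alias)          # True
print(alias.tolist())      # [1, 2, 3, 1, 2, 3]
```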
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. 
For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. 
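[Editorial note: Jason's description of a 'u' array as a mutable sequence of Unicode characters, with no encoding step involved, is how the feature ended up working. A sketch in modern Python; note that the 'u' typecode is deprecated in current releases, which add a wide 'w' typecode instead:]

```python
from array import array

a = array('u', 'xyz')      # one Unicode character per item
print(len(a))              # 3, same as len('xyz')
print(a[0])                # 'x', same as 'xyz'[0]
a[0] = 'X'                 # mutable, unlike a plain string
print(a.tounicode())       # 'Xyz'
```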
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:19:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:19:45 -0800 Subject: [Patches] [ python-Patches-450267 ] OS/2+EMX port - changes to Python core Message-ID: Patches item #450267, was opened at 2001-08-12 21:34 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450267&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) >Assigned to: Andrew I MacIntyre (aimacintyre) Summary: OS/2+EMX port - changes to Python core Initial Comment: The attached patch incorporates the changes to the source tree between Python 2.1.1 and the 010812 release of the OS/2+EMX port. It includes changes to files in Include/, Modules/, Objects/ and Python/. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:19 Message: Logged In: YES user_id=250749 All parts now committed. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-02-17 16:16 Message: Logged In: YES user_id=250749 Following discussion on python-dev, I have created patches for Objects/stringobject.c and Objects/unicodeobject.c that aim to rationalise the %#x/%#X format conversion mess. These two patches remove approaches specific to the various bugs and standard violations encountered with these format conversions, and take the approach of relying on the behaviour of the %x/%X format conversions and directly supplying Python's preferred prefix (0x/0X respectively). The patches presented are against CVS of 15Feb02 1430 AEST, and have been tested on both OS/2 and FreeBSD. 
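[Editorial note: the %#x/%#X rationalisation Andrew describes above, emitting Python's preferred prefix directly instead of trusting the platform printf, is visible in the formatting behaviour that survives today:]

```python
# CPython supplies the '0x'/'0X' prefix itself, so '#' (alternate form)
# formatting is identical on every platform regardless of the C runtime.
print('%#x' % 255)        # 0xff
print('%#X' % 255)        # 0XFF
print(format(255, '#x'))  # 0xff, the modern spelling
```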
If acceptable, I would prefer to apply my pre-existing patches for these files (the Objects patch below) before these new patches, as my earlier patches with their EMX specifics in OS/2 specific #ifdefs are "failsafe" as far as other platforms are concerned. Then if the new approach causes other platforms to fail, these patches can be backed out without breaking the OS/2 port. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-01-27 17:19 Message: Logged In: YES user_id=250749 I have split the original approach into patches for each of the Include, Modules, Objects and Python directories. Of particular note: - the patches to import.c are general to both VACPP and EMX ports, and have been trialled by Michael Muller with satisfactory results. - Modules/unicodedata.c has a name clash between its internally defined _getname() and an EMX routine of the same name defined in . Is the solution in the patch acceptable? - both Objects/stringobject.c and Objects/unicodeobject.c have changes to deal with EMX's runtime not producing a desired "0X" prefix in response to a "%X" format specifier (it produces "0x" instead). The patched source tree has been built and regression tested on both EMX and FreeBSD 4.4, with no unexpected results. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-02 08:59 Message: Logged In: YES user_id=21627 Please review this patch carefully again, asking in each case whether this chunk *really* belongs to this patch. Do so by asking "is it specific to the port of Python to os2emx?" There are some changes that are desirable, but are unrelated (like the whitespace changes in PyThread_down_sema). Please submit those in a separate patch. There are also changes that don't belong here at all, like the inclusion of a Modules/Setup. 
If you are revising this patch, you may also split it into a part that is absolutely necessary, and a part that is nice-to-have. E.g. the termios changes are probably system-specific, but I guess the port would work well without them. Without going in small steps, it seems that we won't move at all. You may consider making use of your checkin permissions for uncritical operations. If you need help in CVS operations, please let me know. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2001-08-13 23:21 Message: Logged In: YES user_id=250749 Thanks for the feedback. At this stage of the game, I'd prefer to work with a "supervisor" rather than take on CVS commit privs, though I realise that "supervisors" are a scarce resource. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-08-13 00:53 Message: Logged In: YES user_id=6380 Hi Andrew, Thanks for the patches. There's a lot of code (here and in the two previous patches). I'm going to see if we can give you CVS commit permission so you can apply the changes yourself. Note that commit permission (if you get it) doesn't mean that the patch is automatically approved -- I've seen some changes in your diffs that look questionable. You probably know which ones. :-) In general, the guidelines are that you can make changes freely (a) in code you own because it's in a file or directory that's specific to your port; (b) in code specific to your port that's inside #ifdefs for your port (this includes adding); (c) to fix an *obvious* small typo or buglet that bothers your compiler (example: if your compiler warns about an unused variable, feel free to delete it, as long as the unusedness isn't dependent on an #ifdef). For other changes we all appreciate it if you discuss them on python-dev or on the SF patch manager first. 
Oh, and if you ever check something in that breaks the build on another platform or causes the test suite to fail, people will demand very quick remedies or the checkin will be reversed. :-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450267&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:21:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:21:54 -0800 Subject: [Patches] [ python-Patches-514490 ] Better pager selection for OS/2 Message-ID: Patches item #514490, was opened at 2002-02-08 07:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514490&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Stefan Schwarzer (sschwarzer) Assigned to: Andrew I MacIntyre (aimacintyre) Summary: Better pager selection for OS/2 Initial Comment: With the current implementation (rev. 1.56) of pydoc.py the first call of the help command gives (when the pager environment variable is not set): Python 2.2 (#0, Dec 24 2001, 18:42:48) [EMX GCC 2.8.1] on os2emx Type "help", "copyright", "credits" or "license" for more information. >>> help(help) SYS0003: The system cannot find the path specified. Help on instance of _Helper: Type help() for interactive help, or help(object) for help about object. >>> After the error message one has to press Ctrl-C. Further invocations of help work, though. The attached patch selects 'more <' as the default pager when no PAGER env. variable is set, like on Windows. I use sys.platform.startswith to deal with a possible future port with sys.platform == 'os2vac'. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:21 Message: Logged In: YES user_id=250749 Committed. Thanks for the patch! 
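[Editorial note: the selection logic the patch describes can be sketched as follows. `default_pager_cmd` is a hypothetical helper for illustration, not pydoc's actual function, whose real getpager() is considerably more involved:]

```python
import os
import sys

def default_pager_cmd():
    """Hypothetical sketch of the pager choice described above."""
    pager = os.environ.get('PAGER')
    if pager:
        return pager                  # an explicit PAGER setting always wins
    if sys.platform.startswith(('os2', 'win')):
        return 'more <'               # shell-redirection form for OS/2/Windows
    return 'less'                     # common Unix fallback
```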
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-02-13 23:40 Message: Logged In: YES user_id=250749 The patch looks Ok to me. I plan to apply it after I have all the EMX port patches into CVS. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514490&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:32:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:32:30 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. 
:-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:34:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:34:49 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. 
:-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 3 11:58:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 03:58:48 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context-sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. 
(Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a text string to a list of text strings ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sun Mar 3 18:29:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 10:29:41 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-06 00:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) >Assigned to: Martin v. Löwis (loewis) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source. It does so with the following modifications: - PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line - this attribute is set for all SyntaxError instances - PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class. - compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error.
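The user-visible effect of the change described above can be demonstrated from Python itself: a bad \x escape compiled from a string surfaces as a SyntaxError that carries the file and line. A small sketch; the filename '<demo>' is an illustrative choice, not from the patch:

```python
# A bad \x escape (\x needs two hex digits) raises a SyntaxError
# that carries the filename and line number passed to compile() --
# the behavior this patch arranges for such errors.
bad_source = "s = '\\x6'"   # source text containing the literal '\x6'

error = None
try:
    compile(bad_source, '<demo>', 'exec')
except SyntaxError as e:
    error = e
```

After the try block, error.filename and error.lineno identify where the bad escape was found.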
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 23:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 15:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Sun Mar 3 19:50:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 11:50:37 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 12:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 20:47:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 12:47:10 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 13:47 You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:33:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:33:59 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-06 01:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source. It does so with the following modifications: - PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line - this attribute is set for all SyntaxError instances - PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class. - compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error. ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-03 22:33 Message: Logged In: YES user_id=21627 Both asserts in this place were nonsensical leftovers from an earlier version, and are now removed. Committed as NEWS 1.360 1.337.2.4.2.2 compile.c 2.239 2.234.4.3 errors.c 2.67 2.66.10.1 exceptions.c 1.29 1.28.6.1 pythonrun.c 2.155 2.153.6.2 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-02 00:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 16:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:36:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:36:57 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 20:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...]
''', re.VERBOSE | re.IGNORECASE) pychecker bug? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:39:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:39:27 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 21:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) >Assigned to: Barry Warsaw (bwarsaw) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:39 Message: Logged In: YES user_id=21627 Barry, those are indeed unused. Ok to remove them?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:02:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:02:09 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a text string to a list of text strings ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py !
( Linux does not support an event named ) Running IDLE on Linux, I got a warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:04:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:04:56 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 12:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- >Comment By: Evelyn Mitchell (efm) Date: 2002-03-03 15:04 Message: Logged In: YES user_id=13263 Yeah, it's probably a pychecker bug. I'll submit it there. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 14:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...] ''', re.VERBOSE | re.IGNORECASE) pychecker bug?
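A module-level pattern in the style of the ecre cited above shows why pychecker's complaint is a false positive: re *is* used, just only inside the compile call. The pattern body below is a simplified stand-in, not the actual pattern from email/Utils.py:

```python
import re

# Simplified stand-in for email/Utils.py's ecre: a verbose,
# case-insensitive pattern compiled once at module level.  The only
# reference to "re" in such a module is inside this call, which is
# apparently what confuses pychecker's unused-import check.
ecre = re.compile(r'''
    =\?                  # literal =?
    (?P<charset>[^?]*?)  # charset name, e.g. iso-8859-1
    \?                   # literal ?
''', re.VERBOSE | re.IGNORECASE)

match = ecre.search('=?ISO-8859-1?')
```

re.VERBOSE lets the pattern carry whitespace and comments; re.IGNORECASE makes the charset match case-insensitively.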
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:46:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:46:11 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 15:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Barry Warsaw (bwarsaw) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-03 17:46 Message: Logged In: YES user_id=12800 Accepted. I actually fixed this in email v1.1 (standalone), which has not yet been integrated into the Python trunk. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 16:39 Message: Logged In: YES user_id=21627 Barry, those are indeed unused. Ok to remove them?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Mon Mar 4 05:47:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 21:47:55 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-02 00:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed an SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL class from the existing IMAP4 class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straightforward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 16:47 Message: Logged In: YES user_id=196212 This seems fine to me, but I can't test it as I don't have access to an SSL-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:01:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:01:54 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 20:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- Comment By: Evelyn Mitchell (efm) Date: 2002-03-03 23:04 Message: Logged In: YES user_id=13263 Yeah, it's probably a pychecker bug. I'll submit it there. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...] ''', re.VERBOSE | re.IGNORECASE) pychecker bug?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:33:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:33:17 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-27 14:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. it gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if the patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andy Gimblett (gimbo) Date: 2002-03-04 09:33 Message: Logged In: YES user_id=262849 Thanks for feedback re: diffs. Have now found out about context diffs and attached new version - hope this is better. Regarding the patch itself, this arose out of a newbie question on c.l.py and I was reminded that this was an issue I'd come across in my early days too. Personally I'd never picked up the hint that you should use FancyURLopener directly. If preferred, I could have a go at patching the docs to make that clearer?
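For readers on current Pythons: the behavior the patch proposes (an explicit proxies mapping, with an empty dict meaning "no proxies") lives on in urllib.request, where ProxyHandler accepts the same kind of mapping. A sketch of the modern equivalent, not the patch's own code:

```python
import urllib.request

# An empty dict tells ProxyHandler to use no proxies at all,
# overriding any http_proxy/https_proxy environment variables --
# the same effect as the patch's urlopen(url, proxies={}).
no_proxy_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({})
)
# no_proxy_opener.open(url) would then connect directly.
```

Omitting the ProxyHandler (or passing no dict) restores the old behavior of reading proxy settings from the environment.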
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs, with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from; however, my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:41:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:41:31 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 23:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features.
I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-04 10:41 Message: Logged In: YES user_id=21627 Deleted tp_print, closing this patch. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-02 21:24 Message: Logged In: YES user_id=18139 Removing array's tp_print sounds good to me. (I did not notice this behavior because on Windows, type(sys.stdout) is not file, so array_print wasn't being invoked.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays at the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print. Is that necessary? My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TeX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.)
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 04:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 13:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays.
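The in-place semantics discussed above are easy to check once __iadd__ and __imul__ exist; on a modern Python, where the patched behavior is the shipped behavior, it looks like this:

```python
from array import array

# With __iadd__/__imul__ implemented (as this patch adds), += and *=
# mutate the array in place, matching list semantics: the name still
# refers to the same object afterwards.
a = array('i', [1, 2])
alias = a            # second reference, to observe in-place mutation
a += array('i', [3]) # extends in place -> array('i', [1, 2, 3])
a *= 2               # repeats in place -> array('i', [1, 2, 3, 1, 2, 3])
```

Without the in-place slots, += and *= would rebind `a` to a fresh array and `alias` would keep the old contents, which is the semantic change Jason flags below.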
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 01:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 22:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 16:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 14:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/).
About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 14:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules; instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 09:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 03:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type.
VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr).
Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-21 00:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general.
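The per-character behavior sketched above, plus the tounicode() round-trip added earlier in this patch, can be demonstrated directly. Note that on modern Python 3 the 'u' typecode still exists but is deprecated:

```python
from array import array

# A 'u' array is an array of Unicode characters: indexing yields
# single characters, items can be replaced in place, and tounicode()
# (the method this patch adds) recovers the string.
a = array('u', 'xyz')
assert a[1] == 'y'     # per-character indexing, as in the A[i] examples
a[0] = 'X'             # mutate one character in place
s = a.tounicode()      # round-trip back to a string: 'Xyz'
```

This is the "mutable string of characters" use case Jason describes: edit characters individually, then recover the string without the u"".join(arr.tolist()) workaround.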
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Mon Mar 4 10:55:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 02:55:18 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Tino Lange (tinolange) Date: 2002-03-04 11:55 Message: Logged In: YES user_id=212920 Hallo! socket.ssl() -Objects only have _two_ methods read() write() I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. 
That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() - Objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 06:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Mon Mar 4 14:50:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 06:50:34 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 14:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. 
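The lock semantics that thread_pthread.h emulates with a mutex plus condition variable map directly onto a counting semaphore initialized to 1, which is what the patch exploits via sem_wait()/sem_post(). The same semantics can be sketched at the Python level with threading.Semaphore (illustrative only — the patch itself is C code inside thread_pthread.h):

```python
import threading

# A binary semaphore (initial count 1) gives exactly the behavior
# PyThread_acquire_lock/PyThread_release_lock need: a blocking
# acquire succeeds, a second non-blocking attempt fails while the
# lock is held, and a release makes it available again.
lock = threading.Semaphore(1)

assert lock.acquire() is True                  # like sem_wait(): 1 -> 0
assert lock.acquire(blocking=False) is False   # like sem_trywait(): fails at 0
lock.release()                                 # like sem_post(): 0 -> 1
assert lock.acquire(blocking=False) is True
lock.release()
```

This is why the semaphore version is more efficient: each acquire/release is a single semaphore operation instead of a mutex lock, flag update, and condition signal.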
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From outros@kyky.zzn.com Mon Mar 4 16:22:51 2002 From: outros@kyky.zzn.com (Bordeaux Buffet) Date: Mon, 4 Mar 2002 13:22:51 -0300 Subject: [Patches] Não Compre... Alugue! Message-ID: Don't Buy... Rent!

:: Bordeaux Buffet ::
Party equipment rentals.

Rent all the equipment for your event!

Chairs - Tables - Tablecloths - Glasses - Cutlery - Plates - Serving dishes - Chafing dishes - Samovars - Food warmers and much more!!!

More than 1,000 items to choose from. We deliver throughout the country!

We supply ice in cubes, blocks, and crushed.

www.bordeaux.com.br

To remove your name from our e-mail list, reply to this e-mail with the Subject = Remover
This message is sent in accordance with the new legislation on electronic mail, Section 301, Paragraph (a) (2) (c) Decree S. 1618, Title Three, approved by the "105th Congress Base of International Norms on SPAM". This e-mail cannot be considered SPAM when it includes a way to be removed.
From noreply@sourceforge.net Mon Mar 4 22:55:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 14:55:28 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-02 00:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Piers Lauder (pierslauder) Date: 2002-03-05 09:55 Message: Logged In: YES user_id=196212 Ok, (the boring bit :-) please provide a matching patch for the documentation (in dist/src/Doc/lib/libimaplib.tex), and I'll install both patches. Thanks Tino! ---------------------------------------------------------------------- Comment By: Tino Lange (tinolange) Date: 2002-03-04 21:55 Message: Logged In: YES user_id=212920 Hallo! 
socket.ssl() -Objects only have _two_ methods read() write() I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() - Objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 16:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Tue Mar 5 02:59:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 18:59:17 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-04 18:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Michael Pruett (mpruett) Assigned to: Nobody/Anonymous (nobody) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Tue Mar 5 08:58:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 00:58:12 -0800 Subject: [Patches] [ python-Patches-525870 ] urllib2: duplicate call, stat attrs Message-ID: Patches item #525870, was opened at 2002-03-05 09:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: duplicate call, stat attrs Initial Comment: This patch removes a duplicate call to os.stat in urllib2.FileHandler.open_local_file(). In addition to that, it uses the new stat attributes, so importing stat is no longer necessary. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 From noreply@sourceforge.net Tue Mar 5 13:45:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 05:45:16 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 17:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. 
With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. 
Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc {get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.) So, how's *this* one? 
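The tuple/attribute duality this patch introduced is still how os.stat() behaves: the result extends tuple, keeps the old 10-element sequence protocol, and exposes the same values as st_* attributes. A quick check (the temporary file is purely illustrative setup):

```python
import os
import tempfile

# Create a small file to stat.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    name = f.name

st = os.stat(name)
assert st[0] == st.st_mode        # index 0 and st_mode are synonymous
assert st[6] == st.st_size == 5   # likewise index 6 and st_size
assert isinstance(st, tuple)      # the result type still extends tuple
assert len(st) == 10              # the sequence part keeps the old 10 fields
os.unlink(name)
```

Extra platform-specific fields (st_blksize and friends, discussed below) are exposed only as attributes, which is how the 10-tuple unpacking idiom keeps working.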
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least, the next-to-last-one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format. 
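The struct/sequence hybrid under discussion — something that "doubles as a tuple" while exposing named fields — can be approximated in pure Python by subclassing tuple, which is roughly what the earlier versions of the patch did in C. A hypothetical sketch of the idea (not Nick's actual implementation):

```python
class StatResult(tuple):
    """Hypothetical sketch: a tuple that also exposes read-only
    named attributes, the hybrid discussed in this thread."""
    _fields = ('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid',
               'st_gid', 'st_size', 'st_atime', 'st_mtime', 'st_ctime')

    def __getattr__(self, name):
        # Only called for names not found normally; map field
        # names to tuple positions, anything else is an error.
        try:
            return self[self._fields.index(name)]
        except ValueError:
            raise AttributeError(name)

st = StatResult((0o100644, 1, 2, 1, 1000, 1000, 512, 0, 0, 0))
assert st[0] == st.st_mode == 0o100644  # tuple protocol and attributes agree
assert st.st_size == 512
assert isinstance(st, tuple)            # old unpacking code keeps working
```

The final patch abandoned the tuple subclass for a general C "struct sequence" mechanism, but the observable behavior is the same as this sketch.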
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows: >>> import os >>> a = os.stat_result.__new__(os.stat_result, ()) >>> a.st_ctime Segmentation fault (core dumped) $ ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... 
but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. 
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename) To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. 
Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. 
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time where struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python... ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 13:59:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 05:59:29 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 14:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). 
Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 14:09:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 06:09:41 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 08:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-05 09:09 Message: Logged In: YES user_id=6380 I don't understand. Can you explain why you care about this? 
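The deferral described in the patch can be sketched as a file-like wrapper that postpones the real open() until the first read. A hypothetical sketch of the idea (LazyFile is an illustrative name, not the patch's actual code):

```python
import os
import tempfile

class LazyFile:
    """Sketch of the deferred-open idea: the underlying file is
    only opened on the first read(), not at construction time."""
    def __init__(self, path):
        self.path = path
        self._fp = None          # no open() yet

    def _ensure_open(self):
        if self._fp is None:
            self._fp = open(self.path, 'rb')
        return self._fp

    def read(self, *args):
        return self._ensure_open().read(*args)

    def readline(self, *args):
        return self._ensure_open().readline(*args)

# Illustrative use: constructing the wrapper touches no file descriptor.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data\n")
    name = f.name

lf = LazyFile(name)
assert lf._fp is None           # headers could be served here without an open file
assert lf.read() == b"data\n"   # the first read triggers the actual open()
os.unlink(name)
```

This is what lets urlopen() hand back header information (size, modification date) for thousands of file: URLs without holding thousands of open file descriptors, which is exactly the make-tool use case described below.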
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 14:21:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 06:21:28 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 14:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-05 15:21 Message: Logged In: YES user_id=89016 I'm currently writing a make in Python. This make should be able to handle not only local files, but remote files (http, ftp, etc.). One project might have several thousand targets, and some of them are remote. I want to be able to handle both types in a uniform way, i.e. via urllib/urllib2. This means that I call urllib2.urlopen() to get the header information about the last modification date, but I don't want to open the file right away. Only when the data is required (because the source resource is newer than the target) should the file be read. 
And this might open the door to making streams that are returned from urlopen() writable (simply by using open(..., "wb") instead of open(..., "rb") when the first write is called). Another possibility might be using urllib.urlretrieve(), but the API is horrible (one global cleanup function) and not supported by urllib2. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-05 15:09 Message: Logged In: YES user_id=6380 I don't understand. Can you explain why you care about this? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 15:22:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 07:22:51 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. 
(Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
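The backward-compatible behaviour described in the initial comment can be checked directly on any Python that ships this patch (2.2 and later); the snippet below simply exercises the documented tuple/attribute equivalence:

```python
import os

st = os.stat(os.getcwd())

# Index access and the named attributes refer to the same fields:
assert st[0] == st.st_mode
assert st[6] == st.st_size

# The old 10-element tuple-unpacking idiom keeps working:
mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = tuple(st)
assert mode == st.st_mode and size == st.st_size
```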
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind on reviewing this, and probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 15:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 03:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python.
(I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one? ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 17:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 16:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 04:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff.
Please let me know if you'd prefer a different format. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky going on with the constructors. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:

>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$

---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 06:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types.
If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as one from somebody who knows the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 10:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? Instances will never be part of a cycle. If that ever becomes an issue, you would also need to implement a traversal function. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 22:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 06:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.
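The constructor trap Guido describes earlier in this thread (a tuple's contents are laid down in __new__, so a check in __init__ alone comes too late) can be illustrated with a pure-Python tuple subclass; `StatResult` below is a made-up example for illustration, not the patched os.stat_result:

```python
class StatResult(tuple):
    """Illustrative tuple subclass that validates its input in __new__."""

    N_FIELDS = 13

    def __new__(cls, values):
        values = tuple(values)
        # Reject malformed input here: once tuple.__new__ has run, the
        # instance already exists, and attribute access on missing slots
        # would be the Python-level analogue of the C-level core dump.
        if len(values) != cls.N_FIELDS:
            raise TypeError("expected %d values, got %d"
                            % (cls.N_FIELDS, len(values)))
        return tuple.__new__(cls, values)

    @property
    def st_mode(self):
        return self[0]
```

With this, StatResult(range(13)) works as expected, while StatResult(()) raises TypeError instead of producing a broken instance.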
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 20:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 20:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 08:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate.
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 23:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 09:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it introduces an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is

mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)

To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 03:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well.
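Martin's proposed interface can be approximated in pure Python. This is only a sketch of the idea (the accepted implementation ended up as C struct/sequence code), but it shows the role of the exposed-length argument: extra fields stay reachable by name without widening the tuple that old code unpacks:

```python
def tuple_with_attrs(values, field_names, exposed_length):
    """Sketch of the proposed tuple_with_attrs(value-tuple,
    tuple-of-field-names, exposed-length-of-tuple) interface;
    illustrative only, not the committed code."""

    class _Hybrid(tuple):
        def __new__(cls, items):
            # Only the first `exposed_length` items take part in the
            # tuple protocol, so old unpacking code keeps working.
            return tuple.__new__(cls, tuple(items)[:exposed_length])

        def __init__(self, items):
            # Every field, exposed or not, is reachable by name.
            for name, value in zip(field_names, items):
                setattr(self, name, value)

    return _Hybrid(values)
```

For example, a result built with exposed length 2 still unpacks as a 2-tuple while carrying a third named field as an attribute.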
Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 02:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 01:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this in pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 00:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time when struct fields lived in a global namespace (i.e. shared across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should carry this over into Python... ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 15:50:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 07:50:59 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 17:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042.
Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aformentioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode. The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-05 15:50 Message: Logged In: YES user_id=6656 I'm not worried about cross version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-05 15:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. 
It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though the is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of field of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in one's that don't exists locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. 
:-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc {get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.) So, how's *this* one? ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least, the next-to-last-one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistant attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. 
MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows: >>> import os >>> a = os.stat_result.__new__(os.stat_result, ()) >>> a.st_ctime Segmentation fault (core dumped) $ ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? they will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. 
I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the lenght of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple)) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) 
I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the siye of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename) To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. 
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499
On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499
Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627
I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from a time when struct fields lived in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499
BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470

From noreply@sourceforge.net Tue Mar 5 16:15:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 05 Mar 2002 08:15:16 -0800
Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results
Message-ID:

Patches item #462296, was opened at 2001-09-17 19:57
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470
Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5
Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Add attributes to os.stat results
Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode".
The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 17:15 Message: Logged In: YES user_id=21627
To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns an 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:50 Message: Logged In: YES user_id=6656
I'm not worried about cross-version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.
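For what it's worth, the __reduce__ route discussed above is the one structseq types eventually took; in a current CPython a stat_result round-trips through pickle. This is present-day interpreter behaviour, not a claim about the 2002 code under discussion:

```python
# Round-tripping an os.stat_result through pickle in a modern CPython;
# structseq types pickle their sequence part plus a dict of the
# remaining named fields.
import os
import pickle

st = os.stat(os.getcwd())
st2 = pickle.loads(pickle.dumps(st))

assert st2 == st                  # compares equal as a sequence
assert st2.st_mode == st.st_mode  # named fields survive the round trip
```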
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627
Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656
I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment...
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066
This has been checked in, edited, and checked in again.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499
Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380
Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs.
Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380
I'm looking at this now.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380
Patience, please. I'm behind reviewing this, probably won't have time today either.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2001-10-03 15:51 Message: Logged In: YES user_id=6656
If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...).
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-02 03:56 Message: Logged In: YES user_id=499
The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one?
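The general-purpose struct/sequence mechanism described above survives in CPython as the "structseq" types; in a modern interpreter both the posixmodule and timemodule results show the dual behaviour. This is a present-day observation, not part of the patch itself:

```python
# Struct/sequence hybrids in a modern CPython: os.stat() and
# time.localtime() both return tuple subclasses with named fields.
import os
import time

st = os.stat(os.getcwd())
assert isinstance(st, tuple)   # still obeys the tuple protocol
assert st[0] == st.st_mode     # index 0 doubles as st_mode

t = time.localtime()
assert isinstance(t, tuple)    # timemodule.c got the same treatment
assert t[0] == t.tm_year       # index 0 doubles as tm_year
```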
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 17:37 Message: Logged In: YES user_id=499
I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 16:09 Message: Logged In: YES user_id=6380
Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 04:51 Message: Logged In: YES user_id=499
I think this might be the one... or at least, the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.
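Point (C) can still be observed in a current CPython, though the details have shifted: the constructor now enforces a minimum sequence length rather than accepting arbitrarily short ones. Again, this is present-day behaviour, not the 2001 patch:

```python
# Present-day behaviour of the os.stat_result constructor (modern
# CPython): it accepts a plain sequence of the ten classic fields,
# and a too-short sequence is rejected with TypeError rather than
# producing missing attributes.
import os

st = os.stat_result(range(10))  # st_mode, st_ino, ..., st_ctime in order
assert st.st_mode == 0          # field 0
assert st.st_size == 6          # field 6

try:
    os.stat_result(())          # far fewer than the required fields
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for a too-short sequence")
```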
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:23 Message: Logged In: YES user_id=6380
Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:18 Message: Logged In: YES user_id=6380
One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:
>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-28 06:20 Message: Logged In: YES user_id=499
I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knows the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 10:44 Message: Logged In: YES user_id=21627
The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-21 22:07 Message: Logged In: YES user_id=499
And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.)
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:01 Message: Logged In: YES user_id=499
Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 06:35 Message: Logged In: YES user_id=6380
Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 20:36 Message: Logged In: YES user_id=21627
There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-19 20:10 Message: Logged In: YES user_id=499
I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 08:52 Message: Logged In: YES user_id=21627
Using posix.stat is common; see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate.
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple)
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 23:11 Message: Logged In: YES user_id=499
Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 09:54 Message: Logged In: YES user_id=21627
The problem with your second and third patches is that they introduce an incompatibility for users of posix.stat (and friends), since they change the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is:
mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)
To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 03:54 Message: Logged In: YES user_id=6380
Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 02:32 Message: Logged In: YES user_id=499
Here's the revised (*example only*) patch that takes the more portable approach I mention below.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 01:10 Message: Logged In: YES user_id=499
On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:53 Message: Logged In: YES user_id=499
Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 00:24 Message: Logged In: YES user_id=21627
I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from a time when struct fields lived in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:58 Message: Logged In: YES user_id=499
BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470

From noreply@sourceforge.net Tue Mar 5 16:32:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 05 Mar 2002 08:32:53 -0800
Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results
Message-ID:

Patches item #462296, was opened at 2001-09-17 17:57
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470
Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5
Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Add attributes to os.stat results
Initial Comment: See bug #111481, and PEP 0042.
Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.
----------------------------------------------------------------------
>Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:32 Message: Logged In: YES user_id=6656
Martin, I may not have been 100% clear in my last note, but please run "cvs up Objects/structseq.c". structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with fewer fields to one with more? What value does the not-prepared-for field have? Hmm, the point about nt.stat_result is a good one. Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...).
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:15 Message: Logged In: YES user_id=21627
To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns an 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 15:50 Message: Logged In: YES user_id=6656
I'm not worried about cross-version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 15:22 Message: Logged In: YES user_id=21627
Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656
I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment...
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066
This has been checked in, edited, and checked in again.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499
Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380
Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380
I'm looking at this now.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380
Patience, please. I'm behind reviewing this, probably won't have time today either.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656
If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...).
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499
The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one?
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499
I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380
Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499
I think this might be the one... or at least, the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380
Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380
One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on.
It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:

>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$

---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle.
If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on.
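For reference, the tuple-with-fields hybrid under discussion can be sketched in pure Python (class name and abbreviated field list are hypothetical; the actual patch implements this in C, and a real stat result has 10+ fields):

```python
class StatResult(tuple):
    """A tuple that also exposes its items as named read-only attributes."""
    _fields = ("st_mode", "st_ino", "st_dev")  # abbreviated for illustration

    def __new__(cls, seq):
        # Validating the length here avoids the core dump Guido
        # demonstrated with os.stat_result.__new__(os.stat_result, ()).
        seq = tuple(seq)
        if len(seq) != len(cls._fields):
            raise TypeError("expected %d items, got %d"
                            % (len(cls._fields), len(seq)))
        return tuple.__new__(cls, seq)

# Generate one read-only property per field, mapping name -> tuple index.
for _i, _name in enumerate(StatResult._fields):
    setattr(StatResult, _name, property(lambda self, i=_i: self[i]))

st = StatResult((0o100644, 12345, 2049))
assert st[0] == st.st_mode == 0o100644  # tuple and attribute access agree
```

The `__new__` check addresses the constructor problem discussed above: malformed input fails with TypeError at construction time instead of crashing on attribute access.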
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I take issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules?
Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is

mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)

To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile.
The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time when struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 16:46:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 08:46:33 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!)
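The dual tuple/attribute protocol described in the initial comment still holds in modern Python, where os.stat() returns a structseq object:

```python
import os

st = os.stat(".")
# The old tuple protocol still works...
mode = st[0]
# ...and the attribute added by this patch agrees with it.
assert st.st_mode == mode
# Structseq types subclass tuple, so old unpacking code keeps working.
assert isinstance(st, tuple)
```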
Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 17:46 Message: Logged In: YES user_id=21627 I'd not put the copyreg support into copy_reg, but into os.py. Pickling would save a reference to os._load_stat_result (or some such). When pickle tries to restore the value, it would first restore os._load_stat_result. For that, it would import os, which would register the copy_reg support. As for constructing structseq objects from dictionaries: it would be a ValueError if fields within [:n_sequence_fields] are not filled out; leaving out other fields is fine. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 17:32 Message: Logged In: YES user_id=6656 Martin, I may not have been 100% clear in my last note, but please run cvs up Objects/structseq.c structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with fewer fields to one with more? What value does the not-prepared-for field have? Hmm, the point about nt.stat_result is a good one. Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-05 17:15 Message: Logged In: YES user_id=21627 To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns a 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:50 Message: Logged In: YES user_id=6656 I'm not worried about cross version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
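The copy_reg-style approach Löwis describes, pickling through a module-level reconstructor so the pickle does not depend on the platform's constructor signature, can be sketched for a generic tuple-like type (hypothetical names; not the code that was committed):

```python
import copyreg  # named copy_reg in Python 2
import pickle

class StatResult(tuple):
    """Stand-in for a tuple-like stat result (hypothetical)."""

def _load_stat_result(values):
    # Reconstructor: rebuilds the object from a plain tuple, so the
    # pickle never encodes a platform-specific argument count.
    return StatResult(values)

def _pickle_stat_result(sr):
    # Reduction function registered with copyreg: returns the
    # reconstructor and its arguments.
    return _load_stat_result, (tuple(sr),)

copyreg.pickle(StatResult, _pickle_stat_result)

sr = StatResult((1, 2, 3))
sr2 = pickle.loads(pickle.dumps(sr))
assert sr2 == sr and type(sr2) is StatResult
```

This mirrors the design point in the thread: the receiving side only needs the reconstructor importable, not a constructor that takes exactly 10 or 13 arguments.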
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 16:49:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 08:49:15 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 13:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way.
u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 16:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 10:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 10:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on September 10. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 03:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g.
PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the coming weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 17:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2001-07-13 11:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possiblities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? 
> > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoder where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We coould have a error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". 
> Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(unicode[pos]), pos+1) > > import codecs > > codecs.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 11:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...]
> > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possiblities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended the use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding.
> > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler () > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only lookup the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoder where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done, when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? 
> > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codecs codecs.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't.
In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 12:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. 
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possiblities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended the use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e.
in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... > () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.s. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? 
> I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might returns a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks to? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allow the callback to apply additional tricks. The object should be documented to be modifyable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. 
> > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 20:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 17:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?
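The registry debated throughout this thread is, in hindsight, what shipped as PEP 293: handlers are registered by name via codecs.register_error() and receive the exception object instead of separate (encoding, unicode, pos) arguments. A minimal sketch of Walter's xmlreplace in that final form (modern Python, for comparison only):

```python
import codecs

def xmlreplace(exc):
    # Replace each unencodable character with an XML character reference.
    if isinstance(exc, UnicodeEncodeError):
        s = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
        return (s, exc.end)  # (replacement, position to resume encoding at)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

The "xmlcharrefreplace" handler discussed above also landed as a built-in, so `"äöü".encode("ascii", "xmlcharrefreplace")` gives the same result without registering anything.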
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 13:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == "\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string.
If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want the insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g.
restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.s. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might returns a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. 
Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 08:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object.
When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ?
If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value.
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > an HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 19:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than an HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 18:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data.
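The two-entry stack encoding loop described in the comments above can be sketched in pure Python. This is a hypothetical re-creation for illustration, not the actual C implementation in the patch; `encode_char` and `callback` are stand-in names:

```python
def encode_with_callback(text, encode_char, callback, encoding="ascii"):
    """Encode text character by character; on error, ask the callback
    for a replacement string and encode that before resuming.

    encode_char(ch) returns the encoded bytes or raises UnicodeError;
    callback(encoding, text, pos) returns a replacement string.
    The stack holds at most two entries: the original string at the
    bottom and, while one is being processed, the replacement on top.
    """
    out = []
    stack = [[text, 0]]              # bottom entry: the original string
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):            # finished current string: pop it
            stack.pop()              # (replacement done, or all done)
            continue
        try:
            out.append(encode_char(s[pos]))
            top[1] += 1
        except UnicodeError:
            if len(stack) == 2:      # error inside the replacement
                raise                # string itself: normal exception
            repl = callback(encoding, s, pos)
            top[1] += 1              # skip the offending character
            stack.append([repl, 0])  # encode the replacement next
    return b"".join(out)

print(encode_with_callback("Hellö",
                           lambda c: c.encode("ascii"),
                           lambda enc, s, pos: "?"))  # -> b'Hell?'
```

Note how the `len(stack) == 2` test reproduces the behaviour described above: an unencodable character in the replacement string raises a normal exception instead of recursing into the callback again.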
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? encode one-to-one, it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h, of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 14:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks.
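On the assignable .errors attribute debated throughout this thread: the codecs API that Python eventually shipped supports exactly the XML scenario described above. A sketch using today's stdlib spelling (the handler names differ from the codecs.xmlreplace_encode_errors proposal in the thread):

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf, errors="strict")

# Text node: unencodable characters become character references.
writer.errors = "xmlcharrefreplace"
writer.write("caf\u00e9")

# Inside a comment, charrefs are illegal, so switch back to strict;
# the write now fails instead of emitting an invalid reference.
writer.errors = "strict"
try:
    writer.write("<!-- caf\u00e9 -->")
except UnicodeEncodeError:
    pass  # "you can't have unencodable characters inside a comment"

print(buf.getvalue())  # -> b'caf&#233;'
```

The key point from the discussion survives: errors is read on each write() call, so one StreamWriter can change its error handling mid-stream.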
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Tue Mar 5 17:49:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 09:49:47 -0800 Subject: [Patches] [ python-Patches-415226 ] new base class for binary packaging Message-ID: Patches item #415226, was opened at 2001-04-10 19:51 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=415226&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) >Assigned to: M.-A. Lemburg (lemburg) Summary: new base class for binary packaging Initial Comment: bdist_packager.py provides an abstract base class for bdist commands. It provides easy access to all the PEP 241 metadata fields, plus "revision" for the package revision and installation scripts for preinstall, postinstall, preremove, and postremove. That covers the base characteristics of all the package managers that I'm familiar with. If anyone can think of any others, let me know, otherwise additional extensions would be implemented in the specific packager's commands. I would, however, discourage _requiring_ any additional fields. It would be nice if by simply supplying the PEP241 metadata under the [bdist_packager] section all subclassed packagers worked with no further effort. It also has rudimentary relocation support by including a --no-autorelocate option. The bdist_packager is also where I see creating separate binary packages for sub-packages supported. My need for that is much less than my desire for it right now, so I didn't give it much thought as I wrote it. I'd be delighted to hear any comments and suggestions on how to approach sub-packaging, though.
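The kind of setup.cfg section the comment envisions might look like the following. This is a hypothetical fragment: the section name comes from the patch, the metadata field names from PEP 241, and the script paths are invented for illustration:

```ini
[bdist_packager]
; PEP 241 metadata fields
name = example-pkg
version = 1.0
author = Jane Doe
; extensions proposed by bdist_packager
revision = 2
preinstall = scripts/preinstall.sh
postinstall = scripts/postinstall.sh
preremove = scripts/preremove.sh
postremove = scripts/postremove.sh
```

The idea in the comment is that any bdist subclass (RPM, pkgtool, sdux, ...) would read these same keys, so one configuration could drive every packager.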
---------------------------------------------------------------------- Comment By: Mark Alexander (mwa) Date: 2001-10-02 21:10 Message: Logged In: YES user_id=12810 Regarding script code: The preinstall, postinstall, etc. scripts are hooked into the package manager specific subclasses. It's the responsibility of the specific class to "do the right thing". For *NIX package managers, this is usually script code, although changing the help text to be more informative isn't a problem. More specifically, using python scripts under pkgtool and sdux would fail. Install scripts are not executed, they're sourced (in some weird fashion I've yet to identify). Theoretically, using a shell script to find the python interpreter by querying the package manager and calling it with either -i or a runtime created script should work fine. This is intended as a class for instantiating new bdist commands with full support for pep 241. Current bdist commands do their own thing, and they do it very differently. I'd rather see this put in as a migration path than shut down bdist commands that function just fine on their own. Eventual adoption of a standard abstract base would mean that module authors could provide all metadata in a standard format, and distutils would be able to create binary packages for systems the author doesn't have access to. This works for Solaris pkgtool and HP-UX SDUX. All three patches can be included with ZERO side effects on any other aspect of Distutils. I'm really kind of curious why they're not integrated yet so others can try them out. ---------------------------------------------------------------------- Comment By: david arnold (dja) Date: 2001-09-20 09:08 Message: Logged In: YES user_id=78574 I recently struck a case where I wanted the ability to run a post-install script on Windows (from a bdist_wininst-produced package).
While I agree with what seems to be the basic intention of this patch, wouldn't it be more useful to have the various scripts run by the Python interpreter, rather than Bourne shell (which is extremely seldom available on Windows, MacOS, etc) ? I went looking for the source of the .exe file embedded in the wininst command, but couldn't find it. Does anyone know where it lives? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-06-07 05:33 Message: Logged In: YES user_id=21627 Shouldn't the patch also modify the existing bdist commands to use this as a base class? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=415226&group_id=5470 From noreply@sourceforge.net Tue Mar 5 19:00:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 11:00:47 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over?
I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:04:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:04:19 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 20:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again.
(This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:13:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:13:35 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:17:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:17:55 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF.
Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 12:16:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 04:16:33 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 20:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Michael Hudson (mwh) >Assigned to: Michael Hudson (mwh) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over?
I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 13:16 Message: Logged In: YES user_id=21627 The patch looks ok to me. Renaming the type to os.stat_result is one option; the other option is to add a function os._make_stat_result, and have __reduce__ return this (much like object.__reduce__ returns copy_reg._reduce). Choose whichever you like more. [the missing-upload text is a canned response; I didn't actually type all that :-] ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 12:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF. Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 12:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:03:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:03:13 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 10:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) >Assigned to: M.-A. Lemburg (lemburg) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 10:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:14:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:14:11 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Michael Hudson (mwh) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=6656 Checked in this patch as Objects/structseq.c revision 1.5. Custom pickle method for stat_results (and statvfs_results) in Lib/os.py revision 1.52 (used an approach roughly like your suggestion -- which isn't what object.__reduce__ does, I think).
Tests in Lib/test/pickletester.py revision 1.14 I know about the canned response; I've been caught out by it on occasion but this time it was just me being dense. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:16 Message: Logged In: YES user_id=21627 The patch looks ok to me. Renaming the type to os.stat_result is one option; the other option is to add a function os._make_stat_result, and have __reduce__ return this (much like object.__reduce__ returns copy_reg._reduce). Choose whichever you like more. [the missing-upload text is a canned response; I didn't actually type all that :-] ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF. Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:14:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:14:22 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 17:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) >Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 17:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
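The os._make_stat_result approach Martin suggested for the stat_result pickling item above is the standard reduce-to-a-factory pattern; a minimal sketch (the class and function names here are illustrative stand-ins, not the actual Lib/os.py code):

```python
import pickle

class StatLikeResult:
    """Toy stand-in for a structseq such as os.stat_result."""
    def __init__(self, sequence, extra=None):
        self.sequence = tuple(sequence)
        # "Invisible" fields travel in a dict, as the patch describes.
        self.extra = dict(extra or {})

    def __reduce__(self):
        # Pickle a module-level factory plus its arguments instead of
        # the type itself, so unpickling never has to locate the type
        # under its C-module name (the posix vs. os naming problem).
        return (_make_stat_like_result, (self.sequence, self.extra))

def _make_stat_like_result(sequence, extra):
    return StatLikeResult(sequence, extra)

r = StatLikeResult((33188, 42), {"st_atime": 0.0})
r2 = pickle.loads(pickle.dumps(r))
```

The factory is what gets pickled by name, which is why it must live at module level in a stable, importable location.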
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Thu Mar 7 01:29:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 17:29:47 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler).
For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.
There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places.
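The lambda-based decode calls above map directly onto the registry API that eventually shipped in Python as codecs.register_error, where the handler receives one exception object instead of separate arguments (the handler name "skipbad" below is mine):

```python
import codecs

def skip_undecodable(exc):
    # Drop the undecodable bytes and resume after them -- the modern
    # spelling of the (u"", pos+1) lambda shown above.
    if isinstance(exc, UnicodeDecodeError):
        return ("", exc.end)
    raise exc

codecs.register_error("skipbad", skip_undecodable)
print(b"a\xffb\xffc".decode("ascii", "skipbad"))  # abc
```

Note that the shipped API resolved the tuple-interface question discussed in this thread exactly as proposed: the handler returns (replacement, absolute resume position).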
There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide.
OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK!
I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5.
Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.
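For comparison, the registry design sketched in this exchange is essentially what later landed in Python as codecs.register_error; the same xmlreplace handler looks like this in the final API, where the handler receives one exception argument and returns a replacement plus an absolute resume position:

```python
import codecs

def xmlreplace(exc):
    # Emit an XML character reference for each unencodable character,
    # then resume encoding after the offending run.
    if isinstance(exc, UnicodeEncodeError):
        refs = "".join("&#%d;" % ord(ch) for ch in exc.object[exc.start:exc.end])
        return (refs, exc.end)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

As discussed above, the builtin names ("strict", "replace", "ignore", and later "xmlcharrefreplace") are handled inside the encoders for speed, and the registry is only consulted for unknown names.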
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python.
For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy.
(This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here's the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html) We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)
Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(uni[pos]), pos+1) import codec codec.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g.
> def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks?
Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? 
> > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ?
> > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error.
But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == u"\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule.
OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do?
I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder whether the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution !
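The two-entry stack described above can be sketched in a few lines of Python (a hypothetical illustration, not the actual C implementation): the top of the stack is the string currently being encoded; an error in the original pushes the callback's replacement, while an error inside the replacement itself raises.

```python
def encode_with_callback(text, encodable, callback):
    """Sketch of the two-string stack loop: `encodable(ch)` says whether
    a character can be encoded, `callback(ch)` supplies a replacement
    string for an unencodable character of the original input."""
    out = []
    stack = [[text, 0]]  # at most two entries: [string, position]
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):
            stack.pop()  # finished the replacement (or the original)
        elif encodable(s[pos]):
            out.append(s[pos])
            top[1] += 1
        elif len(stack) == 1:
            top[1] += 1  # skip the offending character in the original...
            stack.append([callback(s[pos]), 0])  # ...and encode its replacement
        else:
            # the replacement itself is unencodable: a normal exception
            raise UnicodeError("unencodable replacement: %r" % s[pos])
    return u"".join(out)
```

For example, with an ASCII-only `encodable` test, a callback returning u"?" reproduces "replace" behaviour, and a callback returning a charref string reproduces the XML-style replacement discussed in this patch.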
> (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. 
> > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why would > > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea!
;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object.
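As a modern aside (not part of this patch): the interface this discussion converged on eventually shipped as PEP 293, where a registered handler receives the UnicodeEncodeError and returns a (replacement, resume position) pair. A sketch of registering a custom handler with today's codecs module; the handler name "hexreplace" is invented for illustration:

```python
import codecs

def hexreplace(exc):
    # The handler receives the exception and returns (replacement, resume
    # position); exc.object[exc.start:exc.end] is the unencodable run.
    if isinstance(exc, UnicodeEncodeError):
        run = exc.object[exc.start:exc.end]
        return (u"".join(u"\\x%02x" % ord(ch) for ch in run), exc.end)
    raise exc

codecs.register_error("hexreplace", hexreplace)
```

With this registered, "Grüße".encode("ascii", "hexreplace") yields b"Gr\\xfc\\xdfe"; the builtin handlers "strict", "ignore", "replace" and "xmlcharrefreplace" correspond to the callbacks being designed in this patch.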
When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one: it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
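To illustrate the "one-to-one" naming mentioned above (a hypothetical Python sketch, not the C code in the patch): ASCII and Latin-1 can share one encoder implementation that differs only in the upper limit for directly mappable code points.

```python
def encode_one_to_one(text, limit):
    """Hypothetical sketch of the '121' (one-to-one) encoder: code
    points below `limit` map directly to bytes -- limit=128 gives
    ASCII, limit=256 gives Latin-1 (i.e. UCS1)."""
    out = bytearray()
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if cp >= limit:
            # This is where the error callback would be consulted.
            raise UnicodeError(
                "character %r at position %d not in range(%d)" % (ch, pos, limit))
        out.append(cp)
    return bytes(out)
```

The shared loop explains why a single error-callback mechanism covers both codecs.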
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject *, made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive ! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Thu Mar 7 09:11:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 01:11:45 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 11:06:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 03:06:43 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 08:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None >Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 11:06 Message: Logged In: YES user_id=38388 Thank you !
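For context, the declaration PEP 263 defines is the coding comment in one of the first two source lines, recognized by a regular expression the PEP specifies. A simplified detector sketch (the real tokenizer additionally handles BOMs and only inspects comment text):

```python
import re

# Recognition pattern from PEP 263: the first match in the first two
# lines declares the source encoding.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def source_encoding(lines):
    """Return the declared encoding, or None if the first two lines
    carry no declaration (simplified sketch)."""
    for line in lines[:2]:
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None
```

Because the pattern only requires "coding" followed by ':' or '=', both the Emacs form "# -*- coding: utf-8 -*-" and the vim form "# vim: set fileencoding=latin-1 :" are recognized.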
I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 09:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 14:06:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 06:06:10 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 03:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler >Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii".
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 09:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 06:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 04:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 16:45:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 08:45:13 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:01:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:01:05 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 08:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 18:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, i.e. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...)
This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 14:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 11:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 09:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value.
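For readers unfamiliar with PEP 263: the coding declaration is a magic comment in the first two lines of a source file that tells the parser how to decode the bytes. In today's Python, where the patch discussed here has long since landed, the effect can be demonstrated with compile(), which honors the cookie when given a bytes source; the file name and variable below are purely illustrative:

```python
# PEP 263 in action: a coding cookie in the first two lines of a bytes
# source controls how string literals are decoded.
src = (
    b"# -*- coding: iso-8859-1 -*-\n"
    b"s = '\xe4'\n"  # the byte 0xE4 is a-umlaut in Latin-1
)
ns = {}
exec(compile(src, "<pep263-demo>", "exec"), ns)
print(ns["s"])  # the single character U+00E4
```

With a different (or missing) cookie the same 0xE4 byte would decode differently or be rejected, which is exactly the behavior the patch wires into the tokenizer.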
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:24:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:24:43 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:24 Message: Logged In: YES user_id=21627 Changing the decoding functions will not result in one additional function, but in two of them: you'll also get PyUnicode_DecodeRawUnicodeEscapeFromUnicode. That seems quite unmaintainable to me: any change now needs to propagate into four functions. OTOH, I don't think that the code that allows parsing variable-sized strings is overly complicated. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2002-03-07 19:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, ie. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...) This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 15:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 12:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree.
As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:41:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:41:21 -0800 Subject: [Patches] [ python-Patches-401022 ] Removal of SET_LINENO (experimental) Message-ID: Patches item #401022, was opened at 2000-07-30 23:08 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=401022&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Out of Date Priority: 5 Submitted By: Vladimir Marangozov (marangoz) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of SET_LINENO (experimental) Initial Comment: ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-07 18:41 Message: Logged In: YES user_id=35752 I worked a bit on porting this patch to 2.2+ CVS. I ran into a snag with generators. Generators save the instruction pointer (i.e. the bytecode offset) on yield. That makes the on-the-fly bytecode translation approach more complicated. Since Guido is going to redesign the whole VM it's probably not worth spending any more effort on this. :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-27 21:54 Message: Logged In: YES user_id=31435 Unassigned again -- I'm not gonna get to this in this lifetime. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-10 18:51 Message: Logged In: YES user_id=6380 Tim wants to revisit this. It could be the quickest way to a 7% speedup in pystone that we can think of...
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-11-13 19:42 Message: Rejected. It's in the archives for reference, but for now, I don't think it's worth spending cycles worrying about this kind of stuff. I'll eventually redesign the entire VM. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-10-27 11:08 Message: Oops, the last patch update does not contain the f.f_lineno computation in frame_getattr. This is necessary, cf. the following messages: http://www.python.org/pipermail/python-dev/2000-July/014395.html http://www.python.org/pipermail/python-dev/2000-July/014401.html Patch assigned to Guido, for review or further assignment. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-10-26 00:42 Message: noreply@sourceforge.net wrote: > > Date: 2000-Oct-25 13:56 > By: gvanrossum > > Comment: > Vladimir, are you there? So-so :) I'm a moving target, checking my mail occasionally these days. Luckily, today is one of these days. > > The patch doesn't apply cleanly to the current CVS tree any more... Ah, this one's easy. Here's an update relative to 2.0 final, not CVS. I got some r/w access error trying to update my CVS copy from SF that I have no time to investigate right now... The Web interface still works though :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-10-25 20:56 Message: Vladimir, are you there? The patch doesn't apply cleanly to the current CVS tree any more... ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-08-03 19:22 Message: Fix missing DECREF on error condition in start_tracing() + some renaming. 
---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 17:50 Message: A last tiny fix of the SET_LINENO opcode for better b/w compatibility. Stopping here and entering standby mode for reactions & feedback. PS: the last idea about not duplicating co_code and tweaking the original with CALL_TRACE is a bad one. I remember Guido being against it because co_code could be used elsewhere (copied, written to disk, whatever) and he's right! Better operate on an internal copy created in ceval. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 14:57 Message: Another rewrite, making this whole strategy b/w compatible according to the 1st incompatibility point a) described in: http://www.python.org/pipermail/python-dev/2000-July/014364.html Changes: 1. f.f_lineno is computed and updated on f_lineno attribute requests for f, given f.f_lasti. Correctness is ensured because f.f_lasti is updated on *all* attribute accesses (in LOAD_ATTR in the main loop). 2. The standard setup does not generate SET_LINENO, but uses co_lnotab for computing the source line number (e.g. tracebacks). This is equivalent to the current "python -O". 3. With "python -d", we fall back to the current version of the interpreter (with SET_LINENO) thus making it easy to test whether this patch fully replaces SET_LINENO's behavior. (modulo f->f_lineno accesses from legacy C code, but this is insane). IMO, this version is already worth the effort of being truly tested and improved. One improvement is to define a nicer public C API for breakpoints: - PyCode_SetBreakPointAtLine(line) - PyCode_SetBreakPointAtAddr(addr) or similar, which would install a CALL_TRACE opcode in the appropriate location of the copy of co_code. Another idea is to avoid duplicating the entire co_code just for storing the CALL_TRACE opcodes.
We can store them in the original and keep a table of breakpoints. Setting the breakpoints would occur whenever the sys.settrace hook is set. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-07-31 13:40 Message: Status set to postponed to indicate that this is still experimental. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 01:16 Message: A nit: inline the argfetch in CALL_TRACE and goto the switch, instead of jumping to get_oparg which splits the sequence [fetch opcode, fetch oparg] -- this can slow things down. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-30 23:12 Message: For testing, as discussed on python-dev. For a gentle summary, see: http://www.python.org/pipermail/python-dev/2000-July/014364.html ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=401022&group_id=5470 From noreply@sourceforge.net Thu Mar 7 21:41:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 13:41:08 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all
available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 21:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py ! ( Linux does not support an event named ) Running idle on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ???
Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Thu Mar 7 22:28:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 14:28:08 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hello! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed an SSL-capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straightforward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Tino Lange (tinolange) Date: 2002-03-07 23:28 Message: Logged In: YES user_id=212920 Hi Piers! Here we are ... diffs attached.
Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 23:55 Message: Logged In: YES user_id=196212 Ok, (the boring bit :-) please provide a matching patch for the documentation (in dist/src/Doc/lib/libimaplib.tex), and I'll install both patches. Thanks Tino! ---------------------------------------------------------------------- Comment By: Tino Lange (tinolange) Date: 2002-03-04 11:55 Message: Logged In: YES user_id=212920 Hello! socket.ssl() objects only have _two_ methods: read() and write(). I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL-enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 06:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method.
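Tino's readline-by-hand loop over an object that offers only read() can be sketched as below. This is a modern Python 3 rendering with bytes, and the class names (SSLFakeFile, FakeSSL) are invented for illustration rather than taken from the patch:

```python
import io

class SSLFakeFile:
    """Sketch of the readline-by-hand approach: wrap an object that
    only offers read(n) and rebuild readline() on top of it."""
    def __init__(self, sslobj):
        self.sslobj = sslobj

    def readline(self):
        chunks = []
        while True:
            c = self.sslobj.read(1)   # one byte at a time; no readline() available
            chunks.append(c)
            if not c or c == b"\n":   # stop at EOF or end of line
                break
        return b"".join(chunks)

class FakeSSL:
    """Stand-in for an SSL connection, so the sketch is self-contained."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
    def read(self, n=1):
        return self._buf.read(n)

f = SSLFakeFile(FakeSSL(b"* OK IMAP4rev1 ready\r\nA001 LOGIN ...\r\n"))
print(f.readline())  # first protocol line, CRLF terminator included
```

Reading one byte per call is slow but correct; a buffered variant would read larger chunks and split on b"\n" itself.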
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Thu Mar 7 23:09:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 15:09:58 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: the '1' in position 2 is not the problem, but the complete truncated sequence '\u1'.
For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would must pass buffer object around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old function to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots) This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. 
I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ?
If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API.
("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK.
I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > with a \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so,
> > > I'd rather leave the special encoder in place,
> > > since it is being used a lot in Python and
> > > probably some applications too.
> >
> > It would be a slowdown. But callbacks open many
> > possibilities.
>
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).

OK, done; now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff, with a wide build of Python).

> > For example:
> >
> > Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked
> > questions in comp.lang.python. For printing
> > Unicode stuff, print could be extended to use an
> > error handling callback for Unicode strings (or
> > objects where __str__ or tp_str returns a Unicode
> > object) instead of using str(), which always
> > returns an 8bit string and uses strict encoding.
> > There might even be a
> > sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the
> name of the hook though), so this should be possible by
> providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new
> > TranslateCharmapEx and inherit the
> > "multicharacter replacement" feature,
> > or should I leave it as it is?
>
> If possible, please also add the multichar replacement
> to the old API.
I think it is very useful, and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next few days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>   callback(encoding, input_data, input_position, state) ->
>       (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the
> position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same
> interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So memory-overflow tests are only done when an unencodable character occurs, and the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors
> > > > is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an
> > assignable errors attribute must be supported
> > as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation
> > to use these base classes, only the interfaces must
> > match; this allows writing Codecs as extension types.
> > """
>
> Good point.
> I'll add that to the PEP 100.

OK. Here is the current todo list:
1. Implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

    u"...".encode("...", "strict")

instead of

    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)

Python code would look like this:
---
def xmlreplace(encoding, unicode, pos, state):
    return (u"&#%d;" % ord(unicode[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---
and then the following call can be made:
u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *errors" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave errors as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------

Comment By: M.-A.
Lemburg (lemburg)
Date: 2001-07-10 14:29
Message:
Logged In: YES
user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement
> > > string will be replaced with a character that the encoder
> > > chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat
> less clean than it could be. It was only done to be 100%
> backwards compatible. With the original "replace" error
> handling the codec chose the replacement character. But as
> far as I can tell none of the codecs uses anything other
> than '?',

True.

> so I guess we could change the replace handler
> to always return u'?'. This would make the implementation a
> little bit simpler, but the explanation of the callback
> feature *a lot* simpler.

Go for it.

> And if you still want to handle
> an unencodable U+FFFD, you can write a special callback for
> that, e.g.
>
> def FFFDreplace(enc, uni, pos):
>     if uni[pos] == u"\ufffd":
>         return u"?"
>     else:
>         raise UnicodeError(...)
>
> > > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I
> > will eventually take that file and turn it into a PEP which
> > will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding
> callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > > replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> > leave the special encoder in place, since it is being used a
> > lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many
> possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape".
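The FFFDreplace sketch above translates directly to the error-handler API that eventually shipped (codecs.register_error), where the handler receives the exception object; a minimal sketch, with the registered name chosen here only for illustration:

```python
import codecs

def fffdreplace(exc):
    # Replace only U+FFFD with '?'; any other unencodable
    # character still raises the original exception.
    if isinstance(exc, UnicodeEncodeError) and exc.object[exc.start] == u"\ufffd":
        return (u"?", exc.start + 1)
    raise exc

codecs.register_error("fffdreplace", fffdreplace)

u"a\ufffdb".encode("ascii", "fffdreplace")  # b'a?b'
```

Any character other than U+FFFD, e.g. u"ä", still raises UnicodeEncodeError under this handler, which is precisely the point of the special-case.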
Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
> Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in
> comp.lang.python. For printing Unicode stuff, print could be
> extended to use an error handling callback for Unicode
> strings (or objects where __str__ or tp_str returns a
> Unicode object) instead of using str(), which always returns
> an 8bit string and uses strict encoding. There might even
> be a
> sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet;
> > > should this function also support error callbacks? Why
> > > would one want to insert None into the mapping to call
> > > the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain
> > character ranges. In this case the codec would be used to
> > verify the input and an exception would indeed be useful
> > (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like
> encoding, i.e. in case of an error should the returned
> replacement string again be mapped through the translation
> mapping or should it be copied to the output directly? The
> former would be more in line with encoding, but IMHO the
> latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100
> ("Multicharacter replacements in
> PyUnicode_TranslateCharmap") along the way.
I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new
> TranslateCharmapEx and inherit the "multicharacter
> replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API. I think it is very useful, and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error
> > > callbacks. In Python 2.1 encoding and decoding errors are
> > > handled in the same way with a string value. But with
> > > callbacks it doesn't make sense to use the same callback
> > > for encoding and decoding (like codecs.StreamReaderWriter
> > > and codecs.StreamRecoder do). Decoding callbacks have a
> > > different API. Which arguments should be passed to the
> > > decoding callback, and what is the decoding callback
> > > supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...()
> > APIs for this. We'd then have to augment the base classes of
> > the StreamCodecs to provide two attributes for .errors with
> > a fallback solution for the string case (i.e. "strict" can
> > still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to
> do? I guess it will be called in the same way as the encoding
> callback, i.e. with encoding name, original string and
> position of the error. It might return a Unicode string
> (i.e. an object of the decoding target type) that will be
> emitted from the codec instead of the one offending byte. Or
> it might return a tuple with a replacement Unicode object and
> a resynchronisation offset, i.e. returning (u"?", 1) means
> emit a '?' and skip the offending character.
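The decode-side interface being weighed here — a replacement string plus a resynchronisation position — is the shape that eventually shipped, except that the handler receives a UnicodeDecodeError and returns an absolute resume position rather than an offset. A small sketch with codecs.register_error (the handler name "qmark" is made up for illustration):

```python
import codecs

def qmark_decode(exc):
    # Emit '?' for the offending byte and resume right after it,
    # using an absolute position as argued for in this thread.
    if isinstance(exc, UnicodeDecodeError):
        return (u"?", exc.start + 1)
    raise exc

codecs.register_error("qmark", qmark_decode)

b"ab\xffcd".decode("ascii", "qmark")  # 'ab?cd'
```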
> But to make the offset really useful the callback has to
> know something about the encoding; perhaps the codec should
> be allowed to pass an additional state object to the
> callback?
>
> Maybe the same should be added to the encoding callbacks
> too? Maybe the encoding callback should be able to tell the
> encoder whether the replacement returned should be reencoded
> (in which case it's a Unicode object) or directly emitted
> (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

  callback(encoding, input_data, input_position, state) ->
      (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an
> > > assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable
> errors attribute must be supported as part of the official
> codec API.
>
> Misc/unicode.txt is not clear on that:
> """
> It is not required by the Unicode implementation to use
> these base classes, only the interfaces must match; this
> allows writing Codecs as extension types.
> """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51
Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A.
Lemburg (lemburg)
Date: 2001-06-13 19:00
Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49
Message:
Logged In: YES
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other
> encodings than Unicode. But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I
> noticed that all the non-Unicode codecs ignore the
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57
Message:
Logged In: YES
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character.
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

    def FFFDreplace(enc, uni, pos):
        if uni[pos] == u"\ufffd":
            return u"?"
        else:
            raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and
> > implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.
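The two-string stack strategy quoted above can be sketched in pure Python (the charmap and callback here are made up for illustration; the real implementation is C code in unicodeobject.c):

```python
def encode_with_callback(text, charmap, callback):
    # The stack holds at most two (string, position) pairs: the
    # original string and, while one is being consumed, a
    # replacement string returned by the callback.
    out = []
    stack = [(text, 0)]
    while stack:
        s, pos = stack.pop()
        while pos < len(s):
            ch = s[pos]
            if ch in charmap:
                out.append(charmap[ch])
                pos += 1
            elif not stack:
                # Error in the original string: remember where to
                # resume, then encode the replacement string first.
                stack.append((s, pos + 1))
                s, pos = callback(ch), 0
            else:
                # Error while encoding the replacement itself.
                raise UnicodeError("unencodable character %r in replacement" % ch)
    return "".join(out)

ascii_map = {chr(i): chr(i) for i in range(128)}
encode_with_callback(u"a\u20acb", ascii_map, lambda ch: u"?")  # 'a?b'
```

The two branches mirror the description exactly: a one-entry stack means we are in the original string and may call the callback; a two-entry stack means the replacement itself failed, so a normal exception is raised.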
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities. For example:

    Why can't I print u"gürk"?

is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet;
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks. In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the encoder whether the replacement returned should be reencoded (in which case it's a Unicode object) or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.

Misc/unicode.txt is not clear on that:
"""
It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
"""

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05
Message:
Logged In: YES
user_id=38388

> How the callbacks work:
>
> A PyObject * named errors is passed in. This may be NULL,
> Py_None, 'strict', u'strict', 'ignore', u'ignore',
> 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to
> one of the three builtin error callbacks:
> PyCodec_RaiseEncodeErrors (raises an exception),
> PyCodec_IgnoreEncodeErrors (returns an empty replacement
> string, in effect ignoring the error), or
> PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
> replacement character, to signify to the encoder that it
> should choose a suitable replacement character) -- or it
> directly returns errors if it is a callable object. When an
> unencodable character is encountered, the error handling
> callback will be called with the encoding name, the original
> unicode object and the error position, and must return a
> unicode object that will be encoded instead of the offending
> character (or the callback may of course raise an
> exception). U+FFFD characters in the replacement string will
> be replaced with a character that the encoder chooses ('?'
> in all cases).

Nice.
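The three builtin behaviours described above correspond directly to the errors argument as it exists in released Python; a quick demonstration of the raise/ignore/replace trio:

```python
s = u"a\u00e4b"  # 'ä' is not encodable in ASCII

print(s.encode("ascii", "ignore"))   # b'ab'  -- error dropped
print(s.encode("ascii", "replace"))  # b'a?b' -- encoder picks '?'
try:
    s.encode("ascii", "strict")      # raises
except UnicodeEncodeError as exc:
    print(exc.encoding, exc.start)   # ascii 1
```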
> The implementation of the loop through the string is done in
> the following way. A stack with two strings is kept and the
> loop always encodes a character from the string at the
> stacktop. If an error is encountered and the stack has only
> one entry (during encoding of the original string) the
> callback is called and the unicode object returned is pushed
> on the stack, so the encoding continues with the replacement
> string. If the stack has two entries when an error is
> encountered, the replacement string itself has an
> unencodable character and a normal exception is raised. When
> the encoder has reached the end of its current string there
> are two possibilities: when the stack contains two entries,
> this was the replacement string, so the replacement string
> will be popped from the stack and encoding continues with
> the next character from the original string. If the stack
> had only one entry, encoding is finished.

Very elegant solution !

> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

> I have renamed the static ...121 function to all lowercase
> names.

Ok.

> BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
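A standard \uXXXX-style replacement handler of the kind discussed in this thread did eventually ship as the builtin 'backslashreplace' error handler, usable with any encoder rather than only unicode-escape:

```python
# 'backslashreplace' escapes unencodable characters as
# \xXX, \uXXXX or \UXXXXXXXX instead of raising.
print(u"g\u00fcrk".encode("ascii", "backslashreplace"))   # b'g\\xfcrk'
print(u"\u20ac".encode("latin-1", "backslashreplace"))    # b'\\u20ac'
```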
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors and
> PyCodec_ReplaceEncodeErrors are globally visible because
> they have to be available in _codecsmodule.c to wrap them as
> Python function objects, but they can't be implemented in
> _codecsmodule, because they need to be available to the
> encoders in unicodeobject.c (through
> PyCodec_EncodeHandlerForObject), but importing the codecs
> module might result in an endless recursion, because
> importing a module requires unpickling of the bytecode,
> which might require decoding utf8, which ... (but this will
> only happen if we implement the same mechanism for the
> decoding API)

I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more.

One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet;
> should this function also support error callbacks? Why would
> one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

> A remaining problem is how to implement decoding error
> callbacks. In Python 2.1 encoding and decoding errors are
> handled in the same way with a string value.
> But with callbacks it doesn't make sense to use the same
> callback for encoding and decoding (like
> codecs.StreamReaderWriter and codecs.StreamRecoder do).
> Decoding callbacks have a different API. Which arguments
> should be passed to the decoding callback, and what is the
> decoding callback supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

> One additional note: It is vital that errors is an
> assignable attribute of the StreamWriter.

It is already !

> Consider the XML example: For writing an XML DOM tree one
> StreamWriter object is used. When a text node is written,
> the error handling has to be set to
> codecs.xmlreplace_encode_errors, but inside a comment or
> processing instruction replacing unencodable characters with
> charrefs is not possible, so here codecs.raise_encode_errors
> should be used (or better a custom error handler that raises
> an error that says "sorry, you can't have unencodable
> characters inside a comment").

Sure.

> BTW, should we continue the discussion in the i18n SIG
> mailing list? An email program is much more comfortable than
> a HTML textarea! ;)

I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18
Message:
Logged In: YES
user_id=89016

One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
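The XML scenario quoted above — switching the handler per node type by assigning to the writer's errors attribute — works in released Python with the handler names that eventually shipped ('xmlcharrefreplace' for text nodes, 'strict' inside comments); a sketch:

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf, errors="xmlcharrefreplace")

writer.write(u"caf\u00e9")     # text node: charref replacement is fine
writer.errors = "strict"       # entering a comment: must raise instead
try:
    writer.write(u"caf\u00e9")
except UnicodeEncodeError:
    pass                       # nothing was appended

print(buf.getvalue())          # b'caf&#233;'
```

The StreamWriter consults its errors attribute on every write, which is exactly why the attribute being assignable matters for this use case.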
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59
Message:
Logged In: YES
user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks: PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), or PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character) -- or it directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position, and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).

The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string), the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.

(I hope that's enough explanation of the API and implementation.)

I have renamed the static ...121 function to all lowercase names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors and PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API)

I have not touched PyUnicode_TranslateCharmap yet; should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data.
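The decoding-callback question raised here was eventually answered along the same lines as encoding, via the error-handler registry that later shipped as codecs.register_error: the decoder calls the handler with a UnicodeDecodeError carrying the input bytes plus a start/end slice, and the handler returns a replacement string and the position at which to resume. A minimal sketch using today's API (the handler name "show-byte" is invented for illustration):

```python
import codecs

def show_byte(exc):
    # Only handle decode errors; re-raise anything else.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]          # the offending bytes
    repl = "".join("<%02x>" % b for b in bad)    # e.g. b"\xff" -> "<ff>"
    return repl, exc.end                         # resume after the bad run

codecs.register_error("show-byte", show_byte)
print(b"a\xffb".decode("ascii", "show-byte"))    # -> a<ff>b
```

Note the handler never sees raw Py_UNICODE*/int pairs, matching the PyObject*-only design discussed above.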
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h. Of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map the old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 02:37:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 18:37:51 -0800 Subject: [Patches] [ python-Patches-502080 ] BaseHTTPServer send_error bug fix Message-ID: Patches item #502080, was opened at 2002-01-10 16:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502080&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Jonathan Gardner (jgardn) Assigned to: Skip Montanaro (montanaro) Summary: BaseHTTPServer send_error bug fix Initial Comment: BaseHTTPServer's send_error function didn't send "Content-Type: text/html". While this was okay for Mozilla 0.9.7, Konqueror 2.2.2 rendered it as plain text. I added one line to send the Content-Type and everything works great. A BETTER solution would be to figure out what kind of document the error message is, but that is left as an exercise for a beefier HTTP server, which is not what BaseHTTPServer is intended to be. ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-07 20:37 Message: Logged In: YES user_id=44345 I could hardly let the experiment fail! Checked in as BaseHTTPServer.py v 1.18 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 19:37 Message: Logged In: YES user_id=6380 Looks good to me. As an experiment, assigning to Skip, who can check it in or pass it on. 
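The fix described in this item is a single header line. A hypothetical sketch (not the checked-in BaseHTTPServer.py code) of the kind of response send_error builds, with the added line marked:

```python
# Hypothetical minimal reconstruction of the send_error behavior described
# above; the real method lives in BaseHTTPServer.BaseHTTPRequestHandler
# (http.server in Python 3). Without the explicit Content-Type header,
# strict clients such as Konqueror rendered the HTML error body as text.
def send_error_response(code, message, explain):
    body = ("<head><title>Error response</title></head>"
            "<body><h1>Error response</h1>"
            "<p>Error code %d.<p>Message: %s."
            "<p>Explanation: %s.</body>" % (code, message, explain))
    headers = [
        "HTTP/1.0 %d %s" % (code, message),
        "Content-Type: text/html",   # the one-line fix
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(headers) + body

response = send_error_response(404, "Not Found", "Nothing matches the given URI")
```

A fuller server would pick the Content-Type from the actual error document, which is exactly the "beefier HTTP server" exercise the submitter leaves open.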
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502080&group_id=5470 From noreply@sourceforge.net Fri Mar 8 04:44:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 20:44:39 -0800 Subject: [Patches] [ python-Patches-512799 ] webchecker protocol bug Message-ID: Patches item #512799, was opened at 2002-02-04 10:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512799&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: seb bacon (sebbacon) >Assigned to: A.M. Kuchling (akuchling) Summary: webchecker protocol bug Initial Comment: Tools/webchecker.py checks protocol of URLs and ignores redundant ones like mailto. However, urllib.splittype returns a tuple where the code expects a string, so it doesn't work. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512799&group_id=5470 From noreply@sourceforge.net Fri Mar 8 10:22:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 02:22:06 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. 
- enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter.
So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 8 11:09:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 03:09:16 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library.
- enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used!
The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
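What eventually landed in CPython is the standard --enable-shared spelling that Matthias asks for. Whether a given interpreter was built that way can be inspected from Python itself; a sketch (the exact values are platform-dependent, and the variables may be absent on Windows):

```python
import sysconfig

# Py_ENABLE_SHARED is 1 when the interpreter links against libpython as a
# shared library, 0 for a static build; None if the build doesn't record it.
shared = sysconfig.get_config_var("Py_ENABLE_SHARED")

# LDLIBRARY names the library the build produced, e.g. a versioned
# "libpython3.x.so.1.0" for a shared build -- the kind of suffix the
# "why 0.0" question above is about.
ldlib = sysconfig.get_config_var("LDLIBRARY")
print(shared, ldlib)
```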
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:14:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:14:04 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py.
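The fixed behavior can be reproduced with any modern re module, which includes this mark-clearing fix: the inner group (\d) no longer reports a match when its enclosing optional group failed to participate. (The stray space before $ in the quoted pattern looks like an archive line-wrap artifact and is dropped here.)

```python
import re

# Group 1 is the optional "(\d):" prefix; group 2 is nested inside it.
# Since group 1 does not participate in the match, group 2 must be None too.
m = re.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
assert m.groups() == (None, None, '34', '123')  # matches the old pre result
```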
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:20:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:20:51 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Greg Chapman (glchapman) Date: 2002-03-08 04:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:29:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:29:11 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 08:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 08:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c.
But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 08:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 14:44:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 06:44:11 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 11:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 09:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 06:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 05:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 05:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool.
It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 13:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 12:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:15:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:15:56 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 13:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed >Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 15:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, ie. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature !
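The lambda-based API in the initial comment evolved into the error-handler registry that eventually shipped (codecs.register_error, plus a built-in "xmlcharrefreplace" handler that emits decimal charrefs). A sketch of the motivating example in those terms; the handler name "hex-charref" is invented here:

```python
import codecs

def hex_charref(exc):
    # Encode-only handler; the exception carries the original string and
    # the start/end slice of the unencodable run.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    repl = "".join("&#x%x;" % ord(ch) for ch in exc.object[exc.start:exc.end])
    return repl, exc.end

codecs.register_error("hex-charref", hex_charref)
print("a\xe4o\xf6u\xfc\xdf".encode("ascii", "hex-charref"))
# -> b'a&#xe4;o&#xf6;u&#xfc;&#xdf;'
```

Note the (replacement, resume_position) return value: that is the slice symmetry Lemburg asks for above, with the left-inclusive/right-exclusive convention of standard Python slices.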
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 23:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: it is not the '1' in position 2 that is the problem, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think?
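The start/end extension proposed here is visible in the API that later shipped: the handler receives one exception object whose start/end slice may span several adjacent unencodable characters, so a single escape-sequence pair can wrap the whole run. A sketch of the coloring idea in those terms (the handler name is invented, and whether a given codec actually coalesces adjacent errors into one call is an implementation detail the sketch does not rely on):

```python
import codecs

BOLD, RESET = "\033[1m", "\033[0m"

def color_run(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    # One escape pair around the whole reported run, rather than one
    # pair per character as in the per-position API quoted above.
    repl = BOLD + "".join("<%d>" % ord(ch) for ch in run) + RESET
    return repl, exc.end

codecs.register_error("color-run", color_run)
out = "a\xe4\xfc\xf6o".encode("ascii", "color-run")
```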
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 01:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 16:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 10:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 10:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09.
Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 03:55

Message:
Logged In: YES
user_id=89016

Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:

>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...     lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'
>>> "a\xffb\xffc".decode("ascii",
...     lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'

I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O!
with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway.

I changed all the old functions to call the new ones, so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString, because they are documented as deprecated anyway (although they are still called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx.

There are still a few spots that call the old API: e.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere, even if strict encoding/decoding is used?

The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 17:03

Message:
Logged In: YES
user_id=89016

New version of the patch with the error handling callback registry.

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
>
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate.

> > [...]
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
>
> Good idea !
>
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more in line with
> the Python coding style guide.

OK, but these functions are specific to unicode encoding, so now the functions are called:

codecs.register_unicodeencodeerrorhandler
codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in codecs.c/_PyCodecRegistry_Init, so using them is really simple:

u"gürk".encode("ascii", "xmlcharrefreplace")

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 11:26

Message:
Logged In: YES
user_id=38388

> > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > > with \uxxxx replacement callback.
> > > >
> > > > Hmm, wouldn't that result in a slowdown ? If so,
> > > > I'd rather leave the special encoder in place,
> > > > since it is being used a lot in Python and
> > > > probably some applications too.
> > >
> > > It would be a slowdown. But callbacks open many
> > > possibilities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
>
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !

> > [...]
> > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoder where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? 
No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here's the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
>
> We could have an error handler registry, which maps names
> to error handlers; then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because it's always more convenient to type
> u"...".encode("...", "strict")
> instead of
> import codecs
> u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>     return (u"&#%d;" % ord(uni[pos]), pos+1)
>
> import codecs
>
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
> u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed, common error
> handling names could even be implemented in the encoder
> itself.
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 11:03

Message:
Logged In: YES
user_id=89016

> > [...]
> > so I guess we could change the replace handler
> > to always return u'?'. This would make the
> > implementation a little bit simpler, but the
> > explanation of the callback feature *a lot*
> > simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt
> > > file ? I will eventually take that file and turn
> > > it into a PEP which will then serve as general
> > > documentation for these things.
> >
> > I could, but first we should work out how the
> > decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually
> go there.

OK. I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > with \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so,
> > > I'd rather leave the special encoder in place,
> > > since it is being used a lot in Python and
> > > probably some applications too.
> >
> > It would be a slowdown. But callbacks open many
> > possibilities.
>
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).
OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gьrk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended the use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler () > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. 
("almost" because, for the encoder, output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) Memory overflow tests are now only done when an unencodable character occurs, so the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors
> > > > is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an
> > assignable errors attribute must be supported
> > as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation
> > to use these base classes, only the interfaces must
> > match; this allows writing Codecs as extension types.
> > """
>
> Good point. I'll add that to the PEP 100.

OK.

Here's the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
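As a historical footnote, this registry idea is essentially what later shipped as `codecs.register_error()` / `codecs.lookup_error()`: names map to handlers and the `errors` argument stays a plain string. A hedged rewrite of the xmlreplace sketch against that later interface (a single exception object replaces the separate encoding/unicode/pos/state arguments):

```python
import codecs

def xmlreplace(exc):
    # Emit an XML character reference for every character in the
    # unencodable range, then resume encoding right after it.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    refs = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
    return (refs, exc.end)

codecs.register_error("xmlreplace", xmlreplace)

# The errors argument is still a plain name; the registry supplies the
# callable, so "const char *errors" style C APIs keep working unchanged.
print("\u00e4\u00f6\u00fc".encode("ascii", "xmlreplace"))
```

Built-in names like "strict", "replace" and "ignore" are still handled inside the encoders for speed, exactly as proposed above; the registry is only consulted for unknown names.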
Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

u"...".encode("...", "strict")

instead of

import codecs
u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this:

---
def xmlreplace(encoding, uni, pos, state):
    return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---

and then the following call can be made:

u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore"), or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before, and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 12:29

Message:
Logged In: YES
user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement
> > > string will be replaced with a character that the encoder
> > > chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat
> less clean than it could be. It was only done to be 100%
> backwards compatible. With the original "replace" error
> handling the codec chose the replacement character.
But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possiblities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gьrk"? > > is probably one of the most frequently asked questions in > comp.lang.python. 
For printing Unicode stuff, print could be > extended the use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. 
I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... > () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.s. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might returns a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks to? 
> Maybe the encoding callback should be able to tell the
> encoder if the replacement returned should be reencoded
> (in which case it's a Unicode object), or directly emitted
> (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (this simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

callback(encoding, input_data, input_position, state) ->
    (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an
> > > assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable
> errors attribute must be supported as part of the official
> codec API.
>
> Misc/unicode.txt is not clear on that:
> """
> It is not required by the Unicode implementation to use
> these base classes, only the interfaces must match; this
> allows writing Codecs as extension types.
> """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 20:51

Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 17:00

Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.
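The uniform tuple interface proposed in the comment above, callback -> (output_to_be_appended, new_input_position) with an absolute position, is also how the error-handler API that later shipped behaves for decoding: the handler returns a replacement plus the absolute position at which to resume, and the exception's `encoding`, `reason`, `start` and `end` attributes carry the context that the proposal threads through separate arguments. A small sketch (the handler name "skip-and-mark" is made up):

```python
import codecs

def skip_and_mark(exc):
    # Append one U+FFFD for the undecodable range and resume decoding
    # at the absolute position exc.end, per the tuple convention.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    return ("\ufffd", exc.end)

codecs.register_error("skip-and-mark", skip_and_mark)

print(b"a\xffb\xffc".decode("ascii", "skip-and-mark"))
```

Because the returned position is absolute, a handler may also resynchronize by jumping further ahead (or, as discussed above, even backwards) rather than stepping one unit at a time.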
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:49

Message:
Logged In: YES
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other
> encodings than Unicode. But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I
> noticed that all the non-Unicode codecs ignore the
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 13:57

Message:
Logged In: YES
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
def FFFDreplace(enc, uni, pos):
    if uni[pos] == "\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.

> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities.

For example:

Why can't I print u"gürk"?
is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler().

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet,
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping, or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks.
> > In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 08:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string.
If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more.
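[Editorial note: the two-string stack loop described above can be sketched as a toy Python model. This is an illustration of the algorithm, not the C implementation; the ASCII target encoding and the helper names are assumptions made for the example.]

```python
def encode_with_stack(text, callback):
    # Toy model of the two-string stack: the bottom entry is the original
    # string; a replacement returned by the error callback is pushed on top.
    out = []
    stack = [[text, 0]]                  # entries are [string, position]
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):                # end of the current string
            stack.pop()                  # pop the replacement, or finish
            continue
        if ord(s[pos]) < 128:            # encodable: emit and advance
            out.append(bytes([ord(s[pos])]))
            top[1] += 1
        elif len(stack) == 1:            # error in the original: ask callback
            top[1] += 1                  # resume after the offending char
            stack.append([callback("ascii", text, pos), 0])
        else:                            # error inside the replacement itself
            raise UnicodeError("unencodable character in replacement")
    return b"".join(out)

# The XML character reference callback from this thread:
xmlreplace = lambda enc, uni, pos: "&#%d;" % ord(uni[pos])
print(encode_with_stack("a\xe4b", xmlreplace))  # b'a&#228;b'
```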
One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used.
When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 19:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names.
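[Editorial note: the callable error handler described above is essentially what later shipped as PEP 293, except that handlers are registered under a name with codecs.register_error and receive a single exception object rather than separate (encoding, unicode, position) arguments. A minimal sketch against today's API; the handler name "question-mark" is invented for this example.]

```python
import codecs

def question_mark(exc):
    # PEP 293 handlers return (replacement, position to resume at).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return ("?", exc.end)

codecs.register_error("question-mark", question_mark)
print("a\xe4b".encode("ascii", "question-mark"))  # b'a?b'
```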
BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 18:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? encode one-to-one, it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h, of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 14:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:23:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:23:33 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array.
In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Greg Chapman (glchapman) Date: 2002-03-08 06:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here: >>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123') >>> m.groups() (None, None) >>> m.lastindex 2 In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

    void lastmark_restore(SRE_STATE *state, int lastmark)
    {
        assert(lastmark >= 0);
        if (state->lastmark > lastmark) {
            int lastvalidindex = (lastmark == 0) ? -1 : (lastmark-1)/2+1;
            if (state->lastindex > lastvalidindex)
                state->lastindex = lastvalidindex;
            memset(state->mark + lastmark + 1, 0,
                   (state->lastmark - lastmark) * sizeof(void*));
        }
        state->lastmark = lastmark;
    }

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 04:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c.
But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 04:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:39:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:39:48 -0800 Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes Message-ID: Patches item #527427, was opened at 2002-03-08 12:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Cesar Eduardo Barros (cesarb) Assigned to: Nobody/Anonymous (nobody) Summary: minidom fails to use NodeList sometimes Initial Comment: (why is the summary box so small?) xml.dom.minidom doesn't use a NodeList as the return type of getElementsByTagName{,NS} as it should. The patch (against 2.2 or HEAD) fixes it.
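[Editorial note: the point of returning a NodeList is that DOM code can rely on accessors such as length. With the fix applied (and in current xml.dom.minidom), for example:]

```python
from xml.dom import minidom

doc = minidom.parseString("<root><item/><item/></root>")
nodes = doc.getElementsByTagName("item")
# A NodeList supports the DOM length accessor in addition to len():
print(nodes.length)  # 2
```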
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 From noreply@sourceforge.net Fri Mar 8 17:31:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 09:31:11 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs.
The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: the problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via an ANSI escape sequence. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom error handler names. (And it makes the implementation very interesting ;)) What do you think?
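[Editorial note: this start/end extension is exactly how the API later behaved under PEP 293: the handler receives a UnicodeEncodeError whose start and end attributes span the whole run of consecutive unencodable characters, so the callback is invoked once per run. A sketch against today's codecs module; the handler name "show-run" is invented for illustration.]

```python
import codecs

def show_run(exc):
    # exc.start:exc.end covers the entire unencodable run, not one character.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    return ("<%s>" % ",".join(str(ord(c)) for c in run), exc.end)

codecs.register_error("show-run", show_run)
print("a\xe4\xfc\xf6o".encode("ascii", "show-run"))  # b'a<228,252,246>o'
```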
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09.
Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O!
with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...]
> > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...]
> > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed?
No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here is the current todo list:
>
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).
>
> We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type
>
>     u"...".encode("...", "strict")
>
> instead of
>
>     import codecs
>     u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would become the official lookup method for error handlers (PyCodec_LookupUnicodeEncodeErrorHandler?). Python code would look like this:
>
>     def xmlreplace(encoding, unicode, pos, state):
>         return (u"&#%d;" % ord(unicode[pos]), pos+1)
>
>     import codecs
>
>     codecs.registerError("xmlreplace", xmlreplace)
>
> and then the following call can be made:
>
>     u"äöü".encode("ascii", "xmlreplace")
>
> As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.
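The registry proposal sketched above is essentially what later shipped as codecs.register_error()/codecs.lookup_error(). In that final API the handler receives the exception object instead of the (encoding, unicode, pos, state) tuple proposed here; a sketch under Python 3:

```python
import codecs

def xmlreplace(exc):
    # Handler in the registry API that eventually shipped: it receives
    # the UnicodeEncodeError and returns (replacement, resume_position).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    refs = u"".join(u"&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
    return (refs, exc.end)

codecs.register_error("xmlreplace", xmlreplace)

print(u"äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```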
> > But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES user_id=89016

> > [...]
> > so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
> >
> > I could, but first we should work out how the decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

OK. I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> >
> > It would be a slowdown. But callbacks open many possibilities.
>
> True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).
OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff, with a wide build of Python).

> > For example:
> >
> >     Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
>
> If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>     callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API.
("almost" because for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So memory overflow tests are only done when an unencodable character occurs, and the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> >
> >     """
> >     It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
> >     """
>
> Good point. I'll add that to the PEP 100.

OK.

Here is the current todo list:

1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

    u"...".encode("...", "strict")

instead of

    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers (PyCodec_LookupUnicodeEncodeErrorHandler?). Python code would look like this:

    def xmlreplace(encoding, unicode, pos, state):
        return (u"&#%d;" % ord(unicode[pos]), pos+1)

    import codecs

    codecs.registerError("xmlreplace", xmlreplace)

and then the following call can be made:

    u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character.
> But as far as I can tell none of the codecs uses anything other than '?',

True.

> so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

Go for it.

> And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
>
>     def FFFDreplace(enc, uni, pos):
>         if uni[pos] == u"\ufffd":
>             return u"?"
>         else:
>             raise UnicodeError(...)
>
> > > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
>     Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in comp.lang.python.
> For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API.
I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback?
>
> Maybe the same should be added to the encoding callbacks too?
> Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
>
> Misc/unicode.txt is not clear on that:
>
>     """
>     It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
>     """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.
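The tuple interface proposed in this exchange can be illustrated with a toy pure-Python ASCII encoder; this is a sketch of the control flow only (the hypothetical names toy_ascii_encode and xmlreplace are mine, not the patch's), not the C implementation:

```python
def toy_ascii_encode(text, callback, state=None):
    # Drive the callback with absolute positions, per the proposal:
    # callback(encoding, input_data, input_position, state)
    #     -> (output_to_be_appended, new_input_position)
    out = []
    pos = 0
    while pos < len(text):
        if ord(text[pos]) < 128:
            out.append(text[pos])
            pos += 1
        else:
            appended, pos = callback("ascii", text, pos, state)
            out.append(appended)  # the real encoder would re-encode this
    return "".join(out).encode("ascii")

def xmlreplace(encoding, data, pos, state):
    # Replace one unencodable character with an XML character reference.
    return (u"&#%d;" % ord(data[pos]), pos + 1)

print(toy_ascii_encode(u"gürk", xmlreplace))  # b'g&#252;rk'
```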
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
    def FFFDreplace(enc, uni, pos):
        if uni[pos] == u"\ufffd":
            return u"?"
        else:
            raise UnicodeError(...)

> > The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase names.
>
> Ok.
>
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities. For example:

    Why can't I print u"gürk"?
is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error callbacks.
> > In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
Misc/unicode.txt is not clear on that:

    """
    It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
    """

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES user_id=38388

> How the callbacks work:
>
> A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).

Nice.

> The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string.
> If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.

Very elegant solution !

> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

> I have renamed the static ...121 function to all lowercase names.

Ok.

> BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.

> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API)

I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more.
One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

> A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

> One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

It is already !

> Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
> When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

Sure.

> BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES user_id=89016

One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks: PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character), or directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position, and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stack top. If an error is encountered and the stack has only one entry (during encoding of the original string), the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation.) I have renamed the static ...121 function to all lowercase names.
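A toy model of the two-string stack loop described above, in plain Python (a hypothetical helper, not the actual C implementation; `encodable` and `callback` stand in for the encoder's per-character test and the error callback):

```python
def encode_with_callback(text, encodable, callback):
    # A stack of [string, position] pairs; the loop always encodes a
    # character from the string at the stack top.
    out = []
    stack = [[text, 0]]
    while stack:
        top = stack[-1]
        if top[1] >= len(top[0]):
            # End of the current string: pop the replacement string,
            # or finish if this was the original string.
            stack.pop()
            continue
        ch = top[0][top[1]]
        top[1] += 1
        if encodable(ch):
            out.append(ch)
        elif len(stack) == 1:
            # Error in the original string: ask the callback for a
            # replacement and continue encoding from that string.
            stack.append([callback(ch), 0])
        else:
            # Error inside the replacement string itself.
            raise UnicodeError("unencodable character in replacement")
    return "".join(out)

print(encode_with_callback("ab\u20acc", lambda c: ord(c) < 128,
                           lambda c: "?"))  # ab?c
```

If the callback itself returns an unencodable character, the model raises, mirroring the two-entries-on-the-stack case in the description.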
BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
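The reimplementation idea floated at the top of this exchange (an ASCII encoder plus a \uxxxx replacement callback) is essentially what the now-standard 'backslashreplace' error handler provides in current Python:

```python
s = u"caf\u00e9 \u20ac"
# ASCII encoding; unencodable characters come back as backslash escapes,
# \xXX for small code points and \uXXXX for larger ones.
print(s.encode("ascii", "backslashreplace"))  # b'caf\\xe9 \\u20ac'
```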
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the time). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? It encodes one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.c so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits:

* please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """

* Comments should start with a capital letter and be prepended to the section they apply to

* There should be spaces between arguments in compares (a == b) not (a==b)

* Where does the name "...Encode121" originate?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 18:28:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 10:28:06 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 08:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) >Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array.
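For reference, the pre behavior shown above is what the patch restores; it can be checked against the current re module, which is built on sre and includes this fix:

```python
import re

# After the fix, the inner group (\d) no longer reports a match when
# its enclosing group ((\d):)? did not participate in the match.
m = re.search(r"^((\d):)?(\d\d)\.(\d\d\d)$", "34.123")
print(m.groups())  # (None, None, '34', '123')
```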
In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-08 13:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 10:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of the last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = (lastmark == 0) ?
            -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(state->mark + lastmark + 1, 0,
               (state->lastmark - lastmark) * sizeof(void*));
    }
    state->lastmark = lastmark;
}

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 08:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 08:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:08:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:08:15 -0800 Subject: [Patches] [ python-Patches-500136 ] Update ext build documentation Message-ID: Patches item #500136, was opened at 2002-01-06 14:27 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500136&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Update ext build documentation Initial Comment: The attached file documents how extensions are built using distutils. It is intended to replace at least unix.tex, and possibly also windows.tex. Fred, if this is ok, I would like to check it in as ext/building.tex, and remove ext/unix.tex.
I would then add a comment on top of windows.tex that the build procedure using distutils should work out of the box on Windows as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:08 Message: Logged In: YES user_id=21627 Committed as building.tex 1.1, ext.tex 1.105, windows.tex 1.4, deleting unix.tex. I left windows.tex, since the technology described in this section continues to work; I only added a note that developers should consider distutils instead. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-03-08 04:36 Message: Logged In: YES user_id=3066 Please check this and close the bugs this fixes. Thanks, and sorry for the delay in looking at this. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-01-23 09:34 Message: Logged In: YES user_id=44345 This looks good. I had written a short replacement file called build.tex and was going to submit it this morning, but yours looks better. Presuming Distutils gets rid of the need for Windows-specific build solutions, I agree both unix.tex and windows.tex should be replaced. One phrase didn't make sense to me. Near the top it says (known as related to Makefile.pre.in, and Setup files) I don't know what that means. I would just zap any reference to the old build method. Skip ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-01-23 08:59 Message: Logged In: YES user_id=21627 This fixes bugs #497695, #500115, #506545, ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500136&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:47:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:47:35 -0800 Subject: [Patches] [ python-Patches-403972 ] threaded profiler. Message-ID: Patches item #403972, was opened at 2001-02-23 16:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403972&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Amila Fernando (amila) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: threaded profiler. Initial Comment: Basically a profiler that can handle threaded programs and generate profiling snapshots. It does however have some situations it cannot handle well (see included README for details). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:47 Message: Logged In: YES user_id=21627 I recommend rejecting this patch. Since it is pure-Python, it is probably more suited as a stand-alone package. For inclusion into Python, trying to hook into thread creation is a hack, IMO; there are certainly ways to cheat that technique. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-04 06:27 Message: Logged In: YES user_id=3066 Assigned to me since I've been digging into the profiling support lately.
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-05-09 18:11 Message: Logged In: YES user_id=31392 Perhaps you could share this on comp.lang.python and see if people can help you fix the situations it doesn't handle well. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403972&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:56:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:56:55 -0800 Subject: [Patches] [ python-Patches-437733 ] Additional Argument to Python/C Function Message-ID: Patches item #437733, was opened at 2001-07-01 18:24 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=437733&group_id=5470 Category: None Group: None >Status: Closed Resolution: Rejected Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Additional Argument to Python/C Function Initial Comment: This patch makes it possible for a Python/C function to get an additional void* argument. This makes it easier to use Python with C++. PS: I'm a bad descriptor, so please look at the diff file. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:56 Message: Logged In: YES user_id=21627 Was that re-opened by mistake? Closing it again. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 23:03 Message: Logged In: YES user_id=21627 It appears that the C++ fragment is broken and does not work as intended. Apparently, PyCppFunction is a member pointer, which is intended to be passed through to the invocation of pycfunction. However, AFAICT, addmethod converts the pointer-to-member-function into a void* before passing it into the methoddefs.
This C++ code has undefined behaviour: there is no guarantee that a pointer-to-member can fit into a void*. In fact, on g++, a pointer-to-member is larger than a void* (8 bytes on a 32-bit machine). It may be possible to fix this. However, I think there are many more issues in integrating C++ classes into Python; such a class structure would add little if any value. Therefore, I'm in favour of rejecting this patch. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-25 23:54 Message: Logged In: YES user_id=3066 Just for the record, I'm not ignoring your emailed plea for re-consideration; I just haven't had time to dig back into this matter. Here's the sample class from the email, so it will be easier to keep track of and for others to comment on the approach:

class PyClass {
public:
    PyClass();
    typedef PyObject* (PyClass::*PyCppFunction)(PyObject*);
    void addmethod(const char* name, PyCppFunction func);
    ~PyClass();
    operator PyObject*() { return (PyObject*)obj; }
private:
    struct PyClassObject {
        PyObject_HEAD
        PyClass *self;
    };
    std::vector<PyMethodDef> methods;
    static PyObject *pycfunc(PyClassObject *self, PyObject *arg, void *p);
    static PyObject *getattr(PyClassObject *self, char *name);
    static void dealloc(PyClassObject *) {}
    PyTypeObject typeobject;
    PyClassObject *obj;
};

void PyClass::addmethod(const char *name, PyCppFunction func)
{
    PyMethodDef meth = {
        strdup(name),
        (PyCFunction)pycfunc,
        METH_VARARGS|METH_USERARG,
        NULL,
        *(void**)&func
    };
    methods.insert(methods.begin(), meth);
}

PyClass::~PyClass()
{
    for (std::vector<PyMethodDef>::iterator i = methods.begin();
         i != methods.end(); i++)
        free(i->ml_name);
    methods.resize(0);
}

PyObject *PyClass::pycfunc(PyClassObject *self, PyObject *arg, void *p)
{
    PyCppFunction func = *(PyCppFunction*)&p;
    return (self->self->*func)(arg);
}

PyClass::PyClass()
{
    PyTypeObject Xxtype = {
        PyObject_HEAD_INIT(&PyType_Type)
        0,                      /*ob_size*/
        "xx",                   /*tp_name*/
        sizeof(PyClassObject),  /*tp_basicsize*/
        0,                      /*tp_itemsize*/
        /* methods */
        (destructor)dealloc,    /*tp_dealloc*/
        0,                      /*tp_print*/
        (getattrfunc)getattr,   /*tp_getattr*/
        (setattrfunc)0,         /*tp_setattr*/
        0,                      /*tp_compare*/
        0,                      /*tp_repr*/
        0,                      /*tp_as_number*/
        0,                      /*tp_as_sequence*/
        0,                      /*tp_as_mapping*/
        0,                      /*tp_hash*/
    };
    typeobject = Xxtype;
    obj = PyObject_NEW(PyClassObject, &typeobject);
    obj->self = this;
    PyMethodDef md = {0};
    methods.push_back(md);
}

PyObject *PyClass::getattr(PyClassObject *self, char *name)
{
    return Py_FindMethod(&self->self->methods[0], (PyObject*)self, name);
}

This class is meant to be a base class for other classes that represent Python types. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-04 06:48 Message: Logged In: YES user_id=3066 The patch is easy enough to understand, but the motivation for this is not at all clear. Rejecting as code bloat. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=437733&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:02:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:02:18 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 15:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open >Resolution: Out of Date Priority: 3 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.)
Caveats:
- undocumented
- slow
- security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access

This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now; at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 15:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:04:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:04:30 -0800 Subject: [Patches] [ python-Patches-443899 ] Minor fix to gzip.py module. Message-ID: Patches item #443899, was opened at 2001-07-23 21:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Titus Brown (titus) >Assigned to: Martin v. Löwis (loewis) Summary: Minor fix to gzip.py module. Initial Comment: ---

from cStringIO import StringIO
from gzip import GzipFile

stringFile = StringIO()
gzFile = GzipFile("test1", 'wb', 9, stringFile)
gzFile.write('howdy there!')
r = gzFile.read()

---

The above code fragment gave a nonintuitive error response (attribute missing).
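Current Python behaves the way this patch proposes: read() on a write-only GzipFile raises an OSError (the modern spelling of IOError) with EBADF. A Python 3 rendering of the fragment above:

```python
import gzip
import io

buf = io.BytesIO()
gz = gzip.GzipFile("test1", "wb", 9, buf)
gz.write(b"howdy there!")
try:
    gz.read()
except OSError as exc:
    # e.g. "read() on write-only GzipFile object"
    print("read rejected on write-only file:", exc)
```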
Now, an exception is raised stating that the file is not opened for reading or writing. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:04 Message: Logged In: YES user_id=21627 Taken the load from Jeremy. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-19 13:03 Message: Logged In: YES user_id=21627 I think gzip files should behave like file objects with respect to exceptions. Perhaps inconsistently, performing read or write on files that are opened only for the other operation raises an IOError (EBADF), since Posix says so, whereas performing close on a closed file raises a ValueError (it can't perform a system call since the file descriptor might have been recycled meanwhile). So I'm still in favour of applying this patch, with the ValueError changed to IOError, and perhaps passing EBADF as the error code in all cases of IOError. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-19 04:09 Message: Logged In: YES user_id=6380 Time to look at this again? ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-16 22:33 Message: Logged In: YES user_id=23486 Re: context diff, thanks & sorry for the trouble; my newer patches are being submitted this way. Re: IOError, I wasn't sure which exception to use at the time. I therefore took my cue from other code in the gzip module, which raises a ValueError when self.fileobj is closed. The only IO errors raised in the module are those that pertain to incorrect file formats. I'd be happy to change any and all of the ValueErrors that are raised into IOErrors, but I think the current consistency of errors should be maintained ;). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2001-08-16 20:42 Message: Logged In: YES user_id=21627 Please always submit context (-c) or unified (-u) diffs; I've reformatted your patch by retrieving 1.24, applying the patch, updating to the current version, and regenerating the patch. Apart from that, the patch looks fine to me, and I recommend approving it. One consideration is the exception being raised: Maybe IOError is more appropriate. ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-14 21:15 Message: Logged In: YES user_id=23486 (sorry -- misunderstanding of how the changelog view works) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:24:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:24:05 -0800 Subject: [Patches] [ python-Patches-448038 ] a move function for shutil.py Message-ID: Patches item #448038, was opened at 2001-08-05 02:08 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=448038&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 3 Submitted By: William McVey (wamcvey) Assigned to: Nobody/Anonymous (nobody) Summary: a move function for shutil.py Initial Comment: Although shutil.py has some nice copy functions, it has no real equivalent of mv(1). This is a very simple implementation (as in not a whole lot of stuff has been implemented) but it's functional. It simply calls rename, and if that fails tries to copy and unlink. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:24 Message: Logged In: YES user_id=21627 Here is an attempt to provide error handling for copytree. It collects all exceptions in a list, and raises them as shutil.Error.
This would be inconsistent with shutil.rmtree, which offers the choice of ignoring errors, invoking an error callback, or raising an exception at the point of the problem. Which of these alternatives would you like to see implemented? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-08-08 17:41 Message: Logged In: YES user_id=6380 This is OK, but only perpetuates the problem with this module -- it doesn't have a decent error handling strategy (prints to stdout!?!?!?!). If someone wants to put some more effort into this, I would recommend at least adding an option to copytree() to control error handling. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=448038&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:30:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:30:36 -0800 Subject: [Patches] [ python-Patches-450583 ] Extend/embed tools for AIX Message-ID: Patches item #450583, was opened at 2001-08-13 21:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450583&group_id=5470 Category: Demos and tools Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Extend/embed tools for AIX Initial Comment: The support tools for extending and embedding with AIX are installed into ${LIBPL}, but "configure" still creates a pointer in the Makefile as if they were installed into ${LIBP} instead. This patch corrects "configure"'s behavior to match the install behavior in Makefile. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:30 Message: Logged In: YES user_id=21627 What version was this originally against?
It appears that this is a duplicate of #103679, applied as configure.in 1.201. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-14 16:04 Message: Logged In: YES user_id=21627 Attached patch as requested in Xns90FCA43515C62beablebeable@30.146.28.98 Your comments do show up; just don't use the "Back" button of your browser without reloading the page. Also, you may consider getting an account so that you don't appear anonymous here. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2001-08-14 13:39 Message: Logged In: NO Hmmm ... it seems that my followup comments are not being displayed; are they being retained somewhere for moderating? For the third time, the patch is: *** configure.in.orig Mon Aug 13 15:45:14 2001 --- configure.in Mon Aug 13 15:55:33 2001 *************** *** 571,577 **** case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/ld_so_aix \ -bI:\/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" --- 571,577 ---- case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/config/ld_so_aix \ -bI:\/config/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2001-08-14 13:35 Message: Logged In: NO The attachment won't seem to attach ... perhaps there wasn't quite enough testing of the patch manager for poor schmucks like myself who are trapped behind corporate firewalls. The patch contents: *** configure.in.orig Mon Aug 13 15:45:14 2001 --- configure.in Mon Aug 13 15:55:33 2001 *************** *** 571,577 **** case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" !
LDSHARED="\/ld_so_aix \ -bI:\/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" --- 571,577 ---- case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/config/ld_so_aix \ -bI:\/config/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-13 23:48 Message: Logged In: YES user_id=21627 Which patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450583&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:40:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:40:20 -0800 Subject: [Patches] [ python-Patches-452232 ] timestamp function for time module Message-ID: Patches item #452232, was opened at 2001-08-17 23:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gareth Harris (garethharris) Assigned to: Nobody/Anonymous (nobody) Summary: timestamp function for time module Initial Comment: Timestamp creates timestamp strings in ISO or ODBC format in UTC or local timezones. It can also add microseconds where needed. Timestamps are often needed outside database or XML activities, so its proposed location is the time module. timestamp(secs=None,fmt='ISO',TZ=None,fracsec=None): '''Make ISO or ODBC timestamp from [current] time.
Parameters:
    secs    = float seconds, else default = time()
    fmt     = 'ISO' use ISO 8601 standard format
              = "YYYY-MM-DDTHH:MM:SS.mmmmmmZ" Zulu
              or "YYYY-MM-DDTHH:MM:SS.mmmmmm-hh:mm" local
              else "YYYY-MM-DD HH:MM:SS.mmmmmm" ODBC
    TZ      = None=GMT/UTC/Zulu, else local time zone
    fracsec = None, else add microseconds to string
''' Any improvement or standardization is welcome. Gareth Harris gharris@nrao.edu 2001-08-17T21:36:00Z ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:40 Message: Logged In: YES user_id=21627 If you want to see the code included, you'd need to provide a context diff, including docs and test cases. However, notice that there may be overlap with the emerging builtin DateTime type, see http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage ---------------------------------------------------------------------- Comment By: Gareth Harris (garethharris) Date: 2002-01-02 17:41 Message: Logged In: YES user_id=300900 Back from travel, other projects etc. [2001.01.02] Thanks for comments thus far. Maybe I will finally meet some of you in Feb. --- I proposed to put this in TIME module UNLESS someone has an idea for a better location. Who takes care of that module? Shall I provide: doc?, test suite? Is a companion decode function needed? OTHERWISE I will put it in sourceforge/activestate? Which is preferred? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-01 21:27 Message: Logged In: YES user_id=21627 Gareth, Can you please propose a strategy to advance this patch or withdraw it? If there is no action, I propose to close it by Feb 1, 2002. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr.
(fdrake) Date: 2001-12-06 15:57 Message: Logged In: YES user_id=3066 Another possible alternate home for this would be the Python Snippet repository on SourceForge: http://sourceforge.net/snippet/browse.php?by=lang&lang=6 I'm not suggesting that this doesn't belong in the standard library, however. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 19:46 Message: Logged In: YES user_id=21627 Nice patch. If you want to see this included, you should complete it: Decide on location of the function, provide documentation and test cases. As the location, it may be that the calendar module could provide a home, but you may ask in the newsgroup. If you merely wanted to publish this code snippet, I suggest that you find a better home than the Python patch database, e.g. the Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python There are a number of other places that collect Python snippets; this is just one option. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:44:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:44:09 -0800 Subject: [Patches] [ python-Patches-462754 ] no '_d' ending for mingw32 Message-ID: Patches item #462754, was opened at 2001-09-19 05:29 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gerhard Häring (ghaering) Assigned to: Nobody/Anonymous (nobody) Summary: no '_d' ending for mingw32 Initial Comment: This patch prevents distutils from naming the extension modules _d.pyd when compiled with mingw32 on Windows in debug mode. Instead, the extension modules will get the normal name .pyd.
Technically, the patch doesn't prevent the behaviour for mingw32, but only adds the _d for MS Visual C++ and Borland compilers (though I don't know about the Borland case). The reason for this? Adding "_d" doesn't make any sense for GNU compilers. I think it's just MS Visual C++ madness. If you want to debug an extension module that was compiled with gcc, you have to use gdb anyway, because the debugging symbols of MSVC++ and gcc are incompatible. So you normally use a release Python version (from the python.org binary download) and compile your extensions with mingw32. To put it shortly: The current state is that you do a "setup.py build --compiler=mingw32 --debug" and then rename the extension modules, removing the _d. Then fire up gdb to debug your module. With this patch, the renaming isn't necessary anymore. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:44 Message: Logged In: YES user_id=21627 Does the patch actually work? It seems to me that, if compiled --with-pydebug, import will automatically search for the _d version, and complain if it is not found. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-04 12:52 Message: Logged In: YES user_id=21627 The rationale for using the debugging version of MSVCRT is not the debugging information alone, but also the additional functionality, like heap consistency checks and other assertions. So it is not obvious that you do not want to use the debugging version of this library in a debug build. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-01-04 03:50 Message: Logged In: YES user_id=163326 mingw links with msvcrt.dll. I've plans to add mingw32 support to the autoconf build process (hopefully soon enough for 2.3).
The GNU and MS debugger symbols are incompatible, though, so I think that mingw32 shouldn't link to the debug version of msvcrt (gdb doesn't understand the Microsoft debugger symbols; and the Visual Studio debugger has no idea what the debugging symbols of gcc are all about; isn't cross-platform and cross-compiler programming fun?). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-30 14:13 Message: Logged In: YES user_id=21627 How does the mingw port interact with the debugging libraries? With MSVC, the debug build will link to the debug versions of the CRT. What C library will mingw link with (I hope it won't use crtdll.dll)? ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 23:28 Message: Logged In: YES user_id=163326 Yes. But mingw32 isn't emulating Unix under Windows (that would be Cygwin). It's just a version of gcc and friends that targets native win32. It links against msvcrt (not a Posix emulation library like Cygwin does). This is a bit hypothetical because I didn't yet hack the autoconf build process for native win32 with mingw32. Currently, you cannot build a complete Python with mingw32, but you *can* build extension modules against an existing Python (compiled with M$ VC++). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-28 22:43 Message: Logged In: YES user_id=31435 All else being equal, a system emulating Unix under Windows should strive to make life comfortable for Unix folks. The question is thus whether all else is in fact equal. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 20:37 Message: Logged In: YES user_id=163326 Hmm. I don't like the _d endings at all.
But if the policy on win32 is that debug executables and libraries get a "_d" ending, then I'm unsure whether this patch should be applied. I have plans to hack the autoconf madness to build a native win32 Python with mingw32. But that won't be ready by tomorrow. And I don't think that I'll add "_d" endings there for debugging, because that would be inconsistent with the normal autoconf builds on Unix. I'm glad that *I* don't have to decide whether this patch is a Good Thing. Being consistent with the Python win32 build or with GNU (gcc/autoconf). Take your pick :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-19 05:46 Message: Logged In: YES user_id=31435 FYI, MSVC never adds _d on its own -- Mark Hammond and/or Guido forced it to do that. I don't remember why, but one of them explained it to me long ago and it made good sense at the time. MSVC normally compiles debug and release builds into distinct subdirectories, and uses the same names in both. But *our* MSVC setup forces it to compile both flavors of build directly into the PCbuild directory, so it has to give the resulting DLLs and executables different names (else the second build would overwrite the results of the first build).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:54:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:54:57 -0800 Subject: [Patches] [ python-Patches-472523 ] Reminder: 2.3 should check tp_compare Message-ID: Patches item #472523, was opened at 2001-10-18 21:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=472523&group_id=5470 Category: None Group: None Status: Open Resolution: None >Priority: 6 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Nobody/Anonymous (nobody) Summary: Reminder: 2.3 should check tp_compare Initial Comment: In 2.3, the outcome of tp_compare should be required to be -1, 0 or 1; other values should be considered *illegal*. (In 2.2, the docs were changed to stress this but for backwards compatibility this isn't enforced.) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:54 Message: Logged In: YES user_id=21627 Attached is a patch that implements this test, producing a warning if tp_compare does not follow that restriction.
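The restriction Martin's check enforces can be illustrated in pure Python. A minimal sketch (the helper name is ours, not part of the patch; Python 3 later dropped the built-in cmp(), and this sign-normalizing idiom is the usual replacement):

```python
def three_way_cmp(a, b):
    # Normalize a comparison to exactly -1, 0, or 1 -- the only
    # results the tp_compare contract treats as legal.
    return (a > b) - (a < b)

# The common C shortcut "return a - b" violates the contract:
# 10 - 3 == 7, which is none of -1, 0, 1.
```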
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=472523&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:00:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:00:16 -0800 Subject: [Patches] [ python-Patches-491936 ] Opt for tok_nextc Message-ID: Patches item #491936, was opened at 2001-12-12 09:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: David Jacobs (dbj) Assigned to: Nobody/Anonymous (nobody) Summary: Opt for tok_nextc Initial Comment: tokenizer.c - revision 2.53 I tried to pick a routine that looked like it was heavily used and optimizations that do not increase the maintenance burden (I won't feel bad if you reject it though, I'll keep on trying as long as you don't consider it a burden :-). I changed one strcpy to a memcpy because the length had already been computed. I also changed the pattern:

    a = strchr(b,'\0');

to

    a = b + strlen(b);

which is an idiom I've seen in many other places in the code, so I don't think it makes it harder to understand, and strlen is significantly more efficient than strchr. Aloha, David Jacobs (your pico optimizer :-) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:00 Message: Logged In: YES user_id=21627 Can you report some data about the resulting speedup? I seriously doubt that this is a significant change; unless data is forthcoming proving me wrong, I recommend rejecting this patch.
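A sketch of the kind of measurement being asked for. Since the change is in the C tokenizer, one coarse proxy is timing compile() over a synthetic module with timeit; the source size and repeat counts below are arbitrary choices, not part of the patch:

```python
import timeit

# Synthetic module source -- 200 trivial assignments.
SOURCE = "\n".join("x%d = %d" % (i, i) for i in range(200))

def tokenizer_benchmark(repeat=5, number=50):
    # compile() exercises the C tokenizer, so its timing bounds any
    # speedup a tok_nextc() micro-optimization could deliver.
    timer = timeit.Timer(lambda: compile(SOURCE, "<bench>", "exec"))
    return min(timer.repeat(repeat=repeat, number=number))

best = tokenizer_benchmark()
```

Running this before and after the patch on the same machine would give the before/after numbers Martin requests.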
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:04:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:04:45 -0800 Subject: [Patches] [ python-Patches-494047 ] removes 64-bit ?: to cope on plan9 Message-ID: Patches item #494047, was opened at 2001-12-17 03:18 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494047&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Russ Cox (rsc) Assigned to: Nobody/Anonymous (nobody) Summary: removes 64-bit ?: to cope on plan9 Initial Comment: The Plan 9 C compiler can't handle 64-bit numbers as the branches of a ternary operation. Rewrite a ? b : c into if (a) then b else c in two places. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:04 Message: Logged In: YES user_id=21627 Thanks for the patch, committed as longobject.c 1.115. I have not integrated it into 2.2.1, since I believe it is unlikely that all other plan9 changes are that trivial, so there is little chance that 2.2.1 will work out of the box on that system. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 03:55 Message: Logged In: YES user_id=6380 Thanks. We'll do this in 2.2.1 or 2.3, since (IMO) it's too close to the release date of 2.2.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494047&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:11:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:11:23 -0800 Subject: [Patches] [ python-Patches-504224 ] add plan9 threads include to thread.c Message-ID: Patches item #504224, was opened at 2002-01-16 07:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504224&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Russ Cox (rsc) Assigned to: Nobody/Anonymous (nobody) Summary: add plan9 threads include to thread.c Initial Comment: Adds the usual #ifdef and #include. I still haven't submitted any of the Plan 9 specific files (e.g., thread-plan9.h) since they're still in flux. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:11 Message: Logged In: YES user_id=21627 Thanks, applied as thread.c 2.41.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504224&group_id=5470 From noreply@sourceforge.net Sat Mar 9 14:19:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 06:19:46 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 09:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open Resolution: Out of Date Priority: 3 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.) Caveats:
- undocumented
- slow
- security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access
This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-09 09:19 Message: Logged In: YES user_id=6380 No, the IDLEfork project has stalled except for tweaking the configuration code (which would be good to merge into the Python IDLE tree when it's ready). I expect the patch failure is shallow so I won't bother fixing it. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-09 06:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now, at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 09:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sun Mar 10 05:31:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 21:31:41 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, ie gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-10 16:31 Message: Logged In: YES user_id=250749 I think expanding the docs is the go here.
In looking at the 2.2 docs (11.4 urllib), the bits that I think could usefully be improved include:
- the paragraph describing the proxy environment variables should note that on Windows, browser (at least for InternetExplorer - I don't know about Netscape) registry settings for proxies will be used when available;
- a short para noting that proxies can be overridden using URLopener/FancyURLopener class instances, documented further down the page, placed just before the note about not supporting authenticating proxies;
- adding a description of the "proxies" parameter to the URLopener class definition;
- adding an example of bypassing proxies to the examples subsection (11.4.2).
If/when you upload a doc patch, I suggest that you assign it to Fred Drake, who is the chief docs person.
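For the record, the behaviour discussed here (an explicit proxy mapping, with an empty dict meaning "bypass proxies") survives in today's urllib.request via ProxyHandler. A sketch against the modern API rather than the 2002-era urllib; the proxy URL is made up:

```python
import urllib.request

# An empty mapping disables proxies entirely (the "pass an empty
# dictionary" case from the patch); a populated one overrides the
# environment variables.
bypass = urllib.request.ProxyHandler({})
explicit = urllib.request.ProxyHandler({"http": "http://proxy.example:3128"})

# An opener built with the empty handler never consults $http_proxy.
opener = urllib.request.build_opener(bypass)
```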
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 10 05:45:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 21:45:44 -0800 Subject: [Patches] [ python-Patches-528022 ] PEP 285 - Adding a bool type Message-ID: Patches item #528022, was opened at 2002-03-10 00:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528022&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 285 - Adding a bool type Initial Comment: Here's a preliminary implementation of the PEP, including unittests checking the promises made in the PEP (test_bool.py) and (some) documentation. With this 12 tests fail for me (on Linux); I'll look into these later. They appear shallow (mostly doctests dying on True or False where 1 or 0 was expected). Note: the presence of this patch does not mean that the PEP is accepted -- it just means that a sample implementation exists in case someone wants to explore the effects of the PEP on their code. 
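The PEP's key promises, as exercised by the attached test_bool.py, can be checked directly in any Python since the patch landed (2.3 onward); the doctest failures Guido mentions come from reprs changing while the values stay numerically equal:

```python
# bool is a subclass of int with exactly two instances, True and False.
assert isinstance(True, int)
assert issubclass(bool, int)

# Numeric behaviour is unchanged, so existing arithmetic keeps working.
assert True == 1 and False == 0
assert True + True == 2

# Only the repr/str changed -- the source of the shallow doctest failures.
assert str(True) == "True" and str(1) == "1"
```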
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528022&group_id=5470 From noreply@sourceforge.net Sun Mar 10 08:00:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 00:00:42 -0800 Subject: [Patches] [ python-Patches-499062 ] Minor typo in test_generators.py Message-ID: Patches item #499062, was opened at 2002-01-03 13:32 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499062&group_id=5470 Category: Tests Group: None >Status: Closed >Resolution: Fixed Priority: 3 Submitted By: Uche Ogbuji (uche) Assigned to: Tim Peters (tim_one) Summary: Minor typo in test_generators.py Initial Comment: This one caused me a bit of confusion. Traditionally "leaves" refer to tree nodes with no children. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-10 03:00 Message: Logged In: YES user_id=31435 Changed, in dist/src/Lib/test/test_generators.py; new revision: 1.31 nondist/peps/pep-0255.txt; new revision: 1.18 ---------------------------------------------------------------------- Comment By: Uche Ogbuji (uche) Date: 2002-01-03 23:06 Message: Logged In: YES user_id=38966 No more argument. s/leaves/labels it is. Thanks. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-01-03 19:35 Message: Logged In: YES user_id=31435 Yes, I think "leaf" == "no kids" is universally accepted. I don't like changing it to plain "nodes", though, because the example code does not generate the nodes, it generates only the node labels -- someone confused by the misuse of "leaves" here is also likely to be confused by the misuse of "nodes" -- and I'm going to reduce the priority of this patch every time you argue back.
---------------------------------------------------------------------- Comment By: Uche Ogbuji (uche) Date: 2002-01-03 19:23 Message: Logged In: YES user_id=38966 It's s/leaves/nodes/. Maybe I've been working with DOM too much. At any rate, I have always thought of leaf nodes as only those with no children. It doesn't look as if anything from my patch made it through: neither the comment nor the patch. Sometimes I hate SF. I'll try again, though it hardly seems necessary... ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-01-03 19:18 Message: Logged In: YES user_id=31435 Assigned to me; added a "Tests" category and recategorized accordingly. Uche, if you tried to upload a patch, it didn't work (did you remember to check the upload box)? What is it that you want to see changed? s/leaves/labels/? Note that the example in the docstring test is lifted directly out of PEP 255, so tell me what would shut you up and I'll make the change in both places. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499062&group_id=5470 From noreply@sourceforge.net Sun Mar 10 12:46:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 04:46:38 -0800 Subject: [Patches] [ python-Patches-528038 ] __nonzero__ being improperly called Message-ID: Patches item #528038, was opened at 2002-03-10 09:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Burton Radons (loth) Assigned to: Nobody/Anonymous (nobody) Summary: __nonzero__ being improperly called Initial Comment: As noted in Bug #527816, if you call the __nonzero__ method of a builtin type directly it will SIGSEGV you.
The reason is that internally the nonzero slot is being called with "PyObject *(*) (PyObject *)" casting, rather than the actual "int (*) (PyObject *)". This small patch adds a new static function that's just a copy of wrap_hashfunc and gets it called properly later on. If this isn't how we want bugfixes handled, please advise and I'll revise. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-10 13:46 Message: Logged In: YES user_id=21627 The patch looks good. However, wouldn't it be simpler to use wrap_inquiry instead? (esp. since nb_nonzero is defined as inquiryfunc). Also, a test case (perhaps inside test_descr) which currently crashes but succeeds under your patch would be appreciated. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 From noreply@sourceforge.net Sun Mar 10 14:28:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 06:28:07 -0800 Subject: [Patches] [ python-Patches-528038 ] __nonzero__ being improperly called Message-ID: Patches item #528038, was opened at 2002-03-10 03:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Burton Radons (loth) Assigned to: Guido van Rossum (gvanrossum) Summary: __nonzero__ being improperly called Initial Comment: As noted in Bug #527816, if you call the __nonzero__ method of a builtin type directly it will SIGSEGV you. The reason is that internally the nonzero slot is being called with "PyObject *(*) (PyObject *)" casting, rather than the actual "int (*) (PyObject *)". This small patch adds a new static function that's just a copy of wrap_hashfunc and gets it called properly later on.
If this isn't how we want bugfixes handled, please advise and I'll revise. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 09:11 Message: Logged In: YES user_id=6380 Thanks! Fixed in CVS, using Martin's approach. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-10 07:46 Message: Logged In: YES user_id=21627 The patch looks good. However, wouldn't it be simpler to use wrap_inquiry instead? (esp. since nb_nonzero is defined as inquiryfunc). Also, a test case (perhaps inside test_descr) which currently crashes but succeeds under your patch would be appreciated. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 From noreply@sourceforge.net Sun Mar 10 18:46:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 10:46:59 -0800 Subject: [Patches] [ python-Patches-500311 ] Work around for buggy https servers Message-ID: Patches item #500311, was opened at 2002-01-07 09:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500311&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michel Van den Bergh (vdbergh) >Assigned to: Martin v. Löwis (loewis) Summary: Work around for buggy https servers Initial Comment: Python 2.2. Tested on RH 7.1. This is a workaround for http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=494762 The problem is that some https servers close an ssl connection without properly resetting it first. In the above bug description it is suggested that this only occurs for IIS but apparently some (modified) Apache servers also suffer from it (see telemeter.telenet.be).
One of the suggested workarounds is to modify httplib.py so as to ignore the combination of err[0]==SSL_ERROR_SYSCALL and err[1]=="EOF occurred in violation of protocol". However I think one should never compare error strings since in principle they may depend on language etc... So I decided to modify _socket.c slightly so that it becomes possible to return error codes which are not in ssl.h. When an ssl connection is closed without reset I now return the error code SSL_ERROR_EOF. Then I ignore this (apparently benign) error in httplib.py. In addition I fixed what I think was an error in PySSL_SetError(SSL *ssl, int ret) in socketmodule.c. Originally there was:

    case SSL_ERROR_SSL:
    {
        unsigned long e = ERR_get_error();
        if (e == 0) {
            /* an EOF was observed that violates the protocol */
            errstr = "EOF occurred in violation of protocol";

etc... but if I understand the documentation for SSL_get_error then the test should be: e==0 && ret==0. A similar error occurs a few lines later. ---------------------------------------------------------------------- Comment By: Michel Van den Bergh (vdbergh) Date: 2002-01-09 11:25 Message: Logged In: YES user_id=10252 Due to some problems with sourceforge and incompetence on my part I submitted this several times. Please see patch 500311.
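The condition debated here eventually got its own exception type in the modern ssl module: ssl.SSLEOFError is raised when a peer closes without a proper SSL shutdown, so callers can catch it by type instead of matching the "EOF occurred in violation of protocol" message string. A sketch of the resulting pattern; sock is assumed to be a connected ssl.SSLSocket:

```python
import ssl

def recv_tolerant(sock, bufsize=4096):
    # 'sock' is assumed to be a connected ssl.SSLSocket talking to a
    # server that may drop the connection without close_notify.
    try:
        return sock.recv(bufsize)
    except ssl.SSLEOFError:
        # Buggy server closed without SSL shutdown -- treat as EOF
        # rather than comparing error strings, per the concern above.
        return b""
```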
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500311&group_id=5470 From noreply@sourceforge.net Sun Mar 10 22:14:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 14:14:40 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: Inherited method __getitem__ of list type in the new subclass is unexpectedly slow. For example,

    x = list([1,2,3])
    r = xrange(1, 1000000)
    for i in r: x[1] = 2

==> execution time: real 0m2.390s

    class nlist(list): pass
    x = nlist([1,2,3])
    r = xrange(1, 1000000)
    for i in r: x[1] = 2

==> execution time: real 0m7.040s

about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you will trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because mp_subscript slot was allocated by fixup_slot_dispatchers(). In the slot_mp_subscript(), call_method(self, "__getitem__", ...)
is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 17:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. 
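The wall-clock comparison in the report can be reproduced with `timeit`. A sketch for a modern interpreter follows; the exact ratio is highly version-dependent (the ~3x figure above comes from a Python 2.2-era build), so treat any numbers as indicative only:

```python
import timeit

# Time item assignment on a plain list vs. on a trivial list subclass,
# mirroring the report's benchmark (x[1] = 2 in a loop).
plain = timeit.timeit("x[1] = 2", setup="x = [1, 2, 3]", number=100_000)
sub = timeit.timeit(
    "x[1] = 2",
    setup="class NList(list):\n    pass\nx = NList([1, 2, 3])",
    number=100_000,
)
print(f"plain list: {plain:.4f}s  subclass: {sub:.4f}s  ratio: {sub / plain:.2f}")
```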
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Mon Mar 11 00:25:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 16:25:28 -0800 Subject: [Patches] [ python-Patches-498109 ] fileobject truncate support for win32 Message-ID: Patches item #498109, was opened at 2001-12-31 11:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=498109&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Wolfgang Strobl (strobl) Assigned to: Tim Peters (tim_one) Summary: fileobject truncate support for win32 Initial Comment: Python 2.2 has large file support on Windows, but f.truncate() throws an overflow exception when f.tell() >2G. I've changed file_truncate in fileobject.c to use SetEndOfFile iff truncate is called without a parameter, on Win32. Tested on W2k (Ger), see the diff to test_largefile.py. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-10 19:25 Message: Logged In: YES user_id=31435 I'm rejecting the patch (because it does too little), but implemented the suggested solution instead and checked it in (so you should be happy it's rejected ): Doc/lib/libstdtypes.tex; new revision: 1.82 Lib/test/test_largefile.py; new revision: 1.13 Misc/NEWS; new revision: 1.361 Objects/fileobject.c; new revision: 2.145 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-10 03:25 Message: Logged In: YES user_id=31435 I wonder why you're settling for so little here, and I'm not sure it's a real improvement to fix one special large file case while letting others continue to blow up, and especially not when leaving it all undocumented. 
Did you consider using SetFilePointer() before SetEndOfFile(), in order to handle all cases (the former allows setting to 64-bit file positions)? This is trickier (e.g., .truncate() should never *grow* the file, and it would get you into the obscure Windows LARGE_INTEGER business), but would be much more satisfying. In any case, note that you should #define WIN32_LEAN_AND_MEAN before including windows.h in Python. ---------------------------------------------------------------------- Comment By: Wolfgang Strobl (strobl) Date: 2002-01-02 02:49 Message: Logged In: YES user_id=311771 Right. See the attached diff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-01 14:15 Message: Logged In: YES user_id=21627 Wouldn't it be better to include a header file instead of declaring a SetEndOfFile prototype? ---------------------------------------------------------------------- Comment By: Wolfgang Strobl (strobl) Date: 2001-12-31 14:56 Message: Logged In: YES user_id=311771 Oops. While removing some obsolete personal notes, I accidentally removed the leading comment. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-31 11:42 Message: Logged In: YES user_id=6380 For Tim. I presume the chunk of the diff that removes the leading comment of the file is a mistake? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=498109&group_id=5470 From noreply@sourceforge.net Mon Mar 11 06:47:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 22:47:27 -0800 Subject: [Patches] [ python-Patches-443899 ] Minor fix to gzip.py module. 
Message-ID: Patches item #443899, was opened at 2001-07-23 21:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Titus Brown (titus) Assigned to: Martin v. Löwis (loewis) Summary: Minor fix to gzip.py module. Initial Comment: --- from cStringIO import StringIO from gzip import GzipFile stringFile = StringIO() gzFile = GzipFile("test1", 'wb', 9, stringFile) gzFile.write('howdy there!') r = gzFile.read() --- The above code fragment gave a nonintuitive error response (attribute missing). Now, an exception is raised stating that the file is not opened for reading or writing. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-11 07:47 Message: Logged In: YES user_id=21627 Thanks again for the patch; committed (in modified form) as gzip.py 1.29. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:04 Message: Logged In: YES user_id=21627 Taken the load from Jeremy. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-19 13:03 Message: Logged In: YES user_id=21627 I think gzip files should behave like fileobjects with respect to exceptions. Perhaps inconsistently, performing read or write on files that are opened only for the other operation raises an IOError (EBADF), since Posix says so, whereas performing close on a closed file raises a ValueError (it can't perform a system call since the file descriptor might have been recycled meanwhile). So I'm still in favour of applying this patch, with the ValueError changed to IOError, and perhaps passing EBADF as the error code in all cases of IOError. 
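The behavior being discussed can be seen with the report's own fragment, updated for a modern Python (bytes I/O, and IOError long since merged into OSError). The name `test1` is just the reporter's example filename:

```python
import io
import gzip

# Reading from a GzipFile opened for writing. With the fix (and in all
# modern versions) this raises an IOError/OSError rather than the obscure
# AttributeError the reporter originally hit.
buf = io.BytesIO()
gz = gzip.GzipFile("test1", "wb", 9, buf)
gz.write(b"howdy there!")
try:
    gz.read()
    error = None
except OSError as exc:  # IOError is an alias of OSError in Python 3
    error = exc
print("read on write-only GzipFile raised:", error)
```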
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-19 04:09 Message: Logged In: YES user_id=6380 Time to look at this again? ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-16 22:33 Message: Logged In: YES user_id=23486 Re: context diff, thanks & sorry for the trouble; my newer patches are being submitted this way. Re: IOError, I wasn't sure which exception to use at the time. I therefore took my cue from other code in the gzip module, which raises a ValueError when self.fileobj is closed. The only IO errors raised in the module are those that pertain to incorrect file formats. I'd be happy to change any and all of the ValueErrors that are raised into IOErrors, but I think the current consistency of errors should be maintained ;). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-16 20:42 Message: Logged In: YES user_id=21627 Please always submit context (-c) or unified (-u) diffs; I've reformatted your patch by retrieving 1.24, applying the patch, updating to the current version, and regenerating the patch. Apart from that, the patch looks fine to me, and I recommend approving it. One consideration is the exception being raised: Maybe IOError is more appropriate. 
---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-14 21:15 Message: Logged In: YES user_id=23486 (sorry -- misunderstanding of how the changelog view works) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 From noreply@sourceforge.net Mon Mar 11 17:00:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 09:00:44 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 06:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) >Assigned to: Guido van Rossum (gvanrossum) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... 
Greetings Martin Liebmann ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 16:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of Python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 17:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py! (Linux does not support an event named ) Running IDLE on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Tue Mar 12 02:49:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 18:49:36 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited __getitem__ method of a list subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! 
The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed-up routines). 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Tue Mar 12 02:49:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 18:49:55 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited __getitem__ method of a list subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). 
In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed-up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Tue Mar 12 21:40:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Mar 2002 13:40:22 -0800 Subject: [Patches] [ python-Patches-523271 ] Docstrings for os.stat and time.localtim Message-ID: Patches item #523271, was opened at 2002-02-27 00:32 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523271&group_id=5470 Category: Documentation Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Docstrings for os.stat and time.localtim Initial Comment: This patch updates the first line of the docstrings for os.stat(), os.lstat(), and time.*time() so that it reflects the attribute names on the tuple-like struct object returned. It changes: localtime(...) localtime([seconds]) -> (year,month,day,hour,minute,second,weekday,dayofyear,dst) into: gmtime([seconds]) -> (tm_year,tm_mon,tm_day,tm_hour,tm_min,tm_sec,tm_wday,tm_yday,tm_isdst) ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. 
(fdrake) Date: 2002-03-12 16:40 Message: Logged In: YES user_id=3066 Checked in a modified patch as Modules/posixmodule.c revisions 2.216.4.3 and 2.225, and Modules/timemodule.c revisions 2.118.6.2 and 2.124. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523271&group_id=5470 From noreply@sourceforge.net Wed Mar 13 03:16:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Mar 2002 19:16:42 -0800 Subject: [Patches] [ python-Patches-515015 ] inspect.py raise exception if code not found Message-ID: Patches item #515015, was opened at 2002-02-08 17:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: inspect.py raise exception if code not found Initial Comment: There is a comment which says the suffixes should be sorted by length, but there is no comparison function. This patch adds a comparison (lambda). Also, there are two functions which are documented to raise IOError if there are problems, but if the function reaches the end, there were no raises. This patch adds the missing raise IOError statements. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-12 22:16 Message: Logged In: YES user_id=33168 Checked in as inspect.py 1.27. Only findsource() is documented to raise an IOError, so that is the only function that is fixed. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:36 Message: Logged In: YES user_id=6380 Neal, can you check this in and mark as bugfix? 
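A quick illustration of the fixed findsource() behavior on a modern Python, where IOError is an alias of OSError. The exec-defined function `f` here is a hypothetical example chosen because it has no retrievable source file:

```python
import inspect

# Per the patch, findsource() (and helpers built on it, such as
# getsource()) raises IOError when the source can't be located, instead
# of silently falling off the end of the function.
ns = {}
exec("def f():\n    return 42", ns)  # compiled from a string: no source file

try:
    inspect.getsource(ns["f"])
    error = None
except OSError as exc:  # IOError is an alias of OSError in Python 3
    error = exc
print("getsource failed as documented:", error)
```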
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-09 09:16 Message: Logged In: YES user_id=33168 Sorry, I saw the map/lambda above, but misread the code. Attached is a new file (just contains the 2 raises). I really need to add a test for this as well. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-08 18:10 Message: Logged In: YES user_id=31435 Please remove the lambda trick from the patch. The comment is explaining why the negation of the length is the first element of the tuples being sorted (that's what guarantees the longest suffix is checked first in case of overlap). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 From noreply@sourceforge.net Wed Mar 13 12:15:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 04:15:01 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Wed Mar 13 19:54:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 11:54:18 -0800 Subject: [Patches] [ python-Patches-529586 ] 
Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 19:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the Python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Wed Mar 13 21:51:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 13:51:31 -0800 Subject: [Patches] [ python-Patches-529586 ] Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 19:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None >Priority: 2 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the Python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source. 
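For context, a bare `~` in LaTeX typesets as a non-breaking space, which is why the inversion operator vanishes from the rendered grammar. A sketch of the common escapes follows; which one the Python doc sources actually ended up using is not shown here:

```latex
% A bare ~ is a non-breaking space, so \verbatim-free grammar text
% needs one of these to produce a visible tilde:
\textasciitilde{}x   % text-mode ASCII tilde
$\sim$x              % math-mode tilde (an approximation glyph)
\~{}x                % tilde accent over an empty box
```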
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Wed Mar 13 22:44:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 14:44:23 -0800 Subject: [Patches] [ python-Patches-476814 ] foreign-platform newline support Message-ID: Patches item #476814, was opened at 2001-10-31 17:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Jack Jansen (jackjansen) Summary: foreign-platform newline support Initial Comment: This patch enables Python to interpret all known newline conventions, CR, LF or CRLF, on all platforms. This support is enabled by configuring with --with-universal-newlines (so by default it is off, and everything should behave as usual). With universal newline support enabled, two things happen: - When importing or otherwise parsing .py files any newline convention is accepted. - Python code can pass a new "t" mode parameter to open() which reads files with any newline convention. "t" cannot be combined with any other mode flags like "w" or "+", for obvious reasons. File objects have a new attribute "newlines" which contains the type of newlines encountered in the file (or None when no newline has been seen, or "mixed" if there were various types of newlines). Also included is a test script which tests both file I/O and parsing. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-13 23:44 Message: Logged In: YES user_id=45365 A new version of the patch. Main differences are that U is now the mode character to trigger universal newline input and --with-universal-newlines is on by default. 
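For historical context, the behavior this patch pioneered later became standard: in Python 3, text-mode files translate all three conventions by default, and the `newlines` attribute the patch introduces survives on `io.TextIOWrapper` (as a tuple of the conventions actually seen, rather than the string "mixed"). A sketch:

```python
import os
import tempfile

# Write a file mixing all three newline conventions in binary mode,
# then read it back in text mode, where universal-newline translation
# is the default.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"unix\nwindows\r\nmac\rend\n")

with open(path) as f:              # text mode: universal newlines
    lines = f.read().split("\n")   # every convention translated to '\n'
    seen = f.newlines              # tuple of conventions encountered

print(lines)
print(seen)
os.remove(path)
```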
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-16 23:47 Message: Logged In: YES user_id=45365 This version of the patch addresses the bug in Py_UniversalNewlineFread and fixes up some minor details. Tim's other issues are addressed (at least: I think they are:-) in a forthcoming PEP. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-12-14 00:57 Message: Logged In: YES user_id=31435 Back to Jack -- and sorry for sitting on it so long. Clearly this isn't making it into 2.2 in the core. As I said on Python-Dev, I believe this needs a PEP: the design decisions are debatable, so *should* be debated outside the Mac community too. Note, though, that I can't stop you from adding it to the 2.2 Mac distribution (if you want it badly enough there). If a PEP won't be written, I suggest finding someone else to review it again; maybe Guido. Note that the patch needs doc changes too. The patch to regrtest.py doesn't belong here (I assume it just slipped in). There seems to be a lot of code in support of the f_newlinetypes member, and the value of that member isn't clear -- I can't imagine a good use for it (maybe it's a Mac thing?). The implementation of Py_UniversalNewlineFread appears incorrect to me: it reads n bytes *every* time around the outer loop, no matter how few characters are still required, and n doesn't change inside the loop. The business about the GIL may be due to the lack of docs: are, or are not, people supposed to release the GIL themselves around calls to these guys? It's not documented, and it appears your intent differed from my guess. Finally, it would be better to call ferror() after calling fread() instead of before it. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-11-14 16:13 Message: Logged In: YES user_id=45365 Here's a new version of the patch. 
To address your issues one by one: - get_line and Py_UniversalNewlineFgets are too difficult to integrate; at least, I don't see how I could do it. The storage management of get_line gets in the way. - The global lock comment I don't understand. The Universal... routines are replacements for fgets() and fread(), so have nothing to do with the interpreter lock. - The logic of all three routines (get_line too) has changed and I've put comments in. I hope this addresses some of the points. - If universal_newline is false for a certain PyFileObject we now immediately take a quick exit via fgets() or fread(). There's also a new test script that tests some more border cases (like lines longer than 100 characters, and a lone CR just before end of file). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-05 09:16 Message: Logged In: YES user_id=31435 It would be better if get_line just called Py_UniversalNewlineFgets (when appropriate) instead of duplicating its logic inline. Py_UniversalNewlineFgets and Py_UniversalNewlineFread should deal with releasing the global lock themselves -- the correct granularity for lock release/reacquire is around the C-level input routines (esp. for fread). The new routines never check for I/O errors! Why not? It seems essential. The new Fgets checks for EOF at the end of the loop instead of the top. This is surprising, and I stared a long time in vain trying to guess why. Setting newlinetypes |= NEWLINE_CR; immediately after seeing an '\r' would be as fast (instead of waiting to see EOF and then inferring the prior existence of '\r' indirectly from the state of the skipnextlf flag). Speaking of which, the fobj tests in the inner loop waste cycles. Set the local flag variables whether or not fobj is NULL. When you're *out* of the inner loop you can simply decline to store the new masks when fobj is NULL (and you're already doing the latter anyway). 
A test and branch inside the loop is much more expensive than or'ing in a flag bit inside the loop, ditto harder to understand. Floating the univ_newline test out of the loop (and duplicating the loop body, one way for univ_newline true and the other for it false) would also save a test and branch on every character. Doing fread one character at a time is very inefficient. Since you know you need to obtain n characters in the end, and that these transformations require reading at least n characters, you could very profitably read n characters in one gulp at the start, then switch to k at a time where k is the number of \r\n pairs seen since the last fread call. This is easier to code than it sounds. It would be fine by me if you included (and initialized) the new file-object fields all the time, whether or not universal newlines are configured. I'd rather waste a few bytes in a file object than see #ifdefs spread thru the code. I'll be damned if I can think of a quick way to do this stuff on Windows -- native Windows fgets() is still the only Windows handle we have on avoiding crushing thread overhead inside MS's C library. I'll think some more about it (the thrust still being to eliminate the 't' mode flag, as whined about on Python-Dev). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-31 18:38 Message: Logged In: YES user_id=6380 Tim, can you review this or pass it on to someone else who has time? Jack developed this patch after a discussion in which I was involved in some of the design, but I won't have time to look at it until December. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 From noreply@sourceforge.net Thu Mar 14 07:13:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 23:13:26 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 08:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Thu Mar 14 14:46:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 06:46:25 -0800 Subject: [Patches] [ python-Patches-529586 ] Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 14:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None >Status: Closed >Resolution: Wont Fix Priority: 2 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source.
---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-03-14 09:46 Message: Logged In: YES user_id=3066 This problem will no longer be present after bug #523117 is fixed; that bug has been assigned a high priority. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Thu Mar 14 16:24:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 08:24:30 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results:

* in the case that the argument does *not* begin with "__", the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.)
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), the new version is 1.12 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", the new version is 1.16 times as fast as the old.
* in the case that the argument is (randomly) one of the six special names, the new version is 2.7 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), the new version is 3.7 times as fast as the old.

---------------------------------------------------------------------- >Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking.
Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily.
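The shape of the "unroll strcmp" optimization — rejecting the common case with one or two cheap character comparisons before doing any full string compares — can be illustrated in Python. The real patch is C in classobject.c; the function name and the particular list of six special names below are assumptions for illustration only:

```python
# Illustrative list of the "six special names" class_setattr guards
# (assumed here; the real set lives in classobject.c).
_SPECIAL = ('__dict__', '__bases__', '__name__',
            '__getattr__', '__setattr__', '__delattr__')

def is_special_name(name):
    """Hypothetical sketch of the fast-path test.

    Most attribute names do not start with "__", so two single-character
    comparisons reject them before any full string comparisons happen --
    the Python analogue of unrolling the first iterations of strcmp().
    """
    if len(name) < 2 or name[0] != '_' or name[1] != '_':
        return False  # common case: cheap rejection, no full compares
    return name in _SPECIAL  # rare case: fall back to full comparisons
```

This mirrors why the dunder-heavy benchmark cases speed up the most: they are the ones that previously paid for six full strcmp() calls.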
---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Thu Mar 14 23:05:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 15:05:47 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-14 00:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) >Group: Python 2.2.x Status: Open Resolution: None >Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Barry Warsaw (bwarsaw) Summary: backward compat.
on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-14 18:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 01:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Thu Mar 14 23:58:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 15:58:07 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-14 23:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes that, and also removes a word that was typed twice in a comment. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 01:41:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 17:41:43 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Barry Warsaw (bwarsaw) Summary: backward compat. on calendar.py Initial Comment: Many applications fail on 2.2 because of this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ...
Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 01:45:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 17:45:23 -0800 Subject: [Patches] [ python-Patches-480902 ] allow dumbdbm to reuse space Message-ID: Patches item #480902, was opened at 2001-11-12 07:30 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Skip Montanaro (montanaro) Summary: allow dumbdbm to reuse space Initial Comment: This patch to dumbdbm does two things: * allows it to reuse holes in the .dat file * provides a somewhat more complete test The first change should be considered only for 2.3. Barry may or may not want to check out the test case rewrite for incorporation into 2.2. Accordingly, I've assigned it to him. Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:45 Message: Logged In: YES user_id=44345 Unless someone else has an objection, I'm going to close this. Barry already incorporated the expanded test case and the space reuse is not really that important in my mind since dumbdbm is generally only a fallback when no other database is available. If someone wants to use a database bad enough, they will probably figure out a way to use something more powerful. Skip ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2001-11-13 14:16 Message: Logged In: YES user_id=12800 I've accepted the second half -- the improvement to the test suite -- but as recommended, I'm postponing the first half until Py 2.3. 
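The space-reuse half of the dumbdbm patch amounts to free-list allocation over the .dat file. A hypothetical sketch of the bookkeeping only (offsets and sizes, no actual file I/O; class and method names are illustrative, not dumbdbm's API):

```python
class BlockAllocator:
    """Sketch: reuse holes left by deleted/rewritten records instead of
    always appending to the end of the data file."""

    def __init__(self):
        self._free = []   # list of (offset, size) holes
        self._end = 0     # current logical end of file

    def free(self, offset, size):
        # A record was deleted or outgrew its slot: remember the hole.
        self._free.append((offset, size))

    def allocate(self, size):
        # First-fit: reuse the first hole large enough for the new record.
        for i, (off, holesize) in enumerate(self._free):
            if holesize >= size:
                del self._free[i]
                if holesize > size:
                    # Return the unused tail of the hole to the free list.
                    self._free.append((off + size, holesize - size))
                return off
        # No hole fits: fall back to appending, as dumbdbm always did.
        off = self._end
        self._end += size
        return off
```

First-fit keeps the logic trivial, matching dumbdbm's role as a simple fallback database; anything smarter (best-fit, coalescing adjacent holes) is arguably not worth the complexity here.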
Assigning back to Skip so he'll remember to deal with this again later. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 From noreply@sourceforge.net Fri Mar 15 03:08:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 19:08:17 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-14 00:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Hye-Shik Chang (perky) >Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? 
File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 22:08 Message: Logged In: YES user_id=12800 Go for it Skip! ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 20:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 18:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 01:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 04:09:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 20:09:49 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. 
on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 22:09 Message: Logged In: YES user_id=44345 fixed by calendar.py 1.23 and test_calendar.py 1.2. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 21:08 Message: Logged In: YES user_id=12800 Go for it Skip! 
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:16:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:16:47 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. 
Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:18:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:18:05 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter.
Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:18:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:18:31 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing.
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:25:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:25:04 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 18:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v.
Lцwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:25:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:25:37 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 18:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Lцwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. 
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Fri Mar 15 08:19:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 00:19:18 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-14 15:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 09:19 Message: Logged In: YES user_id=21627 It is a fairly trivial change, and it has no ill effects, so I think this it is worth the trouble (in particular since a duplicate has been submitted as 529768). 
Whether PEP 263 affects it depends on the implementation strategy taken in phase 2; most likely, attribute accesses remain byte strings (it is already decided that they remain restricted to ASCII). Unless there are any strong objections to this patch, I'd like to integrate it.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 08:31:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 00:31:43 -0800
Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr
Message-ID:

Patches item #517521, was opened at 2002-02-14 09:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimization for PyObject_Get/SetAttr

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-15 03:31

Message:
Logged In: YES
user_id=31435

+1 on integrating the patch. Better 2% today than 200% that may never materialize!

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 08:54:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 00:54:32 -0800
Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores
Message-ID:

Patches item #525532, was opened at 2002-03-04 09:50
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerald S. Williams (gsw_agere)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add support for POSIX semaphores

Initial Comment:
thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so it perhaps ought to be renamed if this patch is accepted.
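The semantics the patch gives the interpreter lock are just those of a counting semaphore initialized to 1, replacing the mutex-plus-condition-variable emulation. Sketched here in Python (with `threading.Semaphore` standing in for POSIX `sem_t`; the patch itself does the equivalent in C via `sem_init`/`sem_wait`/`sem_post`):

```python
import threading

class SemLock:
    """Sketch of a lock built directly on a semaphore.

    Illustrative only: the real patch implements this in C inside
    thread_pthread.h; the Python class here just mirrors the semantics.
    """
    def __init__(self):
        # A semaphore with initial count 1 behaves as an unlocked lock,
        # analogous to sem_init(&sem, 0, 1).
        self._sem = threading.Semaphore(1)

    def acquire(self, blocking=True):
        # Analogous to sem_wait (blocking) / sem_trywait (non-blocking).
        return self._sem.acquire(blocking)

    def release(self):
        # Analogous to sem_post.
        self._sem.release()
```

The efficiency claim in the comment comes from the fact that a native semaphore wait/post pair avoids the extra mutex lock/unlock and condition signal of the emulated version.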
----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-15 03:54

Message:
Logged In: YES
user_id=31435

Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:41:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:41:37 -0800
Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr
Message-ID:

Patches item #529768, was opened at 2002-03-14 08:13
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Speed-up getattr

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:41

Message:
Logged In: YES
user_id=21627

Closing it as a duplicate.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:41:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:41:06 -0800
Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr
Message-ID:

Patches item #517521, was opened at 2002-02-14 15:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimization for PyObject_Get/SetAttr

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:41

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as object.c 2.164.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:45:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:45:24 -0800
Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped
Message-ID:

Patches item #530105, was opened at 2002-03-15 00:58
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Nobody/Anonymous (nobody)
Summary: file object may not be subtyped

Initial Comment:
PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes that, and also a comment word typed twice.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:45

Message:
Logged In: YES
user_id=21627

This patch looks good to me.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:48:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:48:54 -0800
Subject: [Patches] [ python-Patches-527434 ] Double inclusion of thread.o on Sol2.8
Message-ID:

Patches item #527434, was opened at 2002-03-08 16:53
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527434&group_id=5470

Category: Build
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Ondrej Palkovsky (ondrap)
Assigned to: Nobody/Anonymous (nobody)
Summary: Double inclusion of thread.o on Sol2.8

Initial Comment:
When compiling on Solaris 2.8 (sparc), thread.o gets included twice in the list of objects. The problem arises when compiling Python as a shared library, as you may not specify the same thing twice. This patch avoids checking for -lthread if POSIX threads are already defined.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:48

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Applied as configure 1.287; configure.in 1.297.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527434&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:54:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:54:21 -0800
Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes
Message-ID:

Patches item #527427, was opened at 2002-03-08 16:39
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Cesar Eduardo Barros (cesarb)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom fails to use NodeList sometimes

Initial Comment:
(why is the summary box so small?)

xml.dom.minidom doesn't use a NodeList as the return type of getElementsByTagName{,NS} as it should. The patch (against 2.2 or HEAD) fixes it.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:54

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as 1.44 and 1.43.6.1 (Python) and 1.39 (PyXML).

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:54:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:54:52 -0800
Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py
Message-ID:

Patches item #503202, was opened at 2002-01-13 23:47
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Closed
Resolution: Fixed
Priority: 7
Submitted By: Hye-Shik Chang (perky)
Assigned to: Skip Montanaro (montanaro)
Summary: backward compat. on calendar.py

Initial Comment:
Many applications fail on 2.2 because of this problem:

under 2.1.1:

>>> import calendar
>>> for n in calendar.day_abbr:
...     print n,
...
Mon Tue Wed Thu Fri Sat Sun
>>> calendar.month_abbr[7:]
['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

under 2.2:

>>> import calendar
>>> for n in calendar.day_abbr:
...     print n,
...
Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__
    return strftime(self.format, (item,)*9).capitalize()
ValueError: year out of range
>>> calendar.month_abbr[7:]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__
    return strftime(self.format, (item,)*9).capitalize()
TypeError: an integer is required
>>>

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-03-15 07:54

Message:
Logged In: YES
user_id=44345

Further update - 1.24 adds slicing capability. I missed the patch attached to the original report (I thought it was a bug report and didn't even notice the diff - my apologies to perky).
----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-14 22:09

Message:
Logged In: YES
user_id=44345

Fixed by calendar.py 1.23 and test_calendar.py 1.2.

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-14 21:08

Message:
Logged In: YES
user_id=12800

Go for it Skip!

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-14 19:41

Message:
Logged In: YES
user_id=44345

Looks to me like adding

    if item > 6 or item < -7:
        raise IndexError

to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?)

Skip

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-14 17:05

Message:
Logged In: YES
user_id=31435

Based on Guido's comment, categorized as 2.2.x and boosted priority to 7.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-14 00:18

Message:
Logged In: YES
user_id=6380

You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate!
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:55:40 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:55:40 -0800
Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes
Message-ID:

Patches item #527427, was opened at 2002-03-08 16:39
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Cesar Eduardo Barros (cesarb)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom fails to use NodeList sometimes

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:54

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as 1.44 and 1.43.6.1 (Python) and 1.39 (PyXML).
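For context, the DOM's NodeList interface that minidom is expected to return is, in spirit, just a Python list that also answers item() and length (per DOM Level 2 Core). A minimal sketch of that interface (illustrative; minidom's actual class differs in detail):

```python
class NodeList(list):
    """Sketch of the DOM NodeList interface layered on a plain list.

    Illustrative only: the real minidom NodeList lives in the standard
    library; this just shows why returning a bare list breaks DOM code
    that calls item() or reads length.
    """
    def item(self, index):
        # DOM semantics: out-of-range indices return null, not an error.
        if 0 <= index < len(self):
            return self[index]
        return None

    @property
    def length(self):
        return len(self)
```

Because it subclasses list, existing Python code that slices or iterates the result keeps working, while DOM-style callers get item() and length.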
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:05:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:05:01 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I still like to get a clarification as to who is the author of this code. 
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I understand correctly that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. 
Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:08:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:08:21 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Fixed Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? 
File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-15 08:08 Message: Logged In: YES user_id=44345 further update - 1.24 adds slicing capability - I missed the patch attached to the original report (thought it was a bug report and didn't even notice the diff - my apologies to perky). ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 22:09 Message: Logged In: YES user_id=44345 fixed by calendar.py 1.23 and test_calendar.py 1.2. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 21:08 Message: Logged In: YES user_id=12800 Go for it Skip! ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. 
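Skip's suggested bounds check can be sketched as follows (a toy stand-in for calendar's _localized_name; the class name and data here are illustrative, not the committed 1.23/1.24 code). Without the IndexError, the old-style iteration protocol, which calls __getitem__ with 0, 1, 2, ... until an IndexError is raised, never stops, which is exactly the runaway "Mon Tue Wed ..." output in the report:

```python
class LocalizedDayAbbr:
    """Toy stand-in for calendar._localized_name (illustrative)."""
    _names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

    def __getitem__(self, item):
        if isinstance(item, slice):          # 1.24 added slicing support
            return self._names[item]
        if item > 6 or item < -7:            # Skip's suggested guard
            raise IndexError("day index out of range")
        return self._names[item]

day_abbr = LocalizedDayAbbr()
# Iteration now terminates, because __getitem__(7) raises IndexError:
assert list(day_abbr) == ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Slicing works instead of raising TypeError:
assert day_abbr[5:] == ['Sat', 'Sun']
```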
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:37:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:37:21 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-14 09:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 09:37 Message: Logged In: YES user_id=6380 You mean 2.165, surely. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-15 08:44 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as object.c 2.164. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 03:31 Message: Logged In: YES user_id=31435 +1 on integrating the patch. Better 2% today than 200% that may never materialize! ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 03:19 Message: Logged In: YES user_id=21627 It is a fairly trivial change, and it has no ill effects, so I think it is worth the trouble (in particular since a duplicate has been submitted as 529768). Whether PEP 263 affects it depends on the implementation strategy taken in phase 2; most likely, attribute accesses remain as byte strings (it is already decided that they remain restricted to ASCII). Unless there are any strong objections to this patch, I'd like to integrate it. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 02:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 15:30:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 07:30:17 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-14 18:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes this, and also a word that was typed twice in a comment. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 10:30 Message: Logged In: YES user_id=6380 Looks good to me too. Check it in. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 08:45 Message: Logged In: YES user_id=21627 This patch looks good to me. 
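To illustrate what exposing the file object's layout enables, here is the kind of subclassing the patch makes possible. In 2.2 terms the point was `class MyFile(file): ...`; the sketch below uses io.FileIO as a modern stand-in, and the class name is made up for the demo:

```python
import io
import os
import tempfile

class CountingFile(io.FileIO):
    """A file subtype that counts bytes written (illustrative)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.written = 0

    def write(self, data):
        n = super().write(data)
        self.written += n
        return n

fd, path = tempfile.mkstemp()
os.close(fd)
with CountingFile(path, "w") as f:
    f.write(b"hello ")
    f.write(b"world")
    assert f.written == 11
os.remove(path)
```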
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:03:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:03:45 -0800 Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive Message-ID: Patches item #492105, was opened at 2001-12-12 17:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: James C. Ahlstrom (ahlstromjc) Assigned to: Nobody/Anonymous (nobody) Summary: Import from Zip archive Initial Comment: This is the "final" patch to support imports from zip archives, and directory caching using os.listdir(). It replaces patch 483466 and 476047. It is a separate patch since I can't delete file attachments. It adds support for importing from "" and from relative paths. ---------------------------------------------------------------------- >Comment By: James C. Ahlstrom (ahlstromjc) Date: 2002-03-15 17:03 Message: Logged In: YES user_id=64929 I still can't delete files, but I added a new file which contains all diffs as a single file, and is made from the current CVS tree (Mar 15, 2002). 
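This line of work eventually became the stdlib zipimport machinery. A sketch of what importing from a zip archive looks like with today's standard library (the archive and module names below are made up for the demo):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny archive containing one module (names are made up).
archive = os.path.join(tempfile.mkdtemp(), "demo_lib.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("ziptest_mod.py", "GREETING = 'hello from the zip'\n")

# Adding the archive to sys.path makes its modules importable.
sys.path.insert(0, archive)
import ziptest_mod
assert ziptest_mod.GREETING == "hello from the zip"
```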
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:06:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:06:11 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:06 Message: Logged In: YES user_id=89016 For encoding it's always (end-start)*u"?": >>> u"ää".encode("ascii", "replace") '??' But for decoding, it is neither of these: >>> "\Ux\U".decode("unicode-escape", "replace") u'\ufffd\ufffd' i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. 
They might have to call the callback for every single illegal byte sequence to get the old behaviour. (It seems that this patch would be much, much simpler, if we only change the encoders) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 19:36 Message: Logged In: YES user_id=38388 Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, ie. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: it is not the '1' in position 2 that is the problem, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... 
return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b [0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? 
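Walter's "collect as many unencodable characters as possible" idea is essentially how the error-callback API later shipped (PEP 293): the registered handler receives an exception object whose start/end span the whole run of bad characters. A sketch with the modern codecs.register_error (the handler name is made up for the demo):

```python
import codecs

def bracket_run(exc):
    # exc.start:exc.end covers the entire run of unencodable
    # characters, so consecutive bad characters arrive in ONE call.
    run = exc.object[exc.start:exc.end]
    replacement = "<%s>" % ",".join(str(ord(c)) for c in run)
    return (replacement, exc.end)   # (replacement, resume position)

codecs.register_error("bracket-run", bracket_run)   # made-up name

# Three consecutive non-ASCII characters, one callback invocation:
assert "a\xe4\xfc\xf6o".encode("ascii", "bracket-run") == b"a<228,252,246>o"
```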
---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? 
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots) This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) 
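The reason argument and the (replacement, new position) return protocol discussed above are, in the API as it eventually landed, carried on the exception object handed to a registered handler. A sketch of the modern equivalent of the `lambda enc, uni, pos, rea, sta: (u"", pos+1)` example (the handler name is made up for the demo):

```python
import codecs

def skip_bad_bytes(exc):
    # encoding, object, start, end and reason attributes replace the
    # positional (enc, uni, pos, rea, sta) parameters from the thread.
    assert isinstance(exc, UnicodeDecodeError)
    return ("", exc.end)            # drop the bad bytes, resume after them

codecs.register_error("skip-bad", skip_bad_bytes)   # made-up name

assert b"a\xffb\xffc".decode("ascii", "skip-bad") == "abc"

# The 'reason' text Walter added is available as an attribute:
try:
    b"\xff".decode("ascii")
except UnicodeDecodeError as exc:
    assert exc.reason == "ordinal not in range(128)"
```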
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. 
> > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. 
> I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) 
> Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more inline with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. 
I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. 
I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next few days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it. I implemented this and changed the encoders to only lookup the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. 
I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codecs codecs.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == u"\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". 
Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want to insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. 
I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. 
But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell > the encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. 
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == u"\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. 
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. 
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? 
Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. 
> The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. 
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. 
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. 
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. 
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. 
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the time). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. 
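The two-string stack loop Walter describes a few comments up can be sketched in Python. This is an illustrative reconstruction, not the patch's actual C code: the function name `encode_latin1_with_callback` and the helper `charref` are made up for the example, but the callback signature (encoding name, unicode object, error position) and the one-level-of-replacement rule follow his description.

```python
def encode_latin1_with_callback(text, errors):
    # Sketch of the patch's encoding loop: the original string plus at
    # most one replacement string live on a two-entry "stack".
    out = bytearray()
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ord(ch) < 256:              # encodable in latin-1
            out.append(ord(ch))
            pos += 1
            continue
        # Callback gets (encoding, unicode, position) and returns a
        # replacement unicode string to encode instead.
        replacement = errors("latin-1", text, pos)
        for rch in replacement:
            if ord(rch) >= 256:
                # Unencodable character *inside* the replacement string:
                # no second level of callbacks, just raise.
                raise UnicodeError("unencodable character in replacement")
            out.append(ord(rch))
        pos += 1
    return bytes(out)

# XML charref replacement, as in the initial comment of this item.
charref = lambda enc, uni, pos: "&#%d;" % ord(uni[pos])
print(encode_latin1_with_callback("a\u20acb", charref))  # b'a&#8364;b'
```

The point of the two-entry stack is visible here: the replacement string is encoded with the same inner loop as the original, but an error inside it raises instead of recursing into the callback again.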
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and Latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. 
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive ! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:19:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:19:42 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example, replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:19 Message: Logged In: YES user_id=89016 So this means that the encoder can collect illegal characters and pass them to the callback. "replace" will replace this with (end-start)*u"?". Decoders don't collect all illegal byte sequences, but call the callback once for every byte sequence that has been found illegal and "replace" will replace it with u"?". Does this make sense? 
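The encode/decode asymmetry Walter describes here is exactly how the error handling that grew out of this patch behaves in today's Python. A quick check, assuming a modern Python 3 interpreter:

```python
# Encoding: "replace" substitutes one '?' per unencodable character,
# i.e. (end - start) * '?' for a collected run of bad characters.
assert "ää".encode("ascii", "replace") == b"??"

# Decoding: the handler fires once per illegal byte sequence, so each
# bad sequence yields a single U+FFFD replacement character.
assert b"a\xff\xffb".decode("utf-8", "replace") == "a\ufffd\ufffdb"
```

So the encoder may batch a run of bad characters into one handler call, while the decoder reports each illegal sequence separately, just as proposed.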
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:06 Message: Logged In: YES user_id=89016 For encoding it's always (end-start)*u"?": >>> u"ää".encode("ascii", "replace") '??' But for decoding, it is neither of these: >>> "\Ux\U".decode("unicode-escape", "replace") u'\ufffd\ufffd' i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. They might have to call the callback for every single illegal byte sequence to get the old behaviour. (It seems that this patch would be much, much simpler if we only change the encoders) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 19:36 Message: Logged In: YES user_id=38388 Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? 
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: The problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom error handlers. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. 
As callback name string comparison results are cached, it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. 
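The register/lookup pair described here eventually shipped (via PEP 293) as codecs.register_error / codecs.lookup_error. A decoding handler equivalent to the skip-the-bad-byte lambda Walter demonstrates in this thread, written against the modern API; the handler name "skipbad" is made up for the example:

```python
import codecs

def skipbad(exc):
    # Modern handlers receive the exception object instead of separate
    # (encoding, object, position, reason, state) arguments; the slice
    # information lives in exc.start/exc.end.
    if isinstance(exc, UnicodeDecodeError):
        return ("", exc.end)   # emit nothing, resume after the bad bytes
    raise exc

codecs.register_error("skipbad", skipbad)   # name is arbitrary
print(b"a\xffb\xffc".decode("ascii", "skipbad"))  # abc
```

Note that one handler name serves both directions in the modern API; the handler distinguishes encode from decode errors by the exception type it receives.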
There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. 
There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide. 
OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! 
I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. 
Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(unicode[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace", xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. 
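The registry sketched in the comment above is essentially what later shipped as codecs.register_error. For comparison, the same xmlreplace handler in the final API, where the handler receives a UnicodeEncodeError and returns a (replacement, resume_position) tuple; the registered name "xmlreplace" is kept from the thread, the rest is a modern restatement:

```python
import codecs

def xmlreplace(exc):
    # exc.object is the original string, exc.start:exc.end the bad slice.
    if isinstance(exc, UnicodeEncodeError):
        s = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
        return (s, exc.end)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

As proposed here, unknown names trigger a registry lookup while "strict", "replace" and "ignore" stay on the encoder's fast path (Python 3 also ships "xmlcharrefreplace" built in, making this particular handler redundant in practice).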
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. 
For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. 
(This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) 
Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codec codec.registerError("xmlreplace", xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. 
> > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? 
Why > > > would one want to insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? 
> > > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? 
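The tuple interface proposed here, (output_to_be_appended, new_input_position), is essentially what later shipped as the standard codec error-handler registry (PEP 293): instead of a separate state object, the handler receives the UnicodeError instance, which already carries the encoding name, the input object, and the error position. A sketch using today's `codecs.register_error` API; the handler name "fffdreplace" is made up for illustration:

```python
import codecs

def fffdreplace(exc):
    # Replace only an unencodable U+FFFD with '?'; re-raise anything else.
    if isinstance(exc, UnicodeEncodeError) and \
            exc.object[exc.start:exc.end] == "\ufffd":
        return ("?", exc.end)  # (replacement, position to resume encoding at)
    raise exc

codecs.register_error("fffdreplace", fffdreplace)

print("a\ufffdz".encode("ascii", "fffdreplace"))  # b'a?z'
```

Note that the second tuple element is an absolute resume position, matching the "absolute values rather than offsets" preference stated above.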
> > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. 
But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == "\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. 
> > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. 
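For reference, the replacement styles discussed in this thread (charref escapes, \uXXXX escapes, plain '?') did end up as standard error handler names in later Python releases via PEP 293. The u"gürk" example from above encodes like this in a current Python:

```python
s = "g\u00fcrk"  # u"gürk" from the question above

print(s.encode("ascii", "ignore"))             # b'grk'
print(s.encode("ascii", "replace"))            # b'g?rk'
print(s.encode("ascii", "xmlcharrefreplace"))  # b'g&#252;rk'
print(s.encode("ascii", "backslashreplace"))   # b'g\\xfcrk'
```

The last two handlers are exactly the "usable with lots of other codecs" generic replacements argued for above; they work with any encoding, not just ASCII.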
OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want the insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.s. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? 
I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. 
> PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encounterd the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception raised. When > the encoder has reached the end of it's current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be poppep from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! 
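The two-string stack scheme quoted above can be sketched in pure Python. This is an illustration only (the real implementation lives in C in unicodeobject.c); since a replacement string never pushes a further replacement, a simple inner loop stands in for the two-entry stack, and the `can_encode`/`callback` parameters are made up for the sketch:

```python
def encode_with_callback(s, can_encode, callback):
    # Outer loop: encode characters of the original string.
    # Inner loop: encode the replacement string; an error there is fatal,
    # which is exactly the "stack has two entries" case described above.
    out = []
    for pos, ch in enumerate(s):
        if can_encode(ch):
            out.append(ch)
            continue
        replacement = callback("sketch-encoding", s, pos)  # may raise
        for rch in replacement:
            if not can_encode(rch):
                raise UnicodeError("unencodable character in replacement")
            out.append(rch)
    return "".join(out)

# Behave like the default "replace" handler: '?' for anything unencodable.
print(encode_with_callback("g\u00fcrk",
                           lambda c: ord(c) < 128,
                           lambda enc, uni, pos: "?"))  # g?rk
```

The inner loop also shows why copying the replacement straight to the output (rather than re-running the callback on it) avoids the endless-recursion pitfall mentioned for TranslateCharmap.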
> (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. 
> > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why would > > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback for > > encoding and decoding (like codecs.StreamReaderWriter and > > codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! 
;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. 
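The assignable errors attribute argued for in the XML example above survives in today's codecs.StreamWriter, whose write() consults self.errors on every call, so the handler can be switched mid-stream. A sketch using the handler names that eventually shipped (the stream and content are made up for illustration):

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf)

writer.errors = "xmlcharrefreplace"  # text node: escape as charrefs
writer.write("caf\u00e9")

writer.errors = "strict"             # inside a comment: must fail instead
try:
    writer.write("<!-- caf\u00e9 -->")
except UnicodeEncodeError:
    pass  # nothing is written for the failed call

print(buf.getvalue())  # b'caf&#233;'
```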
When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. 
PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. 
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? 
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h. Of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject *, made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? 
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:27:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:27:58 -0800 Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive Message-ID: Patches item #492105, was opened at 2001-12-12 17:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: James C. Ahlstrom (ahlstromjc) Assigned to: Nobody/Anonymous (nobody) Summary: Import from Zip archive Initial Comment: This is the "final" patch to support imports from zip archives, and directory caching using os.listdir(). It replaces patch 483466 and 476047. It is a separate patch since I can't delete file attachments. It adds support for importing from "" and from relative paths. ---------------------------------------------------------------------- >Comment By: James C. Ahlstrom (ahlstromjc) Date: 2002-03-15 17:27 Message: Logged In: YES user_id=64929 I added a diff -c version of the patch. ---------------------------------------------------------------------- Comment By: James C. 
Ahlstrom (ahlstromjc) Date: 2002-03-15 17:03 Message: Logged In: YES user_id=64929 I still can't delete files, but I added a new file which contains all diffs as a single file, and is made from the current CVS tree (Mar 15, 2002). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:43:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:43:05 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-15 00:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes this, and also a comment word typed twice. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 18:43 Message: Logged In: YES user_id=21627 Applied as fileobject.c 2.147; fileobject.h 2.26. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 16:30 Message: Logged In: YES user_id=6380 Looks good to me too. Check it in. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:45 Message: Logged In: YES user_id=21627 This patch looks good to me. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 19:41:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 11:41:13 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 15:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) >Assigned to: Martin v. Löwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 09:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 00:01:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 16:01:57 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 00:54:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 16:54:42 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 01:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 01:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. 
I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 03:50:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 19:50:54 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752 Okay, --with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. 
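[Editor's note: the allocator-pairing concern above can be illustrated with a toy Python model. This is purely illustrative, not CPython code: a pool allocator can only safely free blocks it allocated itself, which is why PyObject_New could not silently switch to pymalloc while existing extensions still paired it with a PyObject_DEL that calls free().]

```python
class ToyPool:
    """Toy allocator: free() only understands blocks from this pool."""

    def __init__(self, name):
        self.name = name
        self.owned = set()

    def malloc(self):
        block = object()           # stand-in for a raw memory block
        self.owned.add(id(block))
        return block

    def free(self, block):
        # In C this mismatch is silent heap corruption; here we can detect it.
        if id(block) not in self.owned:
            raise RuntimeError(f"{self.name}: freeing a foreign block")
        self.owned.discard(id(block))

pymalloc_pool = ToyPool("pymalloc")
system_heap = ToyPool("system")

b = pymalloc_pool.malloc()
try:
    system_heap.free(b)            # analogue of the New/DEL mismatch
except RuntimeError as exc:
    print(exc)
```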
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 08:24:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 00:24:22 -0800 Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError Message-ID: Patches item #504714, was opened at 2002-01-17 03:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 Category: Core (C code) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: hasattr catches only AttributeError Initial Comment: Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr. hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError. Example: >>> class Foo: ... def __getattr__(self, attr): ... assert 0 ... >>> f = Foo() >>> hasattr(f, 'bar') 0 # should have gotten an AssertionError >>> This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2002-03-16 09:24 Message: Logged In: YES user_id=92689 (The patch seems to be reversed.) The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct. 
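[Editor's note: the difference between the two behaviors is easy to demonstrate. The sketch below shows the semantics the patch proposes, assuming hasattr() swallows only AttributeError (later Python 3 releases behave this way); the class names are invented for the example.]

```python
class Brakes:
    def __getattr__(self, attr):
        # A buggy __getattr__: failing here is a bug, not a missing attribute.
        assert 0, "brakes crumbled to dust"

b = Brakes()
# Old behavior: hasattr(b, "engaged") returns 0 and masks the bug.
# Patched behavior: the AssertionError propagates to the caller.
try:
    hasattr(b, "engaged")
    outcome = "masked"
except AssertionError:
    outcome = "surfaced"
print(outcome)
```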
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Sat Mar 16 08:55:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 00:55:23 -0800 Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError Message-ID: Patches item #504714, was opened at 2002-01-17 02:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 Category: Core (C code) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: hasattr catches only AttributeError Initial Comment: Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr. hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError. Example: >>> class Foo: ... def __getattr__(self, attr): ... assert 0 ... >>> f = Foo() >>> hasattr(f, 'bar') 0 # should have gotten an AssertionError >>> This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally. ---------------------------------------------------------------------- >Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-16 08:55 Message: Logged In: YES user_id=429749 That's true, but the current behavior can mask bugs unexpectedly. For example, if you ask someone if the brakes are engaged, and they discover that the brakes have crumbled to dust and fallen off, you probably want a different answer than "no". :) getattr() (now) only catches AttributeErrors, so there's a consistency thing too. 
Anyway, it's your call :) ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2002-03-16 08:24 Message: Logged In: YES user_id=92689 (The patch seems to be reversed.) The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:38:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:38:08 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get correctly, that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 14:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-08 11:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. 
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:38:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:38:35 -0800 Subject: [Patches] [ python-Patches-518675 ] Adding galeon support Message-ID: Patches item #518675, was opened at 2002-02-17 05:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Supreet Sethi (supreet) Assigned to: Nobody/Anonymous (nobody) Summary: Adding galeon support Initial Comment: It adds support galeon browser support in webbrowser lib. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 Feature --> not in 2.2.1 ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-02-19 17:53 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. 
(This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:40:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:40:23 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-05 02:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Michael Pruett (mpruett) >Assigned to: Jack Jansen (jackjansen) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:40 Message: Logged In: YES user_id=6656 Jack, can you look at this? It looks fine to me, but I've never even been near IRIX. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:40:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:40:54 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Guido van Rossum (gvanrossum) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... 
Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:40 Message: Logged In: YES user_id=6656 feature --> not in 2.2.x ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 21:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of Python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py ! ( Linux does not support an event named ) Running IDLE on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:42:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:42:06 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 17:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. 
The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:42 Message: Logged In: YES user_id=6656 Greg any chance of comments before 2.2.1c1, i.e. Monday? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 17:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:43:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:43:50 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 14:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Barry Warsaw (bwarsaw) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. 
Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:43 Message: Logged In: YES user_id=6656 Anything going to happen here by Monday? ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-02 16:47 Message: Logged In: YES user_id=12800 Re-opening and assigning to myself. I'll take a look at your patches asap. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 14:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess then From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not that intrusive as the comment in the mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 21:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. 
---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 11:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. 
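[Editor's note: the parsing disagreement can be reproduced with two illustrative patterns, written for this note rather than copied from mailbox.py. A strict ctime-style From_ matcher rejects Pine's line, while the same pattern extended with an optional POSIX numeric timezone accepts it.]

```python
import re

# Strict ctime-style From_ line: must end with the four-digit year.
strict = re.compile(
    r"From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+\d?\d:\d\d(:\d\d)?\s+\d\d\d\d\s*$")
# Extended: allow an optional numeric timezone ("+0200") after the year.
extended = re.compile(
    r"From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+\d?\d:\d\d(:\d\d)?\s+\d\d\d\d"
    r"(\s+[+-]\d{4})?\s*$")

pine_line = "From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200"
print(bool(strict.match(pine_line)), bool(extended.match(pine_line)))  # → False True
```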
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:42:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:42:36 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 14:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Lцwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 08:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:53:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:53:58 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 12:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sat Mar 16 17:36:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 09:36:37 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 09:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Lцwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. 
This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:36 Message: Logged In: YES user_id=31435 Changed Group to 2.3. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 03:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 17:38:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 09:38:21 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 
Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 09:54:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 01:54:37 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 15:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Löwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so it perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:54 Message: Logged In: YES user_id=21627 Thanks for the patch; committed as thread_pthread.h 2.39. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 18:36 Message: Logged In: YES user_id=31435 Changed Group to 2.3.
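The committed patch is C code inside thread_pthread.h, but the core idea — a counting semaphore initialized to 1 is already a lock, so no mutex-plus-condition-variable emulation is needed — can be sketched in Python. This is an illustration of the concept only, not the patch's code:

```python
import threading

# A counting semaphore initialized to 1 acts as a mutual-exclusion lock,
# which mirrors what the patch does with sem_init(&sem, 0, 1) plus
# sem_wait()/sem_post() in place of a mutex + condition variable pair.
lock = threading.Semaphore(1)
counter = 0

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()   # like sem_wait(): blocks until the count is > 0
        counter += 1
        lock.release()   # like sem_post(): increments the count, waking a waiter

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: every increment happened under the "lock"
```

The efficiency argument in the patch is that sem_wait/sem_post map directly onto the lock operations, whereas the emulation pays for a mutex acquire plus a condition-variable signal on every release.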
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 09:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sun Mar 17 10:12:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 02:12:43 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 01:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Lцwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 11:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 04:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 01:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del.
None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 13:30:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 05:30:30 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-14 00:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip was taking 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip.
---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 14:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? 
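For reference, the kind of change under discussion — draining the response with socket.recv() in a loop instead of reading through a socket.makefile() file object — looks roughly like this. This is a simplified sketch of the technique, not the actual patch:

```python
import socket

def read_all(sock, bufsize=8192):
    """Drain a socket with recv() until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:          # empty read: peer closed the socket
            break
        chunks.append(data)
    return b"".join(chunks)

# Demonstration with a connected socket pair standing in for the
# client/server connection; the "server" side sends a response and closes.
a, b = socket.socketpair()
a.sendall(b"<methodResponse>...</methodResponse>")
a.close()                     # close so read_all sees end-of-stream
resp = read_all(b)
print(resp)
```

The payoff comes from avoiding the per-read overhead of the file-object layer; the recv() loop hands each network buffer straight to the parser.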
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Sun Mar 17 13:33:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 05:33:43 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 14:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-08 19:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here.
---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 16:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of the last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = (lastmark == 0) ? -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(state->mark + lastmark + 1, 0,
               (state->lastmark - lastmark) * sizeof(void*));
    }
    state->lastmark = lastmark;
}

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 14:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far:

memset(state->mark + lastmark + 1, 0,
       (state->lastmark - lastmark) * sizeof(void*));

I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 14:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test.
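With the lastmark fix applied (this is the behavior in current Python, where re is backed by the sre engine), a nested group no longer reports a match when its enclosing optional group failed, and lastindex points at a group that really matched:

```python
import re

# The optional outer group ((\d)\:)? does not participate in the match of
# "34.123", so both group 1 and the nested group 2 must come back as None;
# only groups 3 and 4 actually matched.
m = re.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
print(m.groups())     # (None, None, '34', '123')
print(m.lastindex)    # 4 -- the last group that actually matched
```

This reproduces the corrected (pre-style) output quoted in the bug report, rather than sre's buggy (None, '3', '34', '123').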
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Sun Mar 17 16:13:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 08:13:18 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 15:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip was taken 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: James Rucker (jamesrucker) Date: 2002-03-17 08:13 Message: Logged In: YES user_id=351540 The problem was discovered under FreeBSD 4.4. 
---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 05:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 08:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 15:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Sun Mar 17 17:10:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 09:10:23 -0800 Subject: [Patches] [ python-Patches-518675 ] Adding galeon support Message-ID: Patches item #518675, was opened at 2002-02-17 06:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Supreet Sethi (supreet) Assigned to: Nobody/Anonymous (nobody) Summary: Adding galeon support Initial Comment: It adds galeon browser support to the webbrowser lib. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 18:10 Message: Logged In: YES user_id=21627 Since the actual code isn't forthcoming, I'm rejecting the patch. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 Feature --> not in 2.2.1 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-19 18:53 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 From noreply@sourceforge.net Sun Mar 17 17:11:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 09:11:48 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Tim Peters (tim_one) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it.
I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc.
I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 18:22:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 10:22:46 -0800 Subject: [Patches] [ python-Patches-485959 ] Final set of patches to Demo/tix Message-ID: Patches item #485959, was opened at 2001-11-27 12:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=485959&group_id=5470 Category: Tkinter Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Internet Discovery (idiscovery) Assigned to: Martin v. Löwis (loewis) Summary: Final set of patches to Demo/tix Initial Comment: Final set of patches to Demo/tix - this should be it for a while. Tix.py: Fixed tix_configure and fixed some of the doc strings. tixwidgets.py: fixed loop, added some more docstrings, and made some progress on the global image1 problem. Look for the code around 'if 0:' - it may point towards a bug in the Tkinter Image class. Also if I can understand this problem, maybe I can solve the long outstanding bug described in Demo/tix/BUG.txt. samples/ Fixed SHList1 and 2 not quitting when run from samples/ and fixed a bug in all of the demos that was leaving zombie pythonw processes under Windows in 2.1.0 ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-17 19:22 Message: Logged In: YES user_id=21627 Thanks for the patches, applied as tixwidgets.py 1.6 Balloon.py 1.2 BtnBox.py 1.2 CmpImg.py 1.2 ComboBox.py 1.2 Control.py 1.3 DirList.py 1.2 DirTree.py 1.2 NoteBook.py 1.2 OptMenu.py 1.2 PopMenu.py 1.2 SHList1.py 1.3 SHList2.py 1.3 Tree.py 1.2 ---------------------------------------------------------------------- Comment By: Internet Discovery (idiscovery) Date: 2001-12-10 01:01 Message: Logged In: YES user_id=33229 Does the attached Tix.py patch run cleanly (it does for me)? The bugs in Tix.py and tixwidgets.py should be fixed before 2.2 goes final - the others are not so important. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-09 18:48 Message: Logged In: YES user_id=21627 I don't think I can find the time to look at this once more before 2.2 - it was already quite time-consuming the last time. I would apply a patch at the last minute with little inspection if it applies cleanly. As for the nature of the problems: I believe for at least one of the files, the modified file was CRLF converted, so I can't use the file you provided, either. I recommend that you obtain a copy of cvs.exe for Windows, so that you don't have to use a web browser to download the files, and produce a diff. ---------------------------------------------------------------------- Comment By: Internet Discovery (idiscovery) Date: 2001-12-09 07:52 Message: Logged In: YES user_id=33229 Are the rejects significant or just spurious linefeeds at the end of the file? Please apply the patches of the files that run cleanly as the patches are all independent and are all against the current files. The Tix.py and tixwidgets.py contain important bug fixes. Let me know if the rejects are significant; I don't always have access to unix cvs, and the SF CVS download option under Windows adds spurious linefeeds at the end.
That's why I add the .dst files to the tar so you can see if any rejects are trivial. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-02 13:35 Message: Logged In: YES user_id=21627 These patches don't apply cleanly; I get patch rejects in Control.py.rej SHList1.py.rej SHList2.py.rej Please obtain the current version through CVS, and produce a 'cvs diff -u', instead of individual diff files. We only need the diffs; the original files aren't needed (so you don't need to produce a tar file, either). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=485959&group_id=5470 From noreply@sourceforge.net Sun Mar 17 18:38:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 10:38:39 -0800 Subject: [Patches] [ python-Patches-430706 ] Persistent connections in BaseHTTPServer Message-ID: Patches item #430706, was opened at 2001-06-06 17:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=430706&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Chris Lawrence (lordsutch) Assigned to: Martin v. Löwis (loewis) Summary: Persistent connections in BaseHTTPServer Initial Comment: This patch provides HTTP/1.1 persistent connection support in BaseHTTPServer.py. It is not enabled by default (for backwards compatibility) because Content-Length headers must be supplied for persistent connections to work correctly. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 19:38 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as BaseHTTPServer.py 1.19, SimpleHTTPServer.py 1.18, libbasehttp.tex 1.14, NEWS 1.364.
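As applied, the feature is opt-in: a handler advertises HTTP/1.1 by setting protocol_version, and must send Content-Length so the client can tell where each response ends and reuse the connection. A minimal sketch using the modern module names (http.server/http.client; in the 2002 code these were BaseHTTPServer and httplib):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.client import HTTPConnection

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # opt in to persistent connections

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Without Content-Length the client cannot tell where the response
        # ends, so the connection could not be kept alive.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for _ in range(2):                  # two requests over one connection
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
print(bodies)                       # [b'hello', b'hello']
```

Both requests travel over the same TCP connection; dropping either the protocol_version override or the Content-Length header falls back to one-request-per-connection HTTP/1.0 behavior.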
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-01-07 20:39 Message: Logged In: YES user_id=6757 Here's my current version of the patch; the main change is that errors now result in closing the connection. A cleaner approach for HTTP 1.1 would be to use Chunked Transfer Encoding for this, so the connection could remain available. I still get spurious IOErrors (due to SIGPIPEs) that result from clients closing connections. I believe this is because a lot of clients aren't well-behaved; i.e. they read the HTTP/1.1 response line then close the connection immediately. Using TCP_CORK on Linux for sockets might help there, but it's not a general solution. Also, I'm not really sure if these exceptions should be caught here or just left to subclasses to deal with... ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-01-01 21:21 Message: Logged In: YES user_id=21627 Any chance that an updated patch is forthcoming? ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-09-22 00:01 Message: Logged In: YES user_id=6757 I've tracked that one down and will have an updated patch in a day or two... basically it just needs another else condition to handle the empty readline(). There are also some issues for subclasses that probably need to be documented to play nicely with bad clients like wget that claim to be HTTP 1.0 but do HTTP 1.1 stuff. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-18 18:36 Message: Logged In: YES user_id=21627 It still doesn't work right. 
If I access SimpleHTTPServer from a Netscape client, I get error messages like

localhost - - [18/Sep/2001 18:32:22] code 400, message Bad request syntax ('')
localhost - - [18/Sep/2001 18:32:22] "" 400 -

These are caused because the client closes the connection after the first request (likely, after it finds out that the document it got contains no references to the same server anymore). However, the server continues to invoke handle_one_request, which reads the empty line and fails to recognize that the client has closed the connection. ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-09-15 10:15 Message: Logged In: YES user_id=6757 I reworked the patch a bit to ensure HTTP 1.1 mode is only used if the handler class is in HTTP 1.1 mode, and modified the test() functions in the server classes to add a "protocol" option. I also modified SimpleHTTPServer to send Content-Length headers for the implemented classes. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-04 13:40 Message: Logged In: YES user_id=21627 The patch in its current form seems to be broken. To see the problem, please run SimpleHTTPServer on some directory, then access it with an HTTP/1.1 client (e.g. Netscape 4.7). The server will use the protocol version HTTP/1.0, but the client will initially send 1.1, and send a Connection: Keep-alive header. As a result, self.close_connection is set to 0, despite using HTTP/1.0. In turn, the HTTP server won't send a content length, and won't close the connection either. Netscape waits forever for some completion which never occurs, since the server waits for the next request on the same connection. It might be useful to enhance the SimpleHTTPServer test() function to optionally operate in HTTP/1.1 mode (including sending a proper Content-Length). Doing the same for the CGI HTTP server is probably less useful.
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-08-30 05:21 Message: Logged In: YES user_id=6757 I have updated the patch against current CVS and have added documentation. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-08-08 22:43 Message: Logged In: YES user_id=21627 I haven't studied the patch in detail, yet, but I have a few comments on the style: - there is no need to quote all authors of the RFC. Also, the reference to long-ago expired HTTP draft should go; just replace it with a single reference to the RFC number (giving an URL for the RFC might be convenient) - Where is the documentation? A patch to Doc/lib/libbasehttp.tex would be appreciated. If you don't speak TeX, don't worry: Just write plain text, we'll do the mark-up. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=430706&group_id=5470 From noreply@sourceforge.net Sun Mar 17 19:32:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 11:32:46 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-15 19:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. 
A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 12:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. 
If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 05:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-15 22:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 19:54 Message: Logged In: YES user_id=21627 -1. 
--with-pymalloc should remain an option; there are still heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 19:42:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 11:42:26 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 20:46:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 12:46:45 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:47:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:47:20 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-05 03:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Michael Pruett (mpruett) Assigned to: Jack Jansen (jackjansen) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. 
---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:47 Message: Logged In: YES user_id=45365 Checked in as rev 1.3 ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:40 Message: Logged In: YES user_id=6656 Jack, can you look at this? It looks fine to me, but I've never even been near IRIX. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:51:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:51:34 -0800 Subject: [Patches] [ python-Patches-490100 ] Lets Tkinter work with MacOSX native Tk Message-ID: Patches item #490100, was opened at 2001-12-07 03:44 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490100&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Jack Jansen (jackjansen) Summary: Lets Tkinter work with MacOSX native Tk Initial Comment: There is a new Tcl/Tk in alpha that works on MacOSX's windowing layer natively. This patch adds calls necessary for Tkinter to work with it. The Tcl/Tk alpha can be picked up here: http://sourceforge.net/project/showfiles.php?group_id=10894 NOTE: The amount of extra code needed to interface with Tcl/Tk will probably go down with the next alpha of Tcl/Tk. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:51 Message: Logged In: YES user_id=45365 This patch was applied 3 months ago, and no one seems to be willing to write a readme. Still, people seem successful in getting this to work, so let's forget about the readme and close the patch. 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-12-10 00:17 Message: Logged In: YES user_id=45365 The mods to _tkinter.c and tkappinit.c are in the repository. What still needs to be done is a readme file explaining where to obtain the X11 headers, what to put into Setup.local and how to run your Tkinter scripts. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-12-07 11:34 Message: Logged In: YES user_id=45365 I assume the sprintf change was a mistake (I've undone it after I applied the patch). Aside from that the patch looks harmless to other platforms, but I haven't gotten it to work yet. It fails compilation with a missing X11/Xlib.h include. If I can get it to compile at least once I'll put it in CVS before 2.2 (even though it is only useful to the real die-hards: it requires a Tk alpha, and only works under the experimental framework-based Python.app). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-07 10:20 Message: Logged In: YES user_id=21627 Please review your patches carefully before submitting them. Why does this change PyOS_snprintf to sprintf? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490100&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:55:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:55:53 -0800 Subject: [Patches] [ python-Patches-496096 ] Mach-O MacPython IDE! 
Message-ID: Patches item #496096, was opened at 2001-12-22 14:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: Mach-O MacPython IDE! Initial Comment: Here it is... the moment we've all been waiting for... the MacPython IDE running in a bundle under Unix Python! It's a beautiful thing. Most everything works flawlessly. One major point though... it's always asking you to convert UNIX line endings to mac line endings! Heh. p.s. Jack: I took the quick route and assumed paths passed to FSSpec_New were slash- delimited. It works at least, and the ability to specify the delimiter can be added later. I wanted to get this in CVS ASAP. Donovan ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:55 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 23:33 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 22:52 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. 
I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:56:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:56:14 -0800 Subject: [Patches] [ python-Patches-496096 ] Mach-O MacPython IDE! Message-ID: Patches item #496096, was opened at 2001-12-22 14:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: Mach-O MacPython IDE! Initial Comment: Here it is... the moment we've all been waiting for... the MacPython IDE running in a bundle under Unix Python! It's a beautiful thing. Most everything works flawlessly. One major point though... it's always asking you to convert UNIX line endings to mac line endings! Heh. p.s. Jack: I took the quick route and assumed paths passed to FSSpec_New were slash- delimited. It works at least, and the ability to specify the delimiter can be added later. I wanted to get this in CVS ASAP. Donovan ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:56 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:55 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 23:33 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 22:52 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 From noreply@sourceforge.net Sun Mar 17 22:11:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 14:11:39 -0800 Subject: [Patches] [ python-Patches-480902 ] allow dumbdbm to reuse space Message-ID: Patches item #480902, was opened at 2001-11-12 07:30 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Skip Montanaro (montanaro) Summary: allow dumbdbm to reuse space Initial Comment: This patch to dumbdbm does two things: * allows it to reuse holes in the .dat file * provides a somewhat more complete test The first change should be considered only for 2.3. Barry may or may not want to check out the test case rewrite for incorporation into 2.2. Accordingly, I've assigned it to him. Skip ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:45 Message: Logged In: YES user_id=44345 Unless someone else has an objection, I'm going to close this. Barry already incorporated the expanded test case and the space reuse is not really that important in my mind since dumbdbm is generally only a fallback when no other database is available. If someone wants to use a database bad enough, they will probably figure out a way to use something more powerful. Skip ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2001-11-13 14:16 Message: Logged In: YES user_id=12800 I've accepted the second half -- the improvement to the test suite -- but as recommended, I'm postponing the first half until Py 2.3. 
Assigning back to Skip so he'll remember to deal with this again later. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 From noreply@sourceforge.net Mon Mar 18 05:32:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 21:32:50 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-18 00:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-17 15:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? 
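Tim's suggestion is the standard warnings-module deprecation pattern. A minimal runnable sketch of it (the real stdgamma signature and the gamma math are simplified away here; only the warning mechanics matter):

```python
import warnings

def gammavariate(alpha, beta):
    # stand-in for the real sampler; the point is the deprecation shim
    return alpha * beta

def stdgamma(alpha):
    # emit the warning Tim suggests, pointing at the caller's frame
    warnings.warn("The stdgamma function is deprecated; "
                  "use gammavariate() instead",
                  DeprecationWarning, stacklevel=2)
    return gammavariate(alpha, 1.0)
```

Existing callers keep working through the deprecation period, and the name can then be removed in a later release, as Tim proposes for 2.4.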
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 07:07:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 23:07:14 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 09:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. 
The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Greg Stein (gstein) Date: 2002-03-17 23:07 Message: Logged In: YES user_id=6501 I've been out this weekend, so no... won't make it by Monday the 18th. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 08:42 Message: Logged In: YES user_id=6656 Greg any chance of comments before 2.2.1c1, i.e. Monday? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 09:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 09:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
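The fix is the same normalization py_compile applies. A small sketch of it (note that modern Python's compile() accepts "\r\n" directly, so this matters mainly for the old __builtin__.compile behavior described above):

```python
def normalize_newlines(codestring):
    """Map Windows (CR LF) and old-Mac (CR) line endings to LF before
    compiling; CR LF must be replaced first so the CR pass does not
    split it into two newlines."""
    return codestring.replace("\r\n", "\n").replace("\r", "\n")

# a module body as it might arrive from a Win32-created .py file
source = "x = 1\r\nif x:\r\n    x = x + 1\r"
code = compile(normalize_newlines(source) + "\n", "<imported>", "exec")
namespace = {}
exec(code, namespace)
```

After normalization the source compiles and runs regardless of which platform wrote the file.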
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:37:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:37:45 -0800 Subject: [Patches] [ python-Patches-525870 ] urllib2: duplicate call, stat attrs Message-ID: Patches item #525870, was opened at 2002-03-05 09:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: duplicate call, stat attrs Initial Comment: This patch removes a duplicate call to os.stat in urllib2.FileHandler.open_local_file(). In addition to that, it uses the new stat attributes, so importing stat is no longer necessary. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:37 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as urllib2.py 1.26. 
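The before/after of Walter's change, sketched on a throwaway file: indexed access through the stat module's constants versus the named attributes on the os.stat() result (this is the general pattern, not the urllib2 code itself):

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

st = os.stat(path)

# old style: tuple indexing via constants from the stat module
old_size = st[stat.ST_SIZE]

# new style (what the patch switches to): named attributes,
# so "import stat" is no longer needed for this
new_size = st.st_size

os.remove(path)
```

Both spellings read the same field; the attribute form is self-describing and drops the extra import.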
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:42:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:42:52 -0800 Subject: [Patches] [ python-Patches-523424 ] Finding "home" in "user.py" for Windows Message-ID: Patches item #523424, was opened at 2002-02-27 16:03 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523424&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gilles Lenfant (glenfant) Assigned to: Nobody/Anonymous (nobody) >Summary: Finding "home" in "user.py" for Windows Initial Comment: On my win2k French box + python 2.1.2: >>> import user >>> user.home 'C:\' This isn't a great issue but this means that all users of this win2k box will share the same ".pythonrc.py". The code provided by Jeff Bauer can be changed easily because the standard Python distro now has a "_winreg" module. This patch gives the real user $HOME like folder for any user on whatever Windows localization: >>> import user >>> user.home u'C:\Documents and Settings\MyWindowsUsername\Mes documents' This has been successfully tested with Win98 and Win2000. This should be tested on XP, NT4, and 95 but I can't. Sorry for the "context or unified diffs" (dunno what it means) but the module is short and my patch is clearly emphasized. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:42 Message: Logged In: YES user_id=21627 If there are no further comments in favour of accepting this patch, it will be rejected. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-02-27 23:13 Message: Logged In: YES user_id=21627 If it returns "My Documents", it is definitely *not* the home directory of the user; \Documents and Settings\username would be the home directory. Furthermore, on many installations, HOME *is* set, and it is the Administrator's choice where that points to; the typical installation (in a domain) indeed is to assign HOMEDRIVE. So I'm not in favour of that change. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523424&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:48:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:48:00 -0800 Subject: [Patches] [ python-Patches-514628 ] bug in pydoc on python 2.2 release Message-ID: Patches item #514628, was opened at 2002-02-08 03:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Raj Kunjithapadam (mmaster25) Assigned to: Tim Peters (tim_one) Summary: bug in pydoc on python 2.2 release Initial Comment: pydoc has a bug when trying to generate HTML documentation; more importantly, it has a bug in the method writedoc(). Attached is my fix. 
Here is the diff between my fix and the regular dist 1338c1338 < def writedoc(thing, forceload=0): --- > def writedoc(key, forceload=0): 1340,1346c1340,1343 < object = thing < if type(thing) is type(''): < try: < object = locate(thing, forceload) < except ErrorDuringImport, value: < print value < return --- > try: > object = locate(key, forceload) > except ErrorDuringImport, value: > print value 1351c1348 < file = open(thing.__name__ + '.html', 'w') --- > file = open(key + '.html', 'w') 1354c1351 < print 'wrote', thing.__name__ + '.html' --- > print 'wrote', key + '.html' 1356c1353 < print 'no Python documentation found for %s' % repr(thing) --- > print 'no Python documentation found for %s' % repr(key) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:48 Message: Logged In: YES user_id=21627 Can you please provide an example that demonstrates the problem? Also, can you please regenerate your changes as context (-c) or unified (-u) diffs, and attach those to this report (do *not* paste them into the comment field)? In their current form, the patch is pretty useless: SF messed up the indentation, and it is an old-style patch, and pydoc.py is already at 1.58. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 23:45 Message: Logged In: YES user_id=6380 assigned to Tim; this may be Ping's terrain but Ping is typically not responsive. 
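The writedoc() change above boils down to "accept either an object or a dotted name". The general shape of that pattern, with a stand-in resolver where pydoc would use its locate() function:

```python
def resolve(thing, locate):
    """Accept an object or a string naming one: strings are looked up
    through `locate` (a stand-in here for pydoc's locate), anything
    else is used as-is, mirroring the writedoc() patch above."""
    if isinstance(thing, str):
        return locate(thing)
    return thing

import math

# both spellings reach the same module object
by_name = resolve("math", __import__)
by_object = resolve(math, None)
```

Accepting both forms lets callers pass whatever they already have, without a second lookup of an object they hold.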
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:48:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:48:55 -0800 Subject: [Patches] [ python-Patches-513329 ] build, install in HP-UX10.20 Message-ID: Patches item #513329, was opened at 2002-02-05 16:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=513329&group_id=5470 Category: Build Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Claudio Scafuri (scafuri) Assigned to: Nobody/Anonymous (nobody) Summary: build, install in HP-UX10.20 Initial Comment: a) python must be linked with c++ because at least one file is compiled with c++. b) in hpux "install -d" does not create a directory. Use "mkdir" instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 17:07 Message: Logged In: YES user_id=21627 If there isn't any further feedback by March 1, this report will be closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-13 02:25 Message: Logged In: YES user_id=21627 There is already code in configure[.in] that tests whether using c++ to link is needed (it isn't needed on all systems). Please report why this test fails on HP-UX, and try providing a patch that corrects the test. The relevant code is after the comment # If CXX is set, and if it is needed to link a main function that was # compiled with CXX, LINKCC is CXX instead. Also, please contribute changes as unified or context diffs; see the Python SourceForge usage guidelines for details. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=513329&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:57:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:57:16 -0800 Subject: [Patches] [ python-Patches-512466 ] Script to move faqwiz entries. Message-ID: Patches item #512466, was opened at 2002-02-03 21:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512466&group_id=5470 Category: Demos and tools Group: Python 2.1.2 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Christian Reis (kiko_async) Assigned to: Nobody/Anonymous (nobody) Summary: Script to move faqwiz entries. Initial Comment: Moves entries from one section (number actually) to another. Doesn't do anything smart like renumber questions, but at least it doesn't clobber them. Usage: blackjesus:~> ./move-faqwiz.sh 2\.1 3\.2 Moving FAQ question 02.001 to 03.002 ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:57 Message: Logged In: YES user_id=21627 Thanks for the patch. Added as move-faqwiz.sh 1.1, README 1.13. ---------------------------------------------------------------------- Comment By: Christian Reis (kiko_async) Date: 2002-02-13 03:23 Message: Logged In: YES user_id=222305 Added file (duh). And of course you can: use Bugzilla :-) it's free software. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-13 02:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512466&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:59:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:59:27 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 15:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch removes the restriction that global and local dictionaries bypass overloaded __getitem__ and __setitem__ when passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict, to make sure this object exists and to free the derived class from having to handle this implementation-dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set: if explicitly passed a local dict, a new local dict is not generated. This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved). If this cannot be done because of backward compatibility problems, an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version.
If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: currently, the plain test suite passes (with a modification of test_desctut which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:59 Message: Logged In: YES user_id=21627 This is quite a complex change. If you want to see it integrated, I recommend that you find people who try it out and report their experience here. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 From noreply@sourceforge.net Mon Mar 18 10:43:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 02:43:45 -0800 Subject: [Patches] [ python-Patches-499513 ] robotparser.py fails on some URLs Message-ID: Patches item #499513, was opened at 2002-01-04 19:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Bastian Kleineidam (calvin) Assigned to: Guido van Rossum (gvanrossum) Summary: robotparser.py fails on some URLs Initial Comment: I am using Python 2.1.1. The URL http://www.chaosreigns.com/robots.txt results in an empty RobotParser object.
Reason is that the file object returned from the URLOpener does not have a readlines() attribute. I patched robotparser.py to use readline() instead of readlines(). Furthermore I removed the unnecessary redirection limit code which is already in FancyURLopener. Greetings, Bastian ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 11:43 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as robotparser.py 1.12. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-04 20:02 Message: Logged In: YES user_id=9205 Updated patch with copyright ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-04 19:49 Message: Logged In: YES user_id=6380 I'll gladly apply your patch. Would you mind also supplying a patch for the copyright statement? It says "Python 2.0 open source license" but that's no longer the current license. How about the PSF license agreement for Python 2.2?
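The readline()-based parsing fixed here survives in the modern urllib.robotparser module; a small usage sketch that feeds the parser a list of lines directly, so no network access is needed:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts an iterable of robots.txt lines
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# paths outside the disallowed prefix are fetchable, others are not
print(rp.can_fetch("*", "http://www.chaosreigns.com/index.html"))
print(rp.can_fetch("*", "http://www.chaosreigns.com/private/x"))
```

Feeding lines through parse() is also a convenient way to unit-test robots.txt handling without depending on a live server.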
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:45:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:45:12 -0800 Subject: [Patches] [ python-Patches-495598 ] add an -q (quiet) option to pycompile Message-ID: Patches item #495598, was opened at 2001-12-20 21:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=495598&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: add an -q (quiet) option to pycompile Initial Comment: this patch is applied to Debian's python packages for more than two years allowing quiet batch compilations. --- python2.2-2.2.orig/Lib/compileall.py Wed Apr 18 03:20:21 2001 +++ python2.2-2.2/Lib/compileall.py Sun Sep 30 22:30:32 2001 @@ -4,6 +4,8 @@ given as arguments recursively; the -l option prevents it from recursing into directories. +DEBIAN adds an -q option for more quiet operation. + Without arguments, if compiles all modules on sys.path, without recursing into subdirectories. (Even though it should do so for packages -- for now, you'll have to deal with packages separately.) @@ -19,7 +21,7 @@ __all__ = ["compile_dir","compile_path"] -def compile_dir(dir, maxlevels=10, ddir=None, force=0, rx=None): +def compile_dir(dir, maxlevels=10, ddir=None, force=0, rx=None, quiet=0): """Byte-compile all modules in the given directory tree. Arguments (only dir is required): @@ -29,9 +31,10 @@ ddir: if given, purported directory name (this is the directory name that will show up in error messages) force: if 1, force compilation, even if timestamps are up-to-date + quiet: if 1, be quiet during compilation """ - print 'Listing', dir, '...' 
+ if not quiet: print 'Listing', dir, '...' try: names = os.listdir(dir) except os.error: @@ -57,7 +60,7 @@ try: ctime = os.stat(cfile) [stat.ST_MTIME] except os.error: ctime = 0 if (ctime > ftime) and not force: continue - print 'Compiling', fullname, '...' + if not quiet: print 'Compiling', fullname, '...' try: ok = py_compile.compile(fullname, None, dfile) except KeyboardInterrupt: @@ -77,11 +80,11 @@ name != os.curdir and name != os.pardir and \ os.path.isdir(fullname) and \ not os.path.islink(fullname): - if not compile_dir(fullname, maxlevels - 1, dfile, force, rx): + if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): success = 0 return success -def compile_path(skip_curdir=1, maxlevels=0, force=0): +def compile_path(skip_curdir=1, maxlevels=0, force=0, quiet=0): """Byte-compile all module on sys.path. Arguments (all optional): @@ -89,6 +92,7 @@ skip_curdir: if true, skip current directory (default true) maxlevels: max recursion level (default 0) force: as for compile_dir() (default 0) + quiet: as for compile_dir() (default 0) """ success = 1 @@ -96,20 +100,21 @@ if (not dir or dir == os.curdir) and skip_curdir: print 'Skipping current directory' else: - success = success and compile_dir(dir, maxlevels, None, force) + success = success and compile_dir(dir, maxlevels, None, force, quiet) return success def main(): """Script main program.""" import getopt try: - opts, args = getopt.getopt(sys.argv [1:], 'lfd:x:') + opts, args = getopt.getopt(sys.argv [1:], 'lfqd:x:') except getopt.error, msg: print msg - print "usage: python compileall.py [-l] [-f] [-d destdir] " \ + print "usage: python compileall.py [-l] [-f] [-q] [-d destdir] " \ "[-s regexp] [directory ...]" print "-l: don't recurse down" print "-f: force rebuild even if timestamps are up-to-date" + print "-q: quiet operation" print "-d destdir: purported directory name for error messages" print " if no directory arguments, -l sys.path is assumed" print "-x regexp: skip files matching the 
regular expression regexp" @@ -118,11 +123,13 @@ maxlevels = 10 ddir = None force = 0 + quiet = 0 rx = None for o, a in opts: if o == '-l': maxlevels = 0 if o == '-d': ddir = a if o == '-f': force = 1 + if o == '-q': quiet = 1 if o == '-x': import re rx = re.compile(a) @@ -134,7 +141,7 @@ try: if args: for dir in args: - if not compile_dir(dir, maxlevels, ddir, force, rx): + if not compile_dir(dir, maxlevels, ddir, force, rx, quiet): success = 0 else: success = compile_path() ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 13:45 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as NEWS 1.365, compileall.py 1.10, libcompileall.tex 1.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=495598&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:53:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:53:58 -0800 Subject: [Patches] [ python-Patches-458534 ] ncurses form module Message-ID: Patches item #458534, was opened at 2001-09-04 23:23 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=458534&group_id=5470 Category: Modules Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: A.M. Kuchling (akuchling) Assigned to: A.M. Kuchling (akuchling) Summary: ncurses form module Initial Comment: >From an e-mail sent to me privately: hello. i written extension for curses module this is not 100% jet Lambach Bartosz lda@lupa.pl ps. sorry, english is not my favorit ;) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 13:53 Message: Logged In: YES user_id=21627 Rejecting the patch. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2001-09-05 16:16 Message: Logged In: YES user_id=21627 In the current form, the module seems to be unacceptable. It comes with no documentation, and no examples. I'd strongly encourage the author to provide a sample application. If he's willing to write some documentation, that would be also good. If he can write only Polish, I could help finding somebody who translates that into English afterwards. Note that there is also a complete interface to forms, and the rest of ncurses, in http://pyncurses.sourceforge.net/ ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=458534&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:57:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:57:52 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-18 23:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. stdgamma() deprecated as per TP suggestion.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 16:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? 
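Tim's suggested shim remains the standard way to deprecate a function while keeping it importable. A self-contained sketch of the pattern (the delegation to gammavariate() here is an assumption standing in for the real body; the historical signature is taken from the discussion above):

```python
import random
import warnings

def stdgamma(alpha, ainv, bbb, ccc):
    # deprecation shim: warn, then delegate to the replacement
    warnings.warn("The stdgamma function is deprecated; "
                  "use gammavariate() instead",
                  DeprecationWarning, stacklevel=2)
    return random.gammavariate(alpha, 1.0)

# demonstrate that callers still work and the warning is emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = stdgamma(2.0, 0.5, 1.0, 1.0)

print(value, caught[0].category.__name__)
```

stacklevel=2 attributes the warning to the caller of stdgamma rather than to the shim itself, which is what makes such warnings actionable.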
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:05:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:05:07 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-19 00:05 Message: Logged In: YES user_id=480138 Attached is test script test_gamma.py. Passing test means: eye-balling of relative "errors" reveals no nasties for at least alpha >= 0.1 Note that Python's gammavariate() is not very accurate at all for alpha < 0.1 approx. However neither are another two methods that I tried (details in the file). I'll leave it at that -- evidently alpha < 1.0 is "rare and difficult" according to Marsaglia & Tsang. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 23:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. stdgamma() deprecated as per TP suggestion. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 16:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? 
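The fix makes gammavariate usable across the 0 < alpha < 1 range that previously failed; a quick sanity check in modern Python (where this patch's behavior has long been merged), exploiting the fact that Gamma(alpha, beta) samples are always positive and have mean alpha*beta:

```python
import random

random.seed(12345)
# alpha < 0.5 was the broken region reported in bug #527139
samples = [random.gammavariate(0.3, 1.0) for _ in range(1000)]
mean = sum(samples) / len(samples)
print(min(samples) > 0, mean)  # mean should be near 0.3
```

This is only a smoke test, not the accuracy study John describes; as he notes, small-alpha accuracy is a harder problem than mere positivity.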
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:08:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:08:25 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 12:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-18 13:08 Message: Logged In: YES user_id=6656 I'm afraid this isn't going to make 2.2.1c1. I'll try to consider it before 2.2.1 final, but I'd want to be very certain about things before applying it there. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 13:05 Message: Logged In: YES user_id=480138 Attached is test script test_gamma.py. Passing test means: eye-balling of relative "errors" reveals no nasties for at least alpha >= 0.1 Note that Python's gammavariate() is not very accurate at all for alpha < 0.1 approx. However neither are another two methods that I tried (details in the file). I'll leave it at that -- evidently alpha < 1.0 is "rare and difficult" according to Marsaglia & Tsang. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 12:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. 
stdgamma() deprecated as per TP suggestion. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 05:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-17 20:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. 
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:45:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:45:21 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 11:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 08:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:58:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:58:09 -0800 Subject: [Patches] [ python-Patches-499513 ] robotparser.py fails on some URLs Message-ID: Patches item #499513, was opened at 2002-01-04 13:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Bastian Kleineidam (calvin) >Assigned to: Martin v. Löwis (loewis) Summary: robotparser.py fails on some URLs Initial Comment: I am using Python 2.1.1. The URL http://www.chaosreigns.com/robots.txt results in an empty RobotParser object. Reason is that the file object returned from the URLOpener does not have a readlines() attribute. I patched robotparser.py to use readline() instead of readlines(). Furthermore I removed the unnecessary redirection limit code which is already in FancyURLopener. Greetings, Bastian ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 05:43 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as robotparser.py 1.12. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-04 14:02 Message: Logged In: YES user_id=9205 Updated patch with copyright ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-04 13:49 Message: Logged In: YES user_id=6380 I'll gladly apply your patch. Would you mind also supplying a patch for the copyright statement? It says "Python 2.0 open source license" but that's no longer the current license.
How about the PSF license agreement for Python 2.2? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 From noreply@sourceforge.net Mon Mar 18 15:01:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 07:01:43 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). 
---------------------------------------------------------------------- >Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of binary/library path (this is solved by adding -R to LDSHARED depending on platform), and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I have included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way, though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627

As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both PIC and non-PIC objects; please no compiler-specific flags in the makefile; why LD_PRELOAD.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380

Could you submit the thread.o double-inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment?

P.S. It would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470

From noreply@sourceforge.net Mon Mar 18 15:46:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 07:46:42 -0800
Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance
Message-ID:

Patches item #504943, was opened at 2002-01-17 17:50
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: call warnings.warn with Warning instance

Initial Comment:
This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method.

----------------------------------------------------------------------
>Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016

The new version includes a patch to the documentation and an entry in Misc/NEWS.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380

Nice idea. Where's the documentation patch?
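The proposed usage above can be sketched as follows. This is a minimal illustration of the idea, not the patch itself: the `FeatureWarning` class and its message text are invented for the example, and `warnings.catch_warnings` is the modern capture mechanism rather than anything from the 2002 patch. (In today's Python the category is taken from the instance's class rather than being ignored outright.)

```python
import warnings

class FeatureWarning(UserWarning):
    """Hypothetical warning class that builds its message in __str__."""
    def __init__(self, feature):
        self.feature = feature

    def __str__(self):
        # The "special logic" lives here: the message text is computed lazily.
        return "feature %r is going away" % self.feature

# Pass a Warning *instance* as the first argument; no category argument needed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(FeatureWarning("spam"))

print(str(caught[0].message))   # feature 'spam' is going away
```

The recorded `message` attribute is the warning instance itself, so `str()` of it invokes the custom `__str__`.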
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470

From noreply@sourceforge.net Mon Mar 18 18:38:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 10:38:10 -0800
Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...)
Message-ID:

Patches item #531480, was opened at 2002-03-18 18:38
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470
Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Use new GC API (generators, iters, ...)

Initial Comment:
I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects.

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470

From noreply@sourceforge.net Mon Mar 18 19:31:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 11:31:19 -0800
Subject: [Patches] [ python-Patches-531491 ] PEP 4 update: deprecations
Message-ID:

Patches item #531491, was opened at 2002-03-18 14:31
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531491&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Barry Warsaw (bwarsaw)
Assigned to: Martin v. Löwis (loewis)
Summary: PEP 4 update: deprecations

Initial Comment:
The following modules should be deprecated for Python 2.3: mimify.py, rfc822.py, MIMEWriter.py, and mimetools.py.
All are supplanted by Python 2.2's email package. Attached is the proposed mod to PEP 4 as per procedure described therein. I am not including mods to the module documents as those should be easy to add. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531491&group_id=5470 From noreply@sourceforge.net Mon Mar 18 19:37:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 11:37:04 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 19:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Tim Peters (tim_one) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. 
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470

From noreply@sourceforge.net Mon Mar 18 19:41:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 11:41:10 -0800
Subject: [Patches] [ python-Patches-473586 ] SimpleXMLRPCServer - fixes and CGI
Message-ID:

Patches item #473586, was opened at 2001-10-22 00:26
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Brian Quinlan (bquinlan)
Assigned to: Fredrik Lundh (effbot)
Summary: SimpleXMLRPCServer - fixes and CGI

Initial Comment:
Changes:
o treats xmlrpclib.Fault exceptions correctly (no longer absorbs them as generic exceptions)
o changed failed marshal to generate a useful Fault instead of an internal server error
o adds a new class to make writing XML-RPC functions embedded in other servers, using CGI, easier (tested with Apache)
o to support the above, added a new dispatch helper class, SimpleXMLRPCDispatcher

----------------------------------------------------------------------
>Comment By: Brian Quinlan (bquinlan) Date: 2002-03-18 11:41 Message: Logged In: YES user_id=108973

OK, I fixed the backwards-compatibility problem. Also added:
o support for the XML-RPC introspection methods system.listMethods and system.methodHelp
o support for the XML-RPC boxcarring method system.multicall

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-12-04 11:51 Message: Logged In: YES user_id=108973

Please do not accept this patch until past the 2.2 release; there are some non-backwards-compatible changes that need to be thought through.
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-23 11:02 Message: Logged In: YES user_id=108973

- a few extra comments
- moved an xmlrpclib.loads() call inside an exception handler so an XML-RPC fault is generated for malformed requests

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 11:59 Message: Logged In: YES user_id=108973

The advantage of the entire patch being accepted before 2.2 is that there is an API change and, once 2.2 is released, we will probably have to make a bit of an attempt to maintain backwards compatibility. If this patch is too high-risk for 2.2, then I can certainly design a bug-fix patch for 2.2 and submit a new patch for 2.3 (that is API compatible with 2.2).

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-10-22 11:43 Message: Logged In: YES user_id=21627

Brian, please note that Python 2.2b1 has been released, so no new features are acceptable until after 2.2. So unless Fredrik Lundh wants to accept your entire patch, I think it has little chance to get integrated for the next few months. If you want pieces of it accepted, I'd recommend splitting it into bug fixes and new features; bug fixes are still acceptable.

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 11:27 Message: Logged In: YES user_id=108973

I just can't stop mucking with it. This time there are only documentation changes. I should also have pointed out that this patch changes the mechanism for overriding the dispatch mechanism: you used to subclass the request handler, now you subclass the server. I believe that this change is correct because the server actually has the required state information to do the dispatching.
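The "subclass the server/dispatcher, not the request handler" pattern discussed above can be sketched with the modern Python 3 spelling of this module (`xmlrpc.server`). The `LoggingDispatcher` class and its `calls` log are invented for this sketch; only `SimpleXMLRPCDispatcher`, `register_function`, and `_dispatch` come from the standard library.

```python
from xmlrpc.server import SimpleXMLRPCDispatcher

class LoggingDispatcher(SimpleXMLRPCDispatcher):
    """Override dispatch on the dispatcher itself: it holds the function
    registry, so it has the state needed to resolve a method name."""
    def __init__(self):
        super().__init__(allow_none=False, encoding=None)
        self.calls = []                       # hypothetical bookkeeping

    def _dispatch(self, method, params):
        self.calls.append(method)             # record every call, then delegate
        return super()._dispatch(method, params)

dispatcher = LoggingDispatcher()
dispatcher.register_function(lambda a, b: a + b, "add")
result = dispatcher._dispatch("add", (2, 3))
print(result, dispatcher.calls)   # 5 ['add']
```

The same dispatcher object backs both `SimpleXMLRPCServer` and the CGI handler, which is why moving the override here makes it reusable across transports.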
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 00:35 Message: Logged In: YES user_id=108973 Changed a name to fit other naming conventions ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:09:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:09:24 -0800 Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...) Message-ID: Patches item #531480, was opened at 2002-03-18 13:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Guido van Rossum (gvanrossum) Summary: Use new GC API (generators, iters, ...) Initial Comment: I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 15:09 Message: Logged In: YES user_id=6380 Oops. I suggest you check this in, and mark it as a 2.2 bugfix candidate. We'll see what Michael says. 
:-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:30:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:30:21 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 14:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-18 15:30 Message: Logged In: YES user_id=31435 I like it too. +1. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:47:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:47:40 -0800 Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...) 
Message-ID:

Patches item #531480, was opened at 2002-03-18 18:38
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470
Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Use new GC API (generators, iters, ...)

Initial Comment:
I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects.

----------------------------------------------------------------------
>Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 20:47 Message: Logged In: YES user_id=35752

Checked in but not marked as a bugfix candidate. I don't think this counts as a bug fix. I guess Michael can make up his own mind.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 20:09 Message: Logged In: YES user_id=6380

Oops. I suggest you check this in, and mark it as a 2.2 bugfix candidate. We'll see what Michael says.
:-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 From noreply@sourceforge.net Mon Mar 18 22:28:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 14:28:14 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 19:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Tim Peters (tim_one) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 22:28 Message: Logged In: YES user_id=35752 Checked in. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 20:30 Message: Logged In: YES user_id=31435 I like it too. +1. 
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470

From noreply@sourceforge.net Mon Mar 18 23:00:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 15:00:32 -0800
Subject: [Patches] [ python-Patches-531629 ] Add multicall support to xmlrpclib
Message-ID:

Patches item #531629, was opened at 2002-03-18 15:00
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531629&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Brian Quinlan (bquinlan)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add multicall support to xmlrpclib

Initial Comment:
Adds a new object to xmlrpclib that allows the user to boxcar XML-RPC requests, e.g.:

    server_proxy = ServerProxy(...)
    multicall = MultiCall(server_proxy)
    multicall.add(2, 3)
    multicall.get_address("Guido")
    add_result, address = multicall()

see http://www.xmlrpc.com/discuss/msgReader$1208

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531629&group_id=5470

From noreply@sourceforge.net Mon Mar 18 23:08:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 15:08:15 -0800
Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc
Message-ID:

Patches item #530556, was opened at 2002-03-16 00:01
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470
Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Neil Schemenauer (nascheme)
Summary: Enable pymalloc

Initial Comment:
The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but
free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. 
About names, we've been calling this package "pymalloc" for years, and the general form of an external name throughout Python is

    ["_"] "Py" Package "_" Function

_PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction of giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep making it clear, from the name alone, to which "family" a given variant of "new" (etc.) belongs.

----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752

I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom in how PyMalloc_New is implemented. Tim?

Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627

The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free?

Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752

Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc.

There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627

-1. --with-pymalloc should remain an option; there is still the heuristic for releasing memory that may make people uncomfortable. Also, on systems with a super-efficient malloc, you may not want to use pymalloc.

I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Mon Mar 18 23:23:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 15:23:37 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? 
:-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. 
If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom in how PyMalloc_New is implemented. Tim?

Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627

The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free?

Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752

Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc.

There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627

-1.
--with-pymalloc should remain an option; there is still the heuristic for releasing memory that may make people uncomfortable. Also, on systems with a super-efficient malloc, you may not want to use pymalloc.

I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470

From noreply@sourceforge.net Tue Mar 19 09:24:30 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 19 Mar 2002 01:24:30 -0800
Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response
Message-ID:

Patches item #517256, was opened at 2002-02-14 00:48
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470
Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5
Submitted By: James Rucker (jamesrucker)
Assigned to: Fredrik Lundh (effbot)
Summary: poor performance in xmlrpc response

Initial Comment:
xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire.
parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance; it is against revision 1.15. Its use makes the return trip's performance more or less equivalent to that of the forward trip.

----------------------------------------------------------------------
>Comment By: Fredrik Lundh (effbot) Date: 2002-03-19 10:24 Message: Logged In: YES user_id=38376

What server did you use? In all my test setups, h._conn.sock is None at the time parse_response is called...

----------------------------------------------------------------------
Comment By: James Rucker (jamesrucker) Date: 2002-03-17 17:13 Message: Logged In: YES user_id=351540

The problem was discovered under FreeBSD 4.4.

----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 14:30 Message: Logged In: YES user_id=38376

James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:14 Message: Logged In: YES user_id=6380

My guess is that makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix.

----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376

Looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap.
(we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Tue Mar 19 10:57:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 02:57:37 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 15:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch removes the restriction that global and local dictionaries never invoke overloaded __getitem__ and __setitem__ when passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict, to make sure this object exists and to spare the derived class from having to handle this implementation-dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set: if explicitly passed a local dict, a new local dict is not generated. This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved).
If this cannot be done because of backward compatibility problems, then an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version. If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: currently, the plain test suite passes (with a modification of test_desctut, which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Cesar Douady (douady) Date: 2002-03-19 11:57 Message: Logged In: YES user_id=428521 Granted. Seems fair. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:59 Message: Logged In: YES user_id=21627 This is quite a complex change. If you want to see it integrated, I recommend that you find people who will try it out and report their experience here.
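The behavior the patch is after can be sketched with a small, invented example. TraceDict below is a hypothetical dict subclass made up for illustration; note that in the 2.2-era interpreter under discussion such overrides were bypassed, whereas modern CPython does route name binding in exec'd code through an overridden __setitem__ on a non-exact-dict locals mapping, which is essentially what this patch proposed:

```python
# Illustrative sketch only: TraceDict is a made-up dict subclass that
# records every name stored through it by exec'd code.

class TraceDict(dict):
    def __init__(self):
        super().__init__()
        self.stored = []          # names bound by the exec'd code, in order

    def __setitem__(self, key, value):
        self.stored.append(key)            # observe the binding...
        super().__setitem__(key, value)    # ...then store it normally

ns = TraceDict()
exec("x = 1\ny = x + 1", {}, ns)
print(ns.stored)   # → ['x', 'y']  (the override saw both bindings)
```

With the restriction in place, the assignments would have gone through the plain dict slot and `ns.stored` would stay empty; that is the observable difference the patch (and its test_subdict.py) is about.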
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 From noreply@sourceforge.net Tue Mar 19 14:35:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 06:35:21 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS.
As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code.
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python's configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only.
Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 15:13:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 07:13:46 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 15:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change.
One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 14:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 15:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done.
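Ondrej's full-version idea from the comment above (2.1.1 -> libpython2.1.so.2.1.1) amounts to a trivial naming rule; the helper below is invented purely to make the mapping concrete:

```python
# Sketch of the naming scheme suggested above: attach the full
# interpreter version as the SOVERSION. soname_for() is a hypothetical
# helper, not part of any patch.
def soname_for(version):
    major_minor = ".".join(version.split(".")[:2])   # "2.1.1" -> "2.1"
    return "libpython%s.so.%s" % (major_minor, version)

print(soname_for("2.1.1"))   # → libpython2.1.so.2.1.1
```

The installation phase would then still need the symlink Ondrej mentions, from the unversioned link-time name (libpython2.1.so) to the versioned file.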
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 14:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 11:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU).
I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process.
I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 15:53:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 07:53:46 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: Nobody/Anonymous (nobody) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Tue Mar 19 16:13:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 08:13:53 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) >Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. 
Changes made to take advantage of the new PEP 241 changes in the Distribution class. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Tue Mar 19 17:05:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 09:05:05 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 18:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions.
---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 16:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase.
IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body.
libtool is broken, and it is a good thing that Python's configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally?
The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 17:14:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 09:14:01 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. 
- enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 17:14 Message: Logged In: YES user_id=10327 This is exactly the problem -- if today's libpython23.so replaces last week's libpython23.so, then everything I built during the last week is going to break if the ABI changes. That's why I think that incorporating the version number from api.tex is a good idea -- call me an optimist, but I think that any change will be documented. ;-) This kind of problem is NOT pretty. I went through it a few years ago when the GNU libc transitioned to versioned linking. It managed to cause a LOT of almost-intractable incompatibilities during that time, and I don't care at all to repeat that experience with Python. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 17:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions.
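Matthias's api.tex-derived scheme, mentioned in the comment above, would fold the C API documentation revision into the library name so that CVS builds with different APIs get different sonames. The helper below is invented for illustration only, following his libpython2.3.so.0.154 example:

```python
# Hypothetical sketch of the api.tex-based naming Matthias proposes:
# combine the Python version with the minor part of the Doc/api/api.tex
# CVS revision (e.g. revision "1.154" -> suffix "0.154").
def cvs_soname(py_version, api_revision):
    api_minor = api_revision.split(".")[-1]   # "1.154" -> "154"
    return "libpython%s.so.0.%s" % (py_version, api_minor)

print(cvs_soname("2.3", "1.154"))   # → libpython2.3.so.0.154
```

Each documented API change would bump the revision and hence the soname, so binaries linked against an older CVS build keep resolving to the library they were built with.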
---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 15:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 14:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though:
- please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes)
- I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0.
- Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need a rpath on Linux.
- IMO, there could be a default case, assuming SysV-ish configurations.
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 15:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of binary/library path (this is solved by adding -R to LDSHARED depending on platform); SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. 
IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of Python as SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? 
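The SOVERSION debate above boils down to the usual install-time link layout for versioned shared libraries: the real file carries the full version, and one or two symlinks provide the soname and the linker name. A minimal sketch, with purely illustrative file names and version numbers (this is not what the patch itself does):

```shell
# Illustrative versioned-install layout, assuming the libpython2.1.so.2.1.1
# naming scheme mentioned above; all names here are examples only.
dir=$(mktemp -d)
cd "$dir"
touch libpython2.1.so.2.1.1                    # the real shared object
ln -s libpython2.1.so.2.1.1 libpython2.1.so.1  # soname link, used at run time
ln -s libpython2.1.so.1 libpython2.1.so        # linker name, used by -lpython2.1
ls -l libpython2.1.so*
```

The point of the indirection is exactly what smurf raises: the soname link can stay stable across compatible rebuilds, while an ABI break bumps it and leaves older binaries pointing at the old library.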
libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatibile with other makes (not only GNU). I'll try to learn libttool and and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who wrote created that code originally? 
The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 19:47:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 11:47:14 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 15:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. 
From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: James Rucker (jamesrucker) Date: 2002-03-19 11:47 Message: Logged In: YES user_id=351540 HTTPConnection.getresponse() will close the socket and set self.sock to null after instantiating response_class (by default, this is HTTPResponse; note that HTTPResponse does a makefile() and stores the result in self.fp) iff the newly created response class instance's 'will_close' attribute is true. My server is setting the Keep-alive header with a value of 1 (it is based on xmlrpcserver.py), which causes will_close to evaluate to false. In your case, I'm presuming that will_close is being evaluated as false and thus the socket (accessed via h._conn.sock) has been set to None. Note that when I removed the Keep-alive header, I witnessed the behaviour you're seeing. Thus, it seems that as it stands, the benefit of the change will only be realized if Keep-alive is set or HTTP/1.1 is used (and Keep-alive is either not specified or is set to non-zero). The following from httplib.py shows and explains how 'will_close' will be set:

    conn = self.msg.getheader('connection')
    if conn:
        conn = conn.lower()
        # a "Connection: close" will always close the connection. if we
        # don't see that and this is not HTTP/1.1, then the connection will
        # close unless we see a Keep-Alive header.
        self.will_close = conn.find('close') != -1 or \
                          ( self.version != 11 and \
                            not self.msg.getheader('keep-alive') )
    else:
        # for HTTP/1.1, the connection will always remain open
        # otherwise, it will remain open IFF we see a Keep-Alive header
        self.will_close = self.version != 11 and \
                          not self.msg.getheader('keep-alive')

---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-19 01:24 Message: Logged In: YES user_id=38376 What server did you use? In all my test setups, h._conn.sock is None at the time parse_response is called... ---------------------------------------------------------------------- Comment By: James Rucker (jamesrucker) Date: 2002-03-17 08:13 Message: Logged In: YES user_id=351540 The problem was discovered under FreeBSD 4.4. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 05:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 08:14 Message: Logged In: YES user_id=6380 My guess is that makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=38376 Looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. 
(we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 15:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Tue Mar 19 22:28:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 14:28:46 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:28:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:28:57 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 23:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend saving the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:35:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:35:37 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 16:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. 
Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:35 Message: Logged In: YES user_id=21627 Which of the three attached files is the right one (19633, 19634, or 19635)? Unless they are all needed, we should delete the extra copies. I recommend applying PEP 2 to this patch: A library PEP is needed (which could be quite short), documentation, perhaps test cases. Most importantly, there must be an identified maintainer of these modules. Are you willing to act as the maintainer? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:41:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:41:33 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library.
- enables building shared python with '--enable-shared-python' configuration option
- builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed
- tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). 
It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled.
- it rectifies a small problem on solaris2.8 that caused double inclusion of thread.o (this produces an error from 'ld' for the shared library).
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:41 Message: Logged In: YES user_id=21627 The API version is maintained in modsupport.h:API_VERSION. I'm personally not concerned about breakage of the API during the development of a new release. Absolutely no breakage should occur in maintenance releases. After all, a maintenance release will replace pythonxy.dll on Windows with no protection against API breakage; thus, it is a bug if the API changes in a maintenance release. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Wed Mar 20 14:49:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 06:49:08 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. 
Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Mark Alexander (mwa) Date: 2002-03-20 14:49 Message: Logged In: YES user_id=12810 Any of the three (they're all the same). SourceForge hiccuped during the upload, and I don't have permission to delete the duplicates. I don't exactly understand what you mean by applying PEP 2. I uploaded this per Marc Lemburg's request for the latest versions of patches 41522[6-8]. He's acting as the integrator in this case (see http://mail.python.org/pipermail/distutils-sig/2001-December/002659.html). I let him know about the duplicate uploads, so hopefully he'll correct it. If you can and want, feel free to delete the 2 of your choice. I agree they need to be documented. As soon as I can, I'll submit changes to the Distutils documentation. Finally, yes, I'll act as maintainer. I'm on the Distutils-sig, and as soon as some other poor soul who has to deal with Solaris or HP-UX tries them, I'm there to work out issues. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 15:02:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 07:02:12 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appears in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). 
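Tim's suggestion is easy to prototype: the repr() of any finite float consists only of digits, a sign, a period, and possibly an exponent marker, so scanning for 'n'/'N' catches 'inf', 'nan', 'INF', and Windows' '1.#INF'/'1.#IND' spellings alike. A sketch of the idea (the helper name dump_double is hypothetical, not xmlrpclib's actual marshalling code):

```python
def dump_double(value):
    # Hypothetical helper illustrating Tim's check: no "normal" float
    # repr contains the letter n, but every infinity/NaN spelling does.
    text = repr(value)
    if 'n' in text or 'N' in text:
        raise ValueError("cannot marshal %s in XML-RPC" % text)
    return "<double>%s</double>" % text

print(dump_double(1.5))
print(dump_double(-2.25))
```

The check costs one string scan and needs no platform-specific knowledge of how the C library spells non-finite values.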
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 15:35:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 07:35:35 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 16:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 16:35 Message: Logged In: YES user_id=21627 You volunteering as the maintainer is part of the prerequisites of accepting new modules, when following PEP 2, see http://python.sourceforge.net/peps/pep-0002.html It says: "developers ... will first form a group of maintainers. Then, this group shall produce a PEP called a library PEP." So existence of a PEP describing these library extensions would be a prerequisite for accepting them. If MAL wants to waive this requirement, it would be fine with me. However, such a PEP could also share text with the documentation, so it might not be wasted effort. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 16:03:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 08:03:02 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 23:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 17:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 16:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output.
Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 16:23:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 08:23:42 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given.
What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
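Tim's repr()-based check can be sketched in a few lines of Python. Here `dump_double` is a hypothetical stand-in for xmlrpclib's marshalling routine, not the code from the actual patch; it also computes repr() only once, per Martin's note about the duplicated computation:

```python
def dump_double(value):
    # Compute repr() once and reuse it.
    r = repr(value)
    # IEEE specials render with an 'n' or 'N' in them ('inf',
    # 'INF', 'NaN', '1.#IND'), while no finite float's repr does.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return "<double>%s</double>" % r
```

This rejects infinities and NaNs on any platform, regardless of how the local C library spells them.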
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:26:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:26:04 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 17:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 18:26 Message: Logged In: YES user_id=89016 Now that I have write access, can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch?
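The behavior this patch introduces can be exercised from Python roughly as below; the `ConfigDeprecationWarning` class is an invented example, not part of the patch itself:

```python
import warnings

class ConfigDeprecationWarning(UserWarning):
    """A Warning subclass that builds its own message via __str__."""
    def __init__(self, option):
        self.option = option

    def __str__(self):
        return "config option %r is deprecated" % self.option

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Passing an instance as the first argument: the category
    # argument is ignored, and str(instance) is the message text.
    warnings.warn(ConfigDeprecationWarning("spam"))
```

The recorded warning's category is the instance's class, and its message text comes from the custom `__str__` method.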
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:31:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:31:34 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right.
An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:53:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:53:43 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. 
Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:08:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:08:03 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't.
It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:21:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:21:39 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 11:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Walter Dörwald (doerwalter) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-20 13:21 Message: Logged In: YES user_id=6380 Looks OK. Give it a try.
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 12:26 Message: Logged In: YES user_id=89016 Now that I have write access, can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 10:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 08:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:42:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:42:49 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation.
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:56:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:56:04 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c.
Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:57:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:57:09 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. 
---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? 
This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:04:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:04:15 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number".
At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. 
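Tim's repr()-based guard from the comments above is small enough to show in full. A sketch, with an illustrative function name (`dump_double` is not the actual xmlrpclib marshalling routine):

```python
def dump_double(value):
    r = repr(value)
    # repr() of every infinity/NaN spelling ('inf', 'INF', '1.#INF',
    # 'NaN', '1.#IND', ...) contains an 'n' or 'N'; no finite float's
    # repr does, so this check is cheap and cross-platform.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return "<value><double>%s</double></value>" % r
```

Because repr() round-trips exactly, this keeps the original code's property that a marshalled finite double converts back to the identical float.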
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:32:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:32:28 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 11:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. 
I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:55:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:55:03 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Mark Alexander (mwa) Date: 2002-03-20 19:55 Message: Logged In: YES user_id=12810 OK, the PEP seems to me to mean most of this is done. These additions are not library modules, they are Distutils "commands". So the way i read it, the Distutils-SIG (where I've been hanging around for some time) are the Maintainers. The documentation will be 2 new chapters for the Distutils manual "Creating Solaris packages" and "Creating HP-UX packages" each looking a whole lot like "Creating RPM packages". Does that clarify anything, or am I still missing a clue? p.s. Thanks for cleaning up the extra uploads! ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-20 15:35 Message: Logged In: YES user_id=21627 You volunteering as the maintainer is part of the prerequisites of accepting new modules, when following PEP 2, see http://python.sourceforge.net/peps/pep-0002.html It says: "developers ... will first form a group of maintainers. Then, this group shall produce a PEP called a library PEP." So existence of a PEP describing these library extensions would be a prerequisite for accepting them. If MAL wants to waive this requirement, it would be fine with me. However, such a PEP could also share text with the documentation, so it might not be wasted effort. ---------------------------------------------------------------------- Comment By: Mark Alexander (mwa) Date: 2002-03-20 14:49 Message: Logged In: YES user_id=12810 Any of the three (they're all the same). SourceForge hiccuped during the upload, and I don't have permission to delete the duplicates. I don't exactly understand what you mean by applying PEP 2. I uploaded this per Marc Lemburg's request for the latest versions of patches 41522[6-8]. He's acting as the integrator in this case (see http://mail.python.org/pipermail/distutils-sig/2001-December/002659.html). I let him know about the duplicate uploads, so hopefully he'll correct it. If you can and want, feel free to delete the 2 of your choice. I agree they need to be documented. As soon as I can, I'll submit changes to the Distutils documentation. Finally, yes, I'll act as maintainer. I'm on the Distutils-sig and as soon as some other poor soul who has to deal with Solaris or HP-UX tries them, I'm there to work out issues. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 07:35 Message: Logged In: YES user_id=21627 Which of the three attached files is the right one (19633, 19634, or 19635)? Unless they are all needed, we should delete the extra copies. 
I recommend to apply PEP 2 to this patch: A library PEP is needed (which could be quite short), documentation, perhaps test cases. Most importantly, there must be an identified maintainer of these modules. Are you willing to act as the maintainer? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 20:13:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 12:13:48 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. 
But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 14:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. 
I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. 
An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 20:48:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 12:48:23 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, could you please just check in your infinity/NaN detection code (also part of my patch)? 
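A converter along the lines Brian describes can be sketched with the modern decimal module (which did not exist in 2002); `format_double_plain` is a hypothetical helper, not the code in Brian's patch. It expands any exponent into the plain decimal-point notation the quoted grammar demands, at the cost of the very long strings discussed above:

```python
from decimal import Decimal

def format_double_plain(value):
    r = repr(value)
    # Infinities and NaNs have no XML-RPC representation at all.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    # Decimal preserves the float's repr exactly; format code 'f'
    # never produces exponent notation.
    s = format(Decimal(r), 'f')
    if '.' not in s:
        s += '.0'  # the grammar requires a period
    return s
```

Note that a value like 1e20 round-trips exactly here (the expansion only moves the decimal point), but the output for 1e-250 is several hundred characters long, which is exactly the trade-off being debated.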
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 11:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. 
I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 21:07:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 13:07:29 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 16:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches. 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Wed Mar 20 21:50:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 13:50:58 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. 
Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 15:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 22:53:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 14:53:55 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) >Assigned to: Fredrik Lundh (effbot) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:53 Message: Logged In: YES user_id=31435 I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, so leaves important questions unanswered. OTOH, it seems to me that the *point* of this protocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? 
Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 15:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, you could please just checkin your infinity/NaN detection code (also part of my patch)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). 
----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 14:32

Message:
Logged In: YES
user_id=108973

I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required.

"decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq-strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format".

My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 14:04

Message:
Logged In: YES
user_id=31435

Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account .

I can't parse your question about the C library (like, I don't know what you mean by "decimal format").

----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 13:57

Message:
Logged In: YES
user_id=108973

Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say:

    f    Floating point decimal format

I wonder if it is the underlying C library refusing to write large float values in decimal format.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 13:08

Message:
Logged In: YES
user_id=31435

Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python.
It's clear as mud whether "the spec" *intended* to outlaw exponent notation.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 12:53

Message:
Logged In: YES
user_id=31435

"%f" can produce exponent notation too, which is also not allowed by this pseudo-spec.

    r = repr(some_double)
    if 'n' in r or 'N' in r:
        raise ValueError(...)

is robust, will work fine x-platform, and isn't insane .

----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 12:31

Message:
Logged In: YES
user_id=108973

Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 11:23

Message:
Logged In: YES
user_id=31435

The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 11:03

Message:
Logged In: YES
user_id=21627

You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says

    # There is no representation for infinity or negative
    # infinity or "not a number". At this time, only decimal
    # point notation is allowed, a plus or a minus, followed by
    # any number of numeric characters, followed by a period
    # and any number of numeric characters. Whitespace is not
    # allowed. The range of allowable values is
    # implementation-dependent, is not specified.

That would be best validated with a regular expression.
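The two checks discussed in this thread can be sketched as follows (names and messages here are invented for illustration, not taken from the actual patch):

```python
import re

def check_marshallable(x):
    """Tim's check: the repr of any 'normal' float contains no letter n,
    while 'inf', 'INF', 'NaN' and Windows' '1.#IND' all do."""
    r = repr(x)
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return r

# A literal reading of the quoted grammar: a mandatory sign, any number
# (possibly zero) of digits, a period, and more digits -- which is exactly
# Tim's complaint: it accepts "+." and rejects "1.0".
SPEC_DOUBLE = re.compile(r'^[+-][0-9]*\.[0-9]*$')

assert SPEC_DOUBLE.match('+.')         # absurd, but allowed by the grammar
assert not SPEC_DOUBLE.match('1.0')    # reasonable, but has no sign
```

A practical validator would of course make the sign optional and require at least one digit on each side of the period; the regex above follows the spec's wording to show why taking it at face value is untenable.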
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 10:02

Message:
Logged In: YES
user_id=31435

Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 02:28

Message:
Logged In: YES
user_id=21627

It seems repr of the float is computed twice in every case. I recommend saving the result of the first computation.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:09:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:09:36 -0800
Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting
Message-ID: 

Patches item #532638, was opened at 2002-03-20 13:42
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Better AttributeError formatting

Initial Comment:
A user in c.l.py was confused when

    import m
    m.a

reported

    AttributeError: 'module' object has no attribute 'a'

The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation.
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute, or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code.

Perhaps a similar technique can be provided for other error formatting operations in object.c.

Example for objects with and without __name__ attributes:

    >>> "".foo
    Traceback (most recent call last):
      File "", line 1, in ?
    AttributeError: str object has no attribute 'foo'
    >>> import string
    >>> string.foo
    Traceback (most recent call last):
      File "", line 1, in ?
    AttributeError: module object 'string' has no attribute 'foo'

Skip

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-20 18:09

Message:
Logged In: YES
user_id=31435

If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev.

It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-20 16:50

Message:
Logged In: YES
user_id=44345

hmmm...
How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode.

I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time to time as well. The extra information is useful at times.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 13:56

Message:
Logged In: YES
user_id=31435

I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!).
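Tim's "lazy" idea can be sketched in Python (a hypothetical class for illustration, not Skip's C patch): defer all formatting to __str__, so an AttributeError that is caught and ignored never pays for the nicer message.

```python
class LazyAttributeError(AttributeError):
    """Hypothetical sketch: formatting is deferred until someone actually
    prints the exception, so ignored errors cost nothing extra.  (Per Tim's
    caveat, the object's __name__ could change before __str__ runs.)"""

    def __init__(self, obj, attr):
        self.obj = obj
        self.attr = attr

    def __str__(self):
        # The formatting work happens here, only on demand.
        name = getattr(self.obj, '__name__', None)
        if name is not None:
            return "%s object %r has no attribute %r" % (
                type(self.obj).__name__, name, self.attr)
        return "%s object has no attribute %r" % (
            type(self.obj).__name__, self.attr)


import types
mod = types.ModuleType('string')  # stand-in for an imported module
# str(LazyAttributeError(mod, 'foo')) -> "module object 'string' has no attribute 'foo'"
# str(LazyAttributeError("", 'foo'))  -> "str object has no attribute 'foo'"
```

The message shapes mirror Skip's examples above; the cost argument is that constructing the exception stores only two references, while the string work runs only when the traceback is actually displayed.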
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:24:47 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:24:47 -0800
Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug
Message-ID: 

Patches item #532180, was opened at 2002-03-19 14:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

----------------------------------------------------------------------

>Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 15:24

Message:
Logged In: YES
user_id=108973

OK, this floating point stuff is over my head.

Is it OK that it loses accuracy? - No
Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes
Is it OK that it raises OverflowError for the normal double 1e-300? - No

Would exposing and using the C %f specifier, along with repr, make for identical roundtrips?

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:55:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:55:06 -0800
Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug
Message-ID: 

Patches item #532180, was opened at 2002-03-19 17:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-20 18:55

Message:
Logged In: YES
user_id=31435

Python's internal format buffers are too small to use C %f in its full generality, so you're suggesting something there that's much harder to get done than you suspect. Note that %f isn't a cureall anyway, as in either Python or C, e.g., '%f' % 1e-10 throws away all information, producing a string of zeroes. What you did is usually much better than that. Let's wait to hear what /F wants to do.
If he's inclined to take this part of the spec at face value, I can work with him to write a "conforming" float->string that's numerically sound. Else it's a lot of tedious work for no reason. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 18:24 Message: Logged In: YES user_id=108973 OK, this floating point stuff is over my head. Is it OK that it loses accuracy? - No Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes Is it OK that it raises OverflowError for the normal double 1e-300? - No Would exposing and using the C %f specifier, along with repr, make for identical roundtrips? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:53 Message: Logged In: YES user_id=31435 I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, so leaves important questions unanswered. OTOH, it seems to me that the *point* of this porotocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 15:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, you could please just checkin your infinity/NaN detection code (also part of my patch)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 14:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. 
It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Thu Mar 21 00:36:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 16:36:40 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 18:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. 
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ ttribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-21 00:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 23:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. 
It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 21:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 18:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Thu Mar 21 01:50:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 17:50:28 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when "import m; m.a" reported "AttributeError: 'module' object has no attribute 'a'". The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c.
Example for objects with and without __name__ attributes:

>>> "".foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: str object has no attribute 'foo'
>>> import string
>>> string.foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo'

Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 19:50 Message: Logged In: YES user_id=44345 In theory. Python's getattr capability is so dynamic though I suspect there's little hasattr() can do but call getattr() and react to the result. ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-20 18:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 15:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind?
I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470
From noreply@sourceforge.net Thu Mar 21 02:25:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 18:25:11 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when "import m; m.a" reported "AttributeError: 'module' object has no attribute 'a'". The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes:

>>> "".foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: str object has no attribute 'foo'
>>> import string
>>> string.foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo'

Skip ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 21:25 Message: Logged In: YES user_id=31435 hasattr() is defined in terms of whether PyObject_GetAttr() raises an exception, and thanks to __getattr__ hooks can't be computed any faster than calling PyObject_GetAttr(). Which is what the code does:

    v = PyObject_GetAttr(v, name);
    if (v == NULL) {
        PyErr_Clear();
        Py_INCREF(Py_False);
        return Py_False;
    }
    Py_DECREF(v);
    Py_INCREF(Py_True);
    return Py_True;

It's simply not going to get faster than that. I'm not saying you can't have a "better" message here (although since an object's __name__ field doesn't bear any necessary relationship to the variable name(s) through which the object is referenced, it's unclear that the message won't actually be worse in real non-trivial cases: the type name is an object invariant, but the name can be misleading). I am saying the tradeoff is real and needs to be addressed. That's part of "good design", Dale; doing what feels good in the last case you remember is arguably not. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 20:50 Message: Logged In: YES user_id=44345 In theory. Python's getattr capability is so dynamic though I suspect there's little hasattr() can do but call getattr() and react to the result. ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-20 19:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 18:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 16:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
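Tim's C snippet translates almost line for line into Python. A sketch of that equivalence (note: the 2.x C code shown above cleared *any* error with PyErr_Clear(); the sketch below narrows that to AttributeError for clarity):

```python
def hasattr_sketch(obj, name):
    """Python rendering of the quoted C code: hasattr() simply calls
    getattr() and maps failure to False -- there is no cheaper test,
    because __getattr__ hooks mean the answer can only be discovered
    by attempting the lookup."""
    try:
        getattr(obj, name)
    except AttributeError:
        return False
    return True

print(hasattr_sketch([], "append"))    # True
print(hasattr_sketch([], "appendix"))  # False
```

This is why any extra work done while *constructing* the AttributeError message is pure overhead for every hasattr()-style probe that ignores the exception.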
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Thu Mar 21 10:25:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 02:25:20 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 11:25 Message: Logged In: YES user_id=21627 Version 2 of this patch implements revision 1.11 of the PEP (phase 1). The check of the complete source file for compliance with the declared encoding is implemented by decoding the input line-by-line; I believe that for all supported encodings, this is not different compared to decoding the entire source file at once. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-07 19:24 Message: Logged In: YES user_id=21627 Changing the decoding functions will not result in one additional function, but in two of them: you'll also get PyUnicode_DecodeRawUnicodeEscapeFromUnicode. That seems quite unmaintainable to me: any change now needs to propagate into four functions. OTOH, I don't think that the code that allows parsing variable-sized strings is overly complicated. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 19:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, ie. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...) This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 15:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 12:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...).
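The "first two lines" processing the PEP specifies can be sketched with a simplified version of the PEP's published pattern. This is an illustrative Python sketch (the real implementation runs in the C tokenizer on raw bytes, and the helper name here is invented):

```python
import re

# Simplified form of the coding-declaration pattern from PEP 263;
# the real rule also constrains where in the comment it may appear.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_encoding(line1, line2=""):
    """Return the encoding declared in the first or second line of a
    source file, or None.  Only comment lines are searched, and a
    non-comment first line ends the search."""
    for line in (line1, line2):
        if not line.startswith("#"):
            return None
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None

print(declared_encoding("# -*- coding: iso-8859-1 -*-"))            # iso-8859-1
print(declared_encoding("#!/usr/bin/env python", "# coding: utf-8"))  # utf-8
```

Checking the whole file against the declared encoding, as Martin describes, then amounts to decoding each subsequent line with the declared codec and treating a decode error as a syntax error.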
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 21 10:40:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 02:40:09 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 17:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Walter Dörwald (doerwalter) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method.
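A minimal sketch of the behavior the patch describes, assuming the patched warnings module (this is how the feature still works in later Python versions: the instance's class becomes the category, and the message text is str(instance); the warning class and names below are invented for illustration):

```python
import warnings

class ApiDeprecation(Warning):
    """Hypothetical custom warning carrying structured data, with the
    message text computed in __str__ as the patch allows."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def __str__(self):
        return "%s() is deprecated; use %s()" % (self.old, self.new)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No category argument: the Warning instance supplies everything.
    warnings.warn(ApiDeprecation("urlopen", "open_url"))

print(caught[0].category.__name__)  # ApiDeprecation
print(str(caught[0].message))       # urlopen() is deprecated; use open_url()
```

The design point is that a plain string message forces the caller to format everything up front, while a Warning instance can defer formatting to __str__ and keep the structured fields available to filters and handlers.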
---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-21 11:40 Message: Logged In: YES user_id=89016 Checked in as: Lib/warnings.py 1.10 Doc/lib/libwarnings.tex 1.8 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-20 19:21 Message: Logged In: YES user_id=6380 Looks OK. Give it a try. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 18:26 Message: Logged In: YES user_id=89016 Now that I have write access can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Thu Mar 21 11:08:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 03:08:03 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-27 14:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) >Assigned to: Fred L. Drake, Jr.
(fdrake) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andy Gimblett (gimbo) Date: 2002-03-21 11:08 Message: Logged In: YES user_id=262849 OK, have updated docs as suggested by aimacintyre, attached as urllib_proxies_docs.cdiff I also added an example for explicit proxy specification, since it illustrates how the proxies dictionary should be structured. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-10 05:31 Message: Logged In: YES user_id=250749 I think expanding the docs is the go here. In looking at the 2.2 docs (11.4 urllib), the bits that I think could usefully be improved include:

- the paragraph describing the proxy environment variables should note that on Windows, browser (at least for Internet Explorer - I don't know about Netscape) registry settings for proxies will be used when available;
- a short para noting that proxies can be overridden using URLopener/FancyURLopener class instances, documented further down the page, placed just before the note about not supporting authenticating proxies;
- adding a description of the "proxies" parameter to the URLopener class definition;
- adding an example of bypassing proxies to the examples subsection (11.4.2).
If/when you upload a doc patch, I suggest that you assign it to Fred Drake, who is the chief docs person. ---------------------------------------------------------------------- Comment By: Andy Gimblett (gimbo) Date: 2002-03-04 09:33 Message: Logged In: YES user_id=262849 Thanks for feedback re: diffs. Have now found out about context diffs and attached new version - hope this is better. Regarding the patch itself, this arose out of a newbie question on c.l.py and I was reminded that this was an issue I'd come across in my early days too. Personally I'd never picked up the hint that you should use FancyURLopener directly. If preferred, I could have a go at patching the docs to make that clearer? ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. 
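The shape of the "proxies" mapping the patch adds is the same scheme-keyed dictionary urllib already builds from the environment. A sketch (the proxy host is hypothetical, and the urlopen() calls are shown commented because they are the patched Python 2.x API and need network access):

```python
# Scheme -> proxy URL: the mapping urllib normally derives from
# http_proxy / ftp_proxy environment variables when no explicit
# 'proxies' argument is given.
explicit = {"http": "http://www.someproxy.com:3128/"}   # hypothetical proxy
no_proxy = {}   # empty dict: tell urlopen() to use no proxy at all

# With the patch applied, under Python 2.x:
#   import urllib
#   f = urllib.urlopen("http://www.python.org/", proxies=explicit)
#   g = urllib.urlopen("http://www.python.org/", proxies=no_proxy)
```

Passing the empty dictionary is the documented way to bypass environment proxy settings entirely, which is otherwise awkward without constructing a URLopener/FancyURLopener by hand.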
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Thu Mar 21 11:09:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 03:09:37 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 12:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch makes it possible to specify that C header files are part of source files for dependency checking. The 'sources' list in Extension instances can be simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, not only did changes to command.build_ext and command.build_clib have to be made; all the ccompiler (sub)classes had to be changed as well, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; it would IMO be better to move this function (with a new name?) into ccompiler.
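The dependency rule the patch wires into the compilers can be sketched independently of distutils. This is a hypothetical helper illustrating the idea (not the patch's code): an object file needs rebuilding if it is missing, or older than its source or any header the SourceFile declares.

```python
import os

def needs_rebuild(obj_file, source, headers=()):
    """True if obj_file is missing, or older than the source file or
    any of its declared headers -- the check that listing headers in
    SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]) enables."""
    if not os.path.exists(obj_file):
        return True
    obj_mtime = os.path.getmtime(obj_file)
    deps = (source,) + tuple(headers)
    return any(os.path.getmtime(dep) > obj_mtime for dep in deps)
```

For example, needs_rebuild("myfile.o", "myfile.c", ["inc1.h", "inc2.h"]) becomes true as soon as either header is touched, which is exactly what plain filename-based dependency checking misses.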
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Thu Mar 21 13:25:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 05:25:10 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ralph Corderoy (ralph) Assigned to: Nobody/Anonymous (nobody) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 13:29:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 05:29:31 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None >Priority: 3 Submitted By: Ralph Corderoy (ralph) >Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. 
Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 15:13:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 07:13:08 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 22:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Martin v. Löwis (loewis) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches: 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 16:13 Message: Logged In: YES user_id=21627 Committed as configure 1.289; configure.in 1.299; pyconfig.h.in 1.24. Python currently uses autoconf 2.13; 2.12 should also work.
autoconf 2.50 is a quite different beast - even though the resulting configure should work fine, it has many macros changed and thus results in huge differences to the CVS configure. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Thu Mar 21 15:48:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 07:48:55 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 16:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Martin v. Löwis (loewis) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches: 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 10:48 Message: Logged In: YES user_id=33168 That's odd:

    autoconf --version
    Autoconf version 2.13
    uname -a
    Linux epoch 2.4.7-10 #1 Thu Sep 6 16:46:36 EDT 2001 i686 unknown

Oh well. Thanks Martin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 10:13 Message: Logged In: YES user_id=21627 Committed as configure 1.289; configure.in 1.299; pyconfig.h.in 1.24. Python currently uses autoconf 2.13; 2.12 should also work.
autoconf 2.50 is a quite different beast - even though the resulting configure should work fine, it has many macros changed and thus results in huge differences to the CVS configure. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Thu Mar 21 16:40:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 08:40:49 -0800 Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8 Message-ID: Patches item #533165, was opened at 2002-03-21 11:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add expected test failures on solaris 8 Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Thu Mar 21 17:17:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 09:17:16 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. 
Message-ID: Patches item #533070, was opened at 2002-03-21 08:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 12:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 18:17:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 10:17:44 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. 
Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Ralph Corderoy (ralph) Date: 2002-03-21 18:17 Message: Logged In: YES user_id=911 Dear nnorwitz, I'm aware that gcc doesn't issue the warning. However, AIX 3.2.5's C compiler does. And the source consistently omits the ampersand elsewhere. So there seems little reason not to make the change and increase the number of `clean' builds out there. Cheers, Ralph. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 17:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 18:25:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 10:25:22 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 08:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. 
Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 13:25 Message: Logged In: YES user_id=33168 I'm sorry, you misunderstood. I agree that the patch should be applied. I was reporting that there were no problems created on other platforms. -- Neal ---------------------------------------------------------------------- Comment By: Ralph Corderoy (ralph) Date: 2002-03-21 13:17 Message: Logged In: YES user_id=911 Dear nnorwitz, I'm aware that gcc doesn't issue the warning. However, AIX 3.2.5's C compiler does. And the source consistently omits the ampersand elsewhere. So there seems little reason not to make the change and increase the number of `clean' builds out there. Cheers, Ralph. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 12:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Fri Mar 22 06:59:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 22:59:07 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-15 19:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 01:59 Message: Logged In: YES user_id=31435 Neil, I'm in favor of forcing this issue: check it in now, while we're still far from the first 2.3 alpha. People will gripe, but that will give them the motivation to help too. 
It's not going to go anywhere if we wait for all answers to all issues in advance (it's been in that limbo state for a couple years already ...). Note that I already made pymalloc the default on Windows. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 18:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 18:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. 
I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 12:11
Message: Logged In: YES user_id=35752
I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 05:12
Message: Logged In: YES user_id=21627
The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-15 22:50
Message: Logged In: YES user_id=35752
Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use.
People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 19:54
Message: Logged In: YES user_id=21627
-1. --with-pymalloc should remain an option; there is still the heuristic in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470

From noreply@sourceforge.net Fri Mar 22 08:04:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 00:04:39 -0800
Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip)
Message-ID: 

Patches item #533482, was opened at 2002-03-22 03:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470
Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5
Submitted By: Todd Warner (icode)
Assigned to: Nobody/Anonymous (nobody)
Summary: small seek tweak upon reads (gzip)

Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell().
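[Editorial note: the seek-to-the-end EOF check described above can be sketched as follows. This is a minimal stand-alone illustration, not gzip.py's actual code; the function name is made up.]

```python
import gzip
import io

# Build a small gzip stream in memory so the sketch is self-contained.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"hello world")
buf.seek(0)

def at_eof_by_seeking(fileobj):
    """The check described above: save the current position, seek to
    the end, compare tell(), then restore the position."""
    pos = fileobj.tell()
    fileobj.seek(0, 2)        # 2 == os.SEEK_END: jump to end of file
    end = fileobj.tell()
    fileobj.seek(pos)         # restore the saved position
    return pos == end

assert not at_eof_by_seeking(buf)   # fresh stream: not at EOF
buf.seek(0, 2)
assert at_eof_by_seeking(buf)       # after seeking to the end: at EOF
```

The patch's point is that this costs two extra seeks per read; tracking the position yourself is cheaper.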
It is more efficient to simply increment the position by 1. The efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470
From noreply@sourceforge.net Fri Mar 22 15:20:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 07:20:36 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Tim Peters (tim_one) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:10:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:10:41 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. 
Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 17:10 Message: Logged In: YES user_id=35752 A slightly modified version of this patch has been checked in. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 06:59 Message: Logged In: YES user_id=31435 Neil, I'm in favor of forcing this issue: check it in now, while we're still far from the first 2.3 alpha. People will gripe, but that will give them the motivation to help too. It's not going to go anywhere if we wait for all answers to all issues in advance (it's been in that limbo state for a couple years already ...). Note that I already made pymalloc the default on Windows. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. 
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32
Message: Logged In: YES user_id=31435
I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function. _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11
Message: Logged In: YES user_id=35752
I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12
Message: Logged In: YES user_id=21627
The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50
Message: Logged In: YES user_id=35752
Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54
Message: Logged In: YES user_id=21627
-1. --with-pymalloc should remain an option; there is still the heuristic in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:20:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:20:48 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:26:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:26:11 -0800
Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8
Message-ID: 

Patches item #533165, was opened at 2002-03-21 17:40
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470
Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: add expected test failures on solaris 8

Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:26
Message: Logged In: YES user_id=21627
-1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:40:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:40:29 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 10:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 12:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea . Accepted & back to you. 
----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:48:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:48:10 -0800
Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8
Message-ID: 

Patches item #533165, was opened at 2002-03-21 11:40
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470
Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: add expected test failures on solaris 8

Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-22 12:48
Message: Logged In: YES user_id=33168
I agree that the skipped-test handling is inadequate, but this is more of a general problem. I actually used linux2 as the template, but only applied the tests which really failed on the sun. linux2 also adds curses, socket_ssl, socketserver to the list, even though these are probably successful with the -u curses -u network flags. I could certainly pare down the list to not include nis, bsddb. Are you saying that you would like new code added to regrtest.py to handle TestUnavailable or something like that? So if NIS is not available it would raise this exception. This is probably a good idea, but would mean a bunch of tests being modified and mostly getting rid of the current known skipped list.
If you want to head down this route, I suggest closing this patch. We can start a discussion on python-dev or at least ask if anyone has a problem with the approach. Also, we should fix the problem you noted before that sunos5 is not sufficient. We need to be more fine-grained, i.e., 5.6, 5.7, 5.8.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 12:26
Message: Logged In: YES user_id=21627
-1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:55:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:55:21 -0800
Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions
Message-ID: 

Patches item #403679, was opened at 2001-02-08 08:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470
Category: Build Group: None Status: Closed Resolution: None Priority: 5
Submitted By: Donn Cave (donnc)
Assigned to: Nobody/Anonymous (nobody)
Summary: AIX and BeOS build quirk revisions

Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix.
Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:55
Message: Logged In: YES user_id=21627
Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain?

----------------------------------------------------------------------

Comment By: Donn Cave (donnc) Date: 2001-02-15 19:24
Message: As an alert reader brought to my attention, I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:57:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:57:02 -0800
Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions
Message-ID: 

Patches item #403679, was opened at 2001-02-08 08:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470
Category: Build Group: None Status: Closed Resolution: None Priority: 5
Submitted By: Donn Cave (donnc)
Assigned to: Nobody/Anonymous (nobody)
Summary: AIX and BeOS build quirk revisions

Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix.
Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:57 Message: Logged In: YES user_id=21627 ... got the number wrong; it is 533188. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:55 Message: Logged In: YES user_id=21627 Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain? ---------------------------------------------------------------------- Comment By: Donn Cave (donnc) Date: 2001-02-15 19:24 Message: An alert reader brought to my attention that I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all. 
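Donn's `tr -d '\015'` tip above amounts to stripping carriage returns from the pasted patch before applying it. The same cleanup can be sketched in Python; the helper name and the in-place rewrite are illustrative, not part of any patch discussed here:

```python
def strip_carriage_returns(path):
    """Rewrite a text file in place with CR (octal 015) bytes removed."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r", b""))
```

After this, `patch` should accept the file the same way it would after `tr -d '\015' < old > new`.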
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 From noreply@sourceforge.net Fri Mar 22 18:04:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 10:04:26 -0800 Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8 Message-ID: Patches item #533165, was opened at 2002-03-21 17:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 Category: Tests Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add expected test failures on solaris 8 Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 19:04 Message: Logged In: YES user_id=21627 This is indeed what I'd prefer to happen - but you probably need BDFL support before changing it. Closing it for now - if alternatives are rejected, we can reopen it. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-22 18:48 Message: Logged In: YES user_id=33168 I agree that the skipped test list is inadequate, but this is more of a general problem. I actually used linux2 as the template, but only applied the tests which really failed on the sun. linux2 also adds curses, socket_ssl, and socketserver to the list, even though these are probably successful with the -u curses -u network flags. 
I could certainly pare down the list to not include nis, bsddb. Are you saying that you would like new code added to regrtest.py to handle TestUnavailable or something like that? So if NIS is not available it would raise this exception. This is probably a good idea, but would mean a bunch of tests being modified and mostly getting rid of the current known skipped list. If you want to head down this route, I suggest closing this patch. We can start a discussion on python-dev or at least ask if anyone has a problem with the approach. Also, we should fix the problem you noted before that sunos5 is not sufficient. We need to be more fine-grained, i.e., 5.6, 5.7, 5.8. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:26 Message: Logged In: YES user_id=21627 -1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation. 
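The "more flexible scheme" Martin asks for above, where a skip only counts as expected when a required resource is genuinely unavailable or disabled, can be sketched like this. The class and function names here are hypothetical illustrations, not the actual regrtest.py API (the standard library's test support later grew a mechanism along these lines):

```python
class ResourceDenied(Exception):
    """A test was skipped because a required resource is unavailable."""

def requires(resource, enabled=("network", "curses")):
    """Raise ResourceDenied unless `resource` was enabled (e.g. via -u flags)."""
    if resource not in enabled:
        raise ResourceDenied("resource %r is not enabled" % (resource,))

# A test module would begin with e.g. requires("network"), and the test
# runner would treat ResourceDenied as an expected skip, not a failure.
```

This replaces a hand-maintained per-platform skip list with a check made at run time, which is exactly the distinction Martin draws between "resource missing" and "broken installation".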
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Fri Mar 22 18:42:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 10:42:18 -0800 Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions Message-ID: Patches item #403679, was opened at 2001-02-08 07:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 Category: Build Group: None Status: Closed Resolution: None Priority: 5 Submitted By: Donn Cave (donnc) Assigned to: Nobody/Anonymous (nobody) Summary: AIX and BeOS build quirk revisions Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix. Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way. ---------------------------------------------------------------------- >Comment By: Donn Cave (donnc) Date: 2002-03-22 18:42 Message: Logged In: YES user_id=42839 Response posted to 533188. cc_r is needed for reentrant library functions. IBM may charge extra for reentrant library functions, I have no idea. Usage: xlc [ option | inputfile ]... cc [ option | inputfile ]... c89 [ option | inputfile ]... xlc128 [ option | inputfile ]... cc128 [ option | inputfile ]... xlc_r [ option | inputfile ]... cc_r [ option | inputfile ]... xlc_r4 [ option | inputfile ]... cc_r4 [ option | inputfile ]... xlc_r7 [ option | inputfile ]... cc_r7 [ option | inputfile ]... ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-22 17:57 Message: Logged In: YES user_id=21627 ... got the number wrong; it is 533188. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 17:55 Message: Logged In: YES user_id=21627 Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain? ---------------------------------------------------------------------- Comment By: Donn Cave (donnc) Date: 2001-02-15 18:24 Message: An alert reader brought to my attention that I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 From noreply@sourceforge.net Fri Mar 22 20:03:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 12:03:56 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 18:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 21:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:18:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:18:41 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. 
Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 20:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? 
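Martin's point concerns the `_POSIX_SEMAPHORES` feature-test macro checked at C compile time, but the same POSIX option can also be probed from a running Python via `sysconf`. A small sketch; the availability of the `SC_SEMAPHORES` name varies by platform, so the function degrades to `False` rather than raising:

```python
import os

def posix_semaphores_advertised():
    """Return True if the platform advertises the POSIX semaphores option."""
    if not hasattr(os, "sysconf") or "SC_SEMAPHORES" not in os.sysconf_names:
        return False
    try:
        # POSIX systems report the option's revision year, or -1 if absent.
        return os.sysconf("SC_SEMAPHORES") > 0
    except (OSError, ValueError):
        return False
```

On a Cygwin build of the era discussed above, this kind of runtime probe and the compile-time macro could disagree, which is exactly the inconsistency the patch works around.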
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:19:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:19:08 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. 
The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 20:03 Message: Logged In: YES user_id=21627 -1. 
Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:28:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:28:20 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 12:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 16:28 Message: Logged In: YES user_id=31435 I'm afraid I agree with Martin here: the crusty old historical examples you dug up are exactly why we avoid doing similar stuff now. Nobody understands why that code is there anymore, and it will never go away. For example, I happen to know that KSR went bankrupt in 1994, and anything keying off __ksr__ has been worse than useless since then. 
---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. 
This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 15:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 23:20:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 15:20:33 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 23:20 Message: Logged In: YES user_id=35752 Checked in as obmalloc.c 2.5. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 17:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea. Accepted & back to you. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Fri Mar 22 23:20:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 15:20:58 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 23:20 Message: Logged In: YES user_id=35752 Checked in as obmalloc.c 2.5. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 17:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea. Accepted & back to you. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Sat Mar 23 03:41:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 19:41:05 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 09:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open Resolution: Out of Date >Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.) Caveats: - undocumented - slow - security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-09 09:19 Message: Logged In: YES user_id=6380 No, the IDLEfork project has stalled except for tweaking the configuration code (which would be good to merge into the Python IDLE tree when it's ready). I expect the patch failure is shallow so I won't bother fixing it. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-09 06:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now, at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 09:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sat Mar 23 03:47:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 19:47:16 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 6 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited method __getitem__ of the list type in a new subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). 
In the creation of the new_type of the list type, the fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both the sq_item and mp_subscript slots (the mp_subscript slot originally had no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-22 22:47 Message: Logged In: YES user_id=6380 Is slot-1.dif the promised new patch? ---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-11 21:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed up routines). 
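The slowdown Naofumi reports can be measured with the timeit module; a sketch (absolute numbers vary by machine, and later CPython releases changed this slot dispatch, so the ratio observed here need not match the roughly 3x reported above for the 2.2-era interpreter):

```python
import timeit

# Time plain item access on a list versus an otherwise-empty list subclass.
setup_list = "x = [1, 2, 3]"
setup_sub = "class NList(list):\n    pass\nx = NList([1, 2, 3])"

t_list = timeit.timeit("x[1]", setup=setup_list, number=100000)
t_sub = timeit.timeit("x[1]", setup=setup_sub, number=100000)
print("list: %.4fs  subclass: %.4fs" % (t_list, t_sub))
```

Any gap between the two timings is the extra dispatch through the subclass's slots that the patch aims to eliminate.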
---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-11 21:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 17:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Sat Mar 23 08:40:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 00:40:39 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 6 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited method __getitem__ of the list type in a new subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! 
The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of the new_type of the list type, the fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both the sq_item and mp_subscript slots (the mp_subscript slot originally had no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-23 08:40 Message: Logged In: YES user_id=452575 Yes. slot-1.dif is a new version. At least, I purged ifdef ... as you want. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-23 03:47 Message: Logged In: YES user_id=6380 Is slot-1.dif the promised new patch? ---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without the ifdefs and almost all of the speed-up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Sat Mar 23 22:41:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 14:41:56 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by me as well as by some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings.
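The behavior being standardized can be exercised through the time module as it exists today (a sketch; the directives shown are the standard ones a portable strptime() must support, and the result assumes the default C locale for %b):

```python
import time

# Parse a date string portably; %d, %b and %Y are standard directives.
t = time.strptime("23 Mar 2002", "%d %b %Y")
print(t.tm_year, t.tm_mon, t.tm_mday)  # 2002 3 23
```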
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 22:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:35:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:35:30 -0800 Subject: [Patches] [ python-Patches-479615 ] Fast-path for interned string compares Message-ID: Patches item #479615, was opened at 2001-11-08 15:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: M.-A. Lemburg (lemburg) Assigned to: M.-A. Lemburg (lemburg) Summary: Fast-path for interned string compares Initial Comment: This patch adds a fast-path for comparing equality of interned strings. The patch boosts performance for comparing identical string objects by some 20% on my machine while not causing any noticeable slow-down for other operations (according to tests done with pybench). More info and benchmarks later... ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:35 Message: Logged In: YES user_id=35752 Attached is an updated version of this patch.
I'm -0 on it since it doesn't seem to help much except for artificial benchmarks. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-11-08 15:26 Message: Logged In: YES user_id=38388 Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

PYBENCH 1.0
Benchmark: eqpython.bench (rounds=10, warp=20)

Tests:                        per run     per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:        125.55 ms    0.98 us    -1.68%
BuiltinMethodLookup:         180.10 ms    0.34 us    +1.75%
CompareFloats:               107.30 ms    0.24 us    +2.04%
CompareFloatsIntegers:       185.15 ms    0.41 us    -0.05%
CompareIntegers:             163.50 ms    0.18 us    -1.77%
CompareInternedStrings:       79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^
CompareLongs:                110.25 ms    0.24 us    +0.09%
CompareStrings:              143.40 ms    0.29 us    +2.14%
CompareUnicode:              118.00 ms    0.31 us    +1.68%
ConcatStrings:               189.55 ms    1.26 us    -1.61%
ConcatUnicode:               226.55 ms    1.51 us    +1.34%
CreateInstances:             202.35 ms    4.82 us    -1.87%
CreateStringsWithConcat:     221.00 ms    1.11 us    +0.45%
CreateUnicodeWithConcat:     240.00 ms    1.20 us    +1.27%
DictCreation:                213.25 ms    1.42 us    +0.47%
DictWithFloatKeys:           263.50 ms    0.44 us    +1.15%
DictWithIntegerKeys:         158.50 ms    0.26 us    -1.86%
DictWithStringKeys:          147.60 ms    0.25 us    +0.75%
ForLoops:                    144.90 ms   14.49 us    -4.64%
IfThenElse:                  174.15 ms    0.26 us    -0.00%
ListSlicing:                  88.80 ms   25.37 us    -1.11%
NestedForLoops:              136.95 ms    0.39 us    +3.01%
NormalClassAttribute:        177.80 ms    0.30 us    -2.68%
NormalInstanceAttribute:     166.85 ms    0.28 us    -0.54%
PythonFunctionCalls:         152.20 ms    0.92 us    +1.40%
PythonMethodCalls:           133.70 ms    1.78 us    +1.60%
Recursion:                   119.45 ms    9.56 us    +0.04%
SecondImport:                124.65 ms    4.99 us    -6.03%
SecondPackageImport:         130.70 ms    5.23 us    -5.73%
SecondSubmoduleImport:       161.65 ms    6.47 us    -5.88%
SimpleComplexArithmetic:     245.50 ms    1.12 us    +2.08%
SimpleDictManipulation:      108.50 ms    0.36 us    +0.05%
SimpleFloatArithmetic:       125.80 ms    0.23 us    +0.84%
SimpleIntFloatArithmetic:    128.50 ms    0.19 us    -1.46%
SimpleIntegerArithmetic:     128.45 ms    0.19 us    -0.77%
SimpleListManipulation:      159.15 ms    0.59 us    -5.32%
SimpleLongArithmetic:        189.55 ms    1.15 us    +2.65%
SmallLists:                  293.70 ms    1.15 us    -5.26%
SmallTuples:                 230.00 ms    0.96 us    +0.44%
SpecialClassAttribute:       175.70 ms    0.29 us    -2.79%
SpecialInstanceAttribute:    199.70 ms    0.33 us    -1.55%
StringMappings:              196.85 ms    1.56 us    -2.48%
StringPredicates:            133.00 ms    0.48 us    -8.28%
StringSlicing:               165.45 ms    0.95 us    -3.47%
TryExcept:                   193.60 ms    0.13 us    +0.57%
TryRaiseExcept:              175.40 ms   11.69 us    +0.69%
TupleSlicing:                156.85 ms    1.49 us    -0.00%
UnicodeMappings:             175.90 ms    9.77 us    +1.76%
UnicodePredicates:           141.35 ms    0.63 us    +0.78%
UnicodeProperties:           184.35 ms    0.92 us    -2.10%
UnicodeSlicing:              179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
Average round time:         9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much, and the ones that do indicate some additional benefit gained by the patch. All slow-downs are way below the noise limit of around 5-10% (depending on the platform/machine/compiler).
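The fast path relies on the fact that interned strings are unique objects, so equality of two interned strings can be decided by a pointer comparison alone. The invariant is visible from Python through sys.intern (a sketch; the join is only there to defeat compile-time constant folding):

```python
import sys

# Interning guarantees one shared object per string value, so equal
# interned strings are also *identical* objects (pointer-equal in C).
a = sys.intern("".join(["fast", "path"]))
b = sys.intern("fastpath")
print(a is b)  # True
```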
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:45:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:45:41 -0800 Subject: [Patches] [ python-Patches-490026 ] Namespace selection for rlcompleter Message-ID: Patches item #490026, was opened at 2001-12-06 21:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490026&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Fernando Pérez (fer_perez) Assigned to: Nobody/Anonymous (nobody) Summary: Namespace selection for rlcompleter Initial Comment: The standard rlcompleter is hardwired to work with __main__.__dict__. This is limiting, as one may have applications which execute in specially constructed 'sandboxed' namespaces. This patch extends rlcompleter with a constructor which provides an optional namespace specifier. This optional parameter defaults to __main__.__dict__, so the patch is 100% backwards compatible. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:45 Message: Logged In: YES user_id=35752 Looks good. Checked in with minor modifications as rlcompleter.py 1.10. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-11 18:44 Message: Logged In: YES user_id=395388 Updated with a one-line fix (a mistyped variable name). Deleted v2 of the patch with the typo. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-09 07:16 Message: Logged In: YES user_id=395388 I've uploaded a new version of the patch with those changes.
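The accepted interface can be used like this (a sketch against the rlcompleter module as checked in; 'spam_count' is just an illustrative name):

```python
import rlcompleter

# A completer bound to a sandboxed namespace instead of __main__.__dict__.
sandbox = {"spam_count": 42}
completer = rlcompleter.Completer(sandbox)
print(completer.complete("spam_c", 0))  # completes to 'spam_count'
```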
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-09 03:32 Message: Logged In: YES user_id=6380 Yes, that's about right. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-09 02:53 Message: Logged In: YES user_id=395388 I could rewrite it to use a namespace=None default in the constructor instead. If a namespace is given it will be used, otherwise at completion time a check will be made:

    if self.namespace is None:
        self.namespace = __main__.__dict__

This means an extra if in the completer, but would address your concern. Do you want me to do that? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-09 01:38 Message: Logged In: YES user_id=6380 Since this is obviously a new feature, I'll postpone this until after 2.2. One thing that worries me: you capture the identity of __main__.__dict__ early on in this patch. The original code uses whatever __main__.__dict__ is at the time it is needed. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-08 18:39 Message: Logged In: YES user_id=395388 Oops, sorry. You can tell I've never used the system before. I put the file in, but I just didn't see the stupid extra checkbox. Lack of orthogonality in an interface is always a recipe for problems. Anyway, it should be ok now. Cheers, Fernando. PS. And the obvious, *THANKS* a lot for putting such a fantastic tool out. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-08 17:15 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490026&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:51:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:51:27 -0800 Subject: [Patches] [ python-Patches-490374 ] make inspect.stack() work with PyShell Message-ID: Patches item #490374, was opened at 2001-12-07 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490374&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jason Orendorff (jorend) Assigned to: Nobody/Anonymous (nobody) Summary: make inspect.stack() work with PyShell Initial Comment: I'm on Python 2.2b2 on Windows. Changed the 'inspect' module to use 'linecache' for loading source code. This is more efficient. Also, 'inspect' now can see the source code of stuff entered in the IDLE PyShell. E.g. In IDLE, type: >>> import inspect >>> inspect.stack()[0] Without the patch, the output would be like this: (, None, 1, '?', None, None) With this patch: (, '', 1, '?', ['inspect.stack()[0]'], 0) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:51 Message: Logged In: YES user_id=35752 Checked in as inspect.py 1.29. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-20 14:51 Message: Logged In: YES user_id=18139 >poke< ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2001-12-07 20:07 Message: Logged In: YES user_id=18139 I'm afraid it's definitely a feature. 
(sigh) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-07 20:04 Message: Logged In: YES user_id=6380 Assigned to Tim for review, since he knows inspect.py inside-out. :-) It's probably too late for 2.2, unless you can prove this is a bugfix and not a feature. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490374&group_id=5470 From noreply@sourceforge.net Sun Mar 24 00:02:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 16:02:58 -0800 Subject: [Patches] [ python-Patches-491936 ] Opt for tok_nextc Message-ID: Patches item #491936, was opened at 2001-12-12 08:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: David Jacobs (dbj) Assigned to: Nobody/Anonymous (nobody) Summary: Opt for tok_nextc Initial Comment: tokenizer.c - revision 2.53 I tried to pick a routine that looked like it was heavily used, and optimizations that do not increase the maintenance burden (I won't feel bad if you reject it though, I'll keep on trying as long as you don't consider it a burden :-). I changed one strcpy to a memcpy because the length had already been computed. I also changed the pattern

    a = strchr(b, '\0');

to

    a = b + strlen(b);

which is an idiom I've seen in many other places in the code, so I don't think it makes the code harder to understand, and strlen is significantly more efficient than strchr. Aloha, David Jacobs (your pico optimizer :-) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:02 Message: Logged In: YES user_id=35752 It _seems_ to give about a 2% speedup when running compileall.py on Lib.
That's in the noise. I'm rejecting this patch. It's just not worth it. David, don't let this discourage you. Optimizing Python is hard since all the low-hanging fruit has been picked by other people. I think replacing strncpy with strlcpy might yield better results. Look at bug 487703. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:00 Message: Logged In: YES user_id=21627 Can you report some data about the resulting speedup? I seriously doubt that this is a significant change; unless data is forthcoming proving me wrong, I recommend rejecting this patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 From noreply@sourceforge.net Sun Mar 24 00:25:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 16:25:57 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 20:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules Group: None Status: Open Resolution: Later Priority: 5 Submitted By: Eric Huss (ehuss) Assigned to: Jeremy Hylton (jhylton) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module.
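The constant did end up in the module, and the usual idiom compares a limit against it (a sketch; the resource module is Unix-only, and RLIMIT_NOFILE is just one example resource):

```python
import resource

# RLIM_INFINITY is the sentinel for "no limit" in getrlimit/setrlimit values.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard == resource.RLIM_INFINITY:
    print("open-file hard limit: unlimited")
else:
    print("open-file hard limit:", hard)
```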
I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 20:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 05:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:12:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:12:13 -0800 Subject: [Patches] [ python-Patches-494066 ] Access to readline history elements Message-ID: Patches item #494066, was opened at 2001-12-17 04:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494066&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Chuck Blake (cblake) Assigned to: Nobody/Anonymous (nobody) Summary: Access to readline history elements Initial Comment: The current readlinemodule.c has a relatively minimal wrapper around the functionality of the GNU readline and history libraries. That may be fine, and since some try to use libeditline instead, it may be for the best. However, the current module does not enable any access from within Python to the libreadline-maintained list of input lines.
The ideal thing would be to actually export that dynamically maintained C list as a Python object. In lieu of that more complex change, my patch simply adds very simple history_get() and history_len() methods. This is the least one needs to access the list. I'm pretty sure the library functions go waaaay back, probably to the merger of the history and readline libraries. This patch also adds one final little ingredient: a call to rl_redisplay() in the wrapper for rl_insert_text(). Without this the user cannot see the inserted text until they type another character, which seems pretty undesirable. Together these two updates allow the regular Unix readline-enabled shell to perform "auto indentation", i.e., inserting into the edit buffer the leading whitespace from the preceding non-result-producing line. Since the line can be edited, one can just backspace a couple of times to reverse the autoindent. This makes the basic readline-enabled read-eval-print loop substantially more pleasant. I can provide an example PYTHONSTARTUP file that shows how to use it. Only a tiny 8-line or so pre_input_hook is needed, plus a slightly smart sys.ps1 or sys.ps2 object that communicates via a variable to our hook function whether or not the parser is expecting more input. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:12 Message: Logged In: YES user_id=35752 Checked in as readline 2.45. I renamed the functions to get_history_item and get_current_history_length. The last one's a bit unwieldy but hopefully clear. get_history_length really should have been called get_max_history_length. Too late for that unfortunately. I also added a redisplay function instead of adding the redisplay call to insert_text. ---------------------------------------------------------------------- Comment By: Chuck Blake (cblake) Date: 2001-12-17 20:41 Message: Logged In: YES user_id=403855 Sounds quite reasonable.
Having a nice readline completer and history matching interface is pretty cool when you're using the shell over a network where remote X windows would be painful. It's been a very useful interface for a while, and likely will be for the foreseeable future. When I get a chance I'll work on seeing what parts of readline have been around for a very long time (e.g. since readline 2.0 or so) and try to wrap the basically available features more intelligently with Python objects, e.g. a tuple or list for command input history. Hopefully not too much will need to be conditionalized on readline versions. A lot of added functionality could be written trivially in Python if there is access to the library structures and exporting of hook/event type functions. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 20:13 Message: Logged In: YES user_id=6380 OK, for the sake of stability, let's not do any of this in 2.2 then. Sounds like there are plenty of things we *could* do. I'm not against expanding the readline module -- but I don't have a use for it in mind myself. For fancy editing I much prefer IDLE's command line editor, since it lets you edit an entire multi-line command as a single unit, rather than on a per-line basis as readline does... ---------------------------------------------------------------------- Comment By: Chuck Blake (cblake) Date: 2001-12-17 20:00 Message: Logged In: YES user_id=403855 I have something like this in my ~/.py/rc.py (STARTUP file). The just_did_a_result var is also maintained by sys.ps1.

    def auto_indent():
        global just_did_a_result
        if just_did_a_result:
            just_did_a_result = 0
            return
        last = readline.history_get(readline.history_len())
        spc = len(last) - len(last.lstrip())
        if spc > 0:
            readline.insert_text(last[ : spc])

    readline.set_pre_input_hook(auto_indent)

I don't know if you have a system where set_pre_input_hook is available.
Unless you have access to the history, or at least the very last input line, from within Python, it doesn't seem very useful. That is because there is no way for your input_hook to know when/what it should stuff text into your command buffer. The redisplay() is innocuous when it happens to be unnecessary, so it shouldn't be very objectionable. It's an interactive prompt, so hyper-optimization isn't very important or noticeable. Even on a slow terminal it is only a few characters in one command prompt being re-drawn. If it is really an issue, though, then an alternative to adding my redisplay() fix would be to export another function from readline to Python, namely rl_redisplay(). Anyone's Python code could then just call it as necessary. Longer term, it seems like an awful lot more libhistory and libreadline functionality could profitably be included in the readline module. That's surely a 2.3 or later change, but the exporting of rl_redisplay() might be a closer step in that direction. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 19:40 Message: Logged In: YES user_id=6380 Hm, I was going to see if the insert_text fix was a simple enough fix to apply to 2.2, but I don't have an example of where this is needed. If I call it from the startup hook the text I insert is already being displayed.
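The names that were finally checked in can be exercised like this (a sketch; the readline module is only available on Unix-like builds, and get_history_item uses 1-based indexing):

```python
import readline

# Append an entry, then read it back through the accepted API.
readline.add_history("print('hello')")
n = readline.get_current_history_length()
last = readline.get_history_item(n)  # 1-based: item n is the newest
print(last)  # print('hello')
```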
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:27:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:27:32 -0800 Subject: [Patches] [ python-Patches-494871 ] test exceptions in various types/methods Message-ID: Patches item #494871, was opened at 2001-12-19 02:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494871&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: test exceptions in various types/methods Initial Comment: Add a bunch of tests for various methods, including numeric stuff like:

    float('')
    float('5\0')
    5.0 / 0.0
    5.0 // 0.0
    5.0 % 0.0
    5 << -5

sequence stuff like:

    ()[0]
    x += ()
    [].pop()
    [].extend(None)
    {}.values()
    {}.items()

not sure if buffer stuff should go here. if so, need to update X.X.X to be a real number, not sure if there is any correlation of the numbers or should the next available be used (6.7) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:27 Message: Logged In: YES user_id=35752 Checked in as test_types.py 1.26. I left out the section number for "Buffers". Having section numbers in the testing output seems insane to me. What if a section is added to the documentation? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2001-12-28 23:27 Message: Logged In: YES user_id=33168 I didn't see buffers mentioned in section 2.2 at all. The buffer() function is mentioned in 2.1. Perhaps the buffer tests should be moved into a test of their own? There appear to be very few uses of buffer throughout the tests.
Also, I saw in test_StringIO.py that jython doesn't have buffers, so the whole test should be skipped/pass for jython it seems (see lines 79-80). Other than the buffer change in the patch, the other tests should be in the appropriate location. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-28 22:19 Message: Logged In: YES user_id=21627 The numbers are the section numbers of the documentation, of what is now section 2.2 (dunno in what release and document this was section 6). I also don't know how useful it is to keep the numbering, however, if you easily can, please re-organize your tests to fit into the most appropriate sections. Optionally, you a) may want to check that the things you are testing are really mentioned in the section, and b) may want to update the tests to the current section numbers. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494871&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:39:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:39:35 -0800 Subject: [Patches] [ python-Patches-497097 ] location of mbox Message-ID: Patches item #497097, was opened at 2001-12-27 18:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497097&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: location of mbox Initial Comment: Most mail spools now are under /var, so this seems to be a better default.
--- python2.1-2.1.1.orig/Lib/mailbox.py
+++ python2.1-2.1.1/Lib/mailbox.py
@@ -267,7 +267,7 @@
     if mbox[:1] == '+':
         mbox = os.environ['HOME'] + '/Mail/' + mbox[1:]
     elif not '/' in mbox:
-        mbox = '/usr/mail/' + mbox
+        mbox = '/var/mail/' + mbox
     if os.path.isdir(mbox):
         if os.path.isdir(os.path.join(mbox, 'cur')):
             mb = Maildir(mbox)

---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:39 Message: Logged In: YES user_id=35752 I don't know why you care since that code is inside the _test() function. Fixed in mailbox.py 1.35 anyhow. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497097&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:42:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:42:00 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Barry Warsaw (bwarsaw) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:

- It's something from another layer (DNS/IP), not from SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to disclose: the internal structure of the network.

No major mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
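Present-day smtplib handles this concern with a local_hostname parameter: the caller supplies the EHLO/HELO name explicitly and no DNS lookup of the local host takes place (a sketch; no network connection is made here, and 'client.internal' is an illustrative name):

```python
import smtplib

# With no host argument the constructor does not connect; it just stores
# local_hostname for use in later EHLO/HELO greetings.
client = smtplib.SMTP(local_hostname="client.internal")
print(client.local_hostname)  # client.internal
```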
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:54:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:54:36 -0800 Subject: [Patches] [ python-Patches-501713 ] compileall.py -d errors Message-ID: Patches item #501713, was opened at 2002-01-10 10:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Bastian Kleineidam (calvin) >Assigned to: Guido van Rossum (gvanrossum) Summary: compileall.py -d errors Initial Comment: the option -d is not handled properly, the compileall.py script generates files in the wrong directory. Patch is for Python 2.1.1. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:54 Message: Logged In: YES user_id=35752 Attached is an updated version of the patch that cleanly applies to the current CVS tree. I can't figure out what the -d option is supposed to do however. The documentation says "-d destdir: purported directory name for error messages if no directory arguments, -l sys.path is assumed". What does that mean? Assigning to Guido since it looks like he added the -d option. 
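For what it's worth, compileall's -d flag corresponds to py_compile's dfile parameter, the "purported" file name recorded for error messages and tracebacks; it does not change where the bytecode is written. A minimal demonstration in today's Python (the path names are made up):

```python
import os
import py_compile
import tempfile

# Write a throwaway module, then compile it while recording a
# different, "purported" file name -- the value compileall -d supplies.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'spam.py')
    with open(src, 'w') as f:
        f.write('X = 42\n')
    cfile = py_compile.compile(src, dfile='/shared/lib/spam.py')
    # The bytecode still lands next to the source; only the file name
    # embedded in the compiled code changed.
    print(os.path.exists(cfile))
```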
---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-17 16:49 Message: Logged In: YES user_id=9205 I updated the patch to correct the case where dfile is None. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:57:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:57:18 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch, I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them.
I mean: which specific program can I run with and without my patch for a useful comparison? (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with a variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem, but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards, Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay, I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

    class A:
        def __init__(self):
            self.a = 0
    a = A()
    for i in xrange(2**20):
        a.a = i
    print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations.
The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:01:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:01:10 -0800 Subject: [Patches] [ python-Patches-504889 ] make setup.py less chatty by default Message-ID: Patches item #504889, was opened at 2002-01-17 15:02 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504889&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jeremy Hylton (jhylton) Assigned to: Nobody/Anonymous (nobody) Summary: make setup.py less chatty by default Initial Comment: I don't like the amount of output that setup.py produces by default, and I don't like the way that -q and -v affect the amount of output. In general, I want setup.py to tell me what it is doing and not what it is skipping. It's fine to say nothing with -q, but it shouldn't say more without -v. The attached patch is a bit of a kludge, but I'm not familiar enough with distutils to do any better. One problem is that -v/--verbose was previously handled as a flag, either on or off. (There is a curiously large amount of code that compares this boolean to see if it's greater than some number!) I changed the options processor to treat self.verbose as a count of -v options. So -vv is more verbose than -v. Then I changed the specific prints and announcements that I've seen with setup.py that I didn't want to see. The messages I don't want to see (unless verbose is high) are about skipping builds of Extensions and not copying files that are already up-to-date. With this patch in place, setup.py tells me only the extensions it actually builds and the files it actually copies.
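The "-v as a count" idea translates directly to standard option machinery; in today's Python it is one line with argparse. This is a sketch of the behaviour the patch wires into distutils' own option processor, not distutils code, and the message levels are an assumption for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# Each -v bumps the level, so -vv is more verbose than -v.
parser.add_argument('-v', '--verbose', action='count', default=0)
parser.add_argument('-q', '--quiet', action='store_true')

def announce(opts, msg, level=1):
    """Print msg only at sufficient verbosity.  'skipping ...' and
    'not copying ...' notices would be issued at level=2, so they
    appear only under -vv."""
    if not opts.quiet and opts.verbose >= level:
        print(msg)

opts = parser.parse_args(['-vv'])
announce(opts, 'building spam extension', level=1)
announce(opts, 'skipping up-to-date eggs extension', level=2)
```

With plain -v, only the "building" line would appear; with -q, nothing.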
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 02:01 Message: Logged In: YES user_id=35752 I would prefer it if setup.py would only print what it's compiling and not what it's skipping. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-18 14:53 Message: Logged In: YES user_id=31392 Good suggestion. I hadn't planned to change anything, but wanted to capture the feature request and share the code. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-01-18 09:05 Message: Logged In: YES user_id=38388 Jeremy, if that's what you want, you should at least post to the distutils list before going ahead and changing things. E.g. I can't see why "skip" notices are any less important than "building..." notices: they tell you that distutils has found some components up-to-date, and that may sometimes not be what you'd really expect. We should first discuss what distutils developers want as the default, and then go ahead and fix up distutils to meet those demands. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:25 Message: Logged In: YES user_id=31392 MAL, I really want to change distutils, not Python's setup.py. I use distutils for all sorts of projects and the default chattiness is always a nuisance. When I'm doing development, I invariably have to wade through hundreds of lines of useless output to find the one or two lines that confirm a change was made. You could still get the skip notices for your stuff, you'd just have to run in extra verbose mode.
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:17 Message: Logged In: YES user_id=31392 If I had to guess, I'd say cleaning up and rationalizing the use of self.verbose and print vs self.announce() vs the other methods that print things would teach you a lot about the internals. Hey, and reformat the code while you're at it. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-01-17 18:17 Message: Logged In: YES user_id=38388 Jeremy, the patch touches the distutils code, but what you really want is to change the behaviour in one single use-case (the setup.py which Python uses). The "right" way to fix this would be to subclass the various distutils classes to implement the change. If this becomes too complicated, then distutils ought to be tweaked to make this easier in a way that doesn't break existing code (e.g. I don't want to miss the skip notices for my stuff). ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 17:25 Message: Logged In: YES user_id=6656 You're not wrong :| The "assert 0" is on the install path though. Right. I'm currently fighting emacs to let me print source duplex, but I want to understand distutils' innards at some point, might as well be now. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 16:50 Message: Logged In: YES user_id=31392 The distutils package is a maze of twisty little passages that all look the same. I added an assert 0 to make sure that the execution path that generated the output wasn't the one with the assert 0. (It wasn't.) Didn't intend for the patch to make it in. But I'd still be surprised if this patch is the right thing. More likely it demonstrates good behavior that could be implemented more cleanly.
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 16:45 Message: Logged In: YES user_id=6656 Hokay, next question: why the "assert 0" in cmd.py? Are you sure you've finished? ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 16:32 Message: Logged In: YES user_id=31392 Er, context diff. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 15:49 Message: Logged In: YES user_id=6656 Um, context diff? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504889&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:04:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:04:54 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 21:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 3 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotshot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled...
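The transformation itself is easy to state: of any run of consecutive SET_LINENO instructions, only the last can ever be observed, so the earlier ones can be dropped. A toy model over (opcode, argument) pairs, with an illustrative opcode number (the real change lives in the compiler's C code-emission path and must also keep the co_lnotab table consistent):

```python
SET_LINENO = 127  # illustrative opcode number, not taken from the patch

def collapse_set_lineno(ops):
    """Drop SET_LINENO instructions that are immediately overwritten
    by another SET_LINENO.  `ops` is a list of (opcode, argument)
    pairs; a toy model of the peephole step, not compile.c code."""
    out = []
    for op, arg in ops:
        if op == SET_LINENO and out and out[-1][0] == SET_LINENO:
            out[-1] = (op, arg)        # keep only the newest line number
        else:
            out.append((op, arg))
    return out

code = [(SET_LINENO, 1), (SET_LINENO, 2), (100, 0), (SET_LINENO, 3)]
print(collapse_set_lineno(code))   # the SET_LINENO for line 1 is gone
```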
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:05:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:05:27 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement.
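The described approach — __iter__() returning self, and next() delegating to __getitem__() with a running index — looks roughly like this on a toy class. A plain list wrapper stands in for fileinput.FileInput, and the method is spelled __next__ as in today's Python (it was next() in the Python of the patch).

```python
class SequentialLines:
    """Toy stand-in for fileinput.FileInput: like the real class of
    the era, it only supports in-order __getitem__ access."""
    def __init__(self, lines):
        self._lines = lines
        self._index = 0

    def __getitem__(self, i):
        if i != self._index:
            raise RuntimeError('accessing lines out of order')
        if i >= len(self._lines):
            raise IndexError('end of input')
        self._index += 1
        return self._lines[i]

    # The patch's two additions -- nothing else changes:
    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self[self._index]
        except IndexError:
            raise StopIteration

print(list(SequentialLines(['a\n', 'b\n'])))
```

Because the iterator protocol is bolted on top of the existing sequential access, code that still indexes the object in order keeps working unchanged.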
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:06:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:06:00 -0800 Subject: [Patches] [ python-Patches-522587 ] Fixes pydoc http/ftp URL matching Message-ID: Patches item #522587, was opened at 2002-02-25 18:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) >Assigned to: Neil Schemenauer (nascheme) Summary: Fixes pydoc http/ftp URL matching Initial Comment: The current URL matching pattern used by pydoc only excludes whitespace. My patch also excludes the following characters:

' & " - excludes the quotes in:
< & > - As stated in RFC-1738: """The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text""" We don't want to include the delimiters as part of the URL. And including an unescaped "<" in an attribute value is not legal markup.

Also, remove the word boundary requirement for http/ftp URIs, because otherwise the "/" would not be included in the following URL: "http://www.python.org/" Attached is the patch and some simple test code.
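The resulting pattern can be approximated like this (an approximation for illustration only; the real pydoc pattern also handles other schemes and HTML escaping):

```python
import re

# URLs run until whitespace or one of the excluded delimiters: the
# two quote characters and the <...> brackets RFC 1738 calls unsafe.
# Dropping the trailing \b keeps a final '/' inside the match.
url_re = re.compile(r"""(?:http|ftp)://[^\s'"<>]+""")

text = 'see <http://www.python.org/> or "ftp://ftp.example.org/pub"'
print(url_re.findall(text))
```

Both the trailing `/` and the `/pub` path survive, while the surrounding brackets and quotes are left out of the match.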
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:07:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:07:05 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Todd Warner (icode) >Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing the two tell() values. It is more efficient to simply increment position + 1. The efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this.
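For comparison, the existing check looks like the first function below, and one way to avoid the seek-to-end round trip is a single probe of the next byte. This is a sketch of the kind of tweak described, not the literal diff, which isn't reproduced in this thread.

```python
import io

def at_eof_old(fileobj):
    """The existing gzip.py test: save position, seek to the end,
    compare tell() values, seek back."""
    pos = fileobj.tell()
    fileobj.seek(0, 2)                 # seek to end of file
    if pos == fileobj.tell():
        return True
    fileobj.seek(pos)                  # restore position
    return False

def at_eof_new(fileobj):
    """Single-probe variant: try to read one byte past the current
    position instead of measuring the whole file."""
    pos = fileobj.tell()
    if fileobj.read(1):
        fileobj.seek(pos)              # not at EOF; rewind the probe
        return False
    return True

f = io.BytesIO(b'data')
print(at_eof_old(f), at_eof_new(f))    # neither sees EOF yet
f.read()                               # consume everything
print(at_eof_old(f), at_eof_new(f))    # both now report EOF
```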
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Sun Mar 24 05:29:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 21:29:19 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotshot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-24 00:29 Message: Logged In: YES user_id=31435 Neal, do you have your editor set to insert spaces instead of tabs, and to consider "a tab" to be four spaces? Guido wrote this file using hard tabs considered as 8-space gimmicks, and the after-patch code is kinda gruesome due to the mixture of indentation styles. Second, why do you think a hard-coded 0xffff is something interesting for line numbers? Or are you just giving up when line numbers are >= 2**16? The code is mysterious here and needs a comment. It's probably not good to leave the code in a state where adjacent SET_LINENOs are collapsed if and only if the line numbers "aren't big" (then code using line numbers can't guess whether they are or aren't collapsed without duplicating the same lumpy logic). Third, c_lnotab is extremely delicate, historically subject to miserable rare bugs.
If you've read the long comment block explaining it near the top of this file, I'd appreciate an argument (in code comments more than here) for why just mucking with the last pair in a sequence of offset pairs can't break the subtle correctness property explained in the comment block. Finally, it's definitely worth tracking down why test_inspect fails: that test is difficult to understand, but the bottom line is that it's provoking an exception traceback and asserting that the computed line numbers correspond to the actual lines that are failing. The failing case provokes a three-frame traceback, and 2 of the 3 line numbers are wrong after the patch (the first is off by 1, and the third is off by 3; the frame in the middle gets the right line number). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled... ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470
From noreply@sourceforge.net Sun Mar 24 12:02:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:02:48 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 15:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules >Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Eric Huss (ehuss) >Assigned to: Neil Schemenauer (nascheme) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:02 Message: Logged In: YES user_id=6380 Comments: (1) RLIM_INFINITY is used unconditionally elsewhere in the module, so the #ifdef is unnecessary. (2) The extra #if/#endif around the closing curly is ugly. I'd avoid this by moving the corresponding opening curly outside the first block. (3) resource.RLIM_INFINITY is -1 on my system too. But does that matter? This is just a symbolic constant to be used to set limits to infinity, and if it happens to be -1, who cares? It's got 32 1-bits, which is what counts. So I'd accept it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 19:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module. I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS.
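Whatever integer the constant ends up being on a given platform, user code treats it as an opaque token, which is the point of comment (3); for example (Unix-only, and the -1 value is specific to platforms where rlim_t is all one-bits):

```python
import resource

# RLIM_INFINITY is a token, not a number for arithmetic: compare
# getrlimit() results against it, or pass it to setrlimit() to lift
# a limit.  On Linux it prints as -1; elsewhere it may be a large
# positive value.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
if hard == resource.RLIM_INFINITY:
    print('hard core-file limit: unlimited')
else:
    print('hard core-file limit:', hard)
```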
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 15:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 00:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 12:06:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:06:19 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Eduardo Pérez (eperez) >Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not a FQDN (as with many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to disclose: the internal structure of the network. No important mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names: this approach works, and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, I'm reassigning it to Neil and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 12:17:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:17:10 -0800 Subject: [Patches] [ python-Patches-501713 ] compileall.py -d errors Message-ID: Patches item #501713, was opened at 2002-01-10 05:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 >Category: Library (Lib) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Bastian Kleineidam (calvin) Assigned to: Guido van Rossum (gvanrossum) Summary: compileall.py -d errors Initial Comment: the option -d is not handled properly, the compileall.py script generates files in the wrong directory. Patch is for Python 2.1.1. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:17 Message: Logged In: YES user_id=6380 Good question. The patch is bogus, it turns out! Bastian didn't understand -d either. The patch changes the semantics of the -d option. What -d is *supposed* to do (and what it does without the patch) is to lie about the filename embedded in code objects. I think the use case is a setup Bill Janssen at Xerox PARC described: they mount a shared lib directory as e.g. /shared/local/lib/python2.2/, which is read-only; there's a different pathname for it that's only accessible on the server machine, e.g. /writable/local/lib/python2.2/. When compiling the modules, they write the .pyc and .pyo files in the /writable/ mounted filesystem, but they want the co_filename attribute of the code to start with /shared/. 
The -d option lets them do this by saying compileall -d /shared/local/lib/python2.2/ /writable/local/lib/python2.2/ Bastian's patch changes the -d option to make the -d argument the destination where the .pyc files are written, which would defeat the purpose. Bastian, if you want a way to change the destination directory (which would be a useful feature too), please submit a new patch. The -o option seems to make sense to specify the output directory. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:54 Message: Logged In: YES user_id=35752 Attached is an updated version of the patch that cleanly applies to the current CVS tree. I can't figure out what the -d option is supposed to do, however. The documentation says "-d destdir: purported directory name for error messages if no directory arguments, -l sys.path is assumed". What does that mean? Assigning to Guido since it looks like he added the -d option. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-17 11:49 Message: Logged In: YES user_id=9205 I updated the patch to correct the case where dfile is None. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 From noreply@sourceforge.net Sun Mar 24 13:52:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 05:52:25 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 22:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263
phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 24 15:12:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 07:12:25 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) 
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-24 15:12 Message: Logged In: YES user_id=52562 Okay, I just want to double-check these two points: 1. You did look at the actual resulting source code and not just the patch, right? Here's a side-by-side: http://zooko.com/temp.html 2. You realize that my promise that the actual speedup is < 5% is in a realistic application-level benchmark. For microbenchmarks, the speed-up varies but is generally much higher than 5%, as described in this patch tracker entry. Given these two facts, then please reject this patch and spend your time on the new cached attribute lookups architecture instead. ;-) Regards, Zooko ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch.
---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is.
Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement.
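Zooko's mini benchmark can be reproduced with the stdlib timeit module; the attribute names below are arbitrary stand-ins for the "shapes" he describes, and absolute numbers are machine-dependent (only the relative ordering is meaningful):

```python
import timeit

# Rough equivalent of the mini benchmark: time attribute assignment for
# differently-shaped names (plain, single-underscore, dunder-like).
class A:
    pass

a = A()
for name in ("plain", "_single_", "__dunder__"):
    secs = timeit.timeit(f"a.{name} = 1", globals={"a": a}, number=100_000)
    print(f"{name:12s} {secs:.4f}s")
```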
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 15:37:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 07:37:18 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
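The local_hostname argument Neil describes is easy to demonstrate: constructing an SMTP object without a host makes no network connection, and the name to announce in HELO/EHLO is simply stored. client.example.com below is a placeholder:

```python
import smtplib

# With local_hostname given, smtplib announces this name in HELO/EHLO
# instead of whatever gethostbyname()/getfqdn() guesses for the local
# machine.  No connection is attempted until connect()/sendmail().
client = smtplib.SMTP(local_hostname="client.example.com")
print(client.local_hostname)   # client.example.com
```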
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 18:25:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 10:25:14 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) 
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:25 Message: Logged In: YES user_id=35752 I've played with your patch for about 2 hours today. I benchmarked it, tried to clean it up using macros or inlined functions. I also tried a variation that exploited the fact that most names were interned strings. It's not worth it. Spend time on rattlesnake, psyco, or the namespace optimizations. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-24 15:12 Message: Logged In: YES user_id=52562 Okay, I just want to double-check these two points: 1. You did look at the actual resulting source code and not just the patch, right? Here's a side-by-side: http://zooko.com/temp.html 2. You realize that my promise that the actual speedup is < 5% is in a realistic application-level benchmark. For microbenchmarks, the speed-up varies but is generally much higher than 5%, as described in this patch tracker entry. Given these two facts, then please reject this patch and spend your time on the new cached attribute lookups architecture instead.
;-) Regards, Zooko ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking.
By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original.
I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 18:39:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 10:39:20 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Open Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail.
Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO.
Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 21:51:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 13:51:57 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix.
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 21:51 Message: Logged In: YES user_id=35752 Did you read what I wrote?

220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients

Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:05:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:05:02 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 06:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch makes it possible to specify that C header files are part of source files for dependency checking.
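The dependency rule such a check applies can be sketched in pure Python. SourceFile below is a hypothetical stand-in modeled on the interface the patch proposes, not a released distutils API:

```python
import os

class SourceFile:
    """Illustrative stand-in for the patch's SourceFile: a source file
    plus the headers it depends on."""
    def __init__(self, filename, headers=()):
        self.filename = filename
        self.headers = list(headers)

def needs_rebuild(source, objfile):
    # Rebuild if the object file is absent, or older than the source
    # or any of its declared headers.
    if not os.path.exists(objfile):
        return True
    obj_mtime = os.path.getmtime(objfile)
    deps = [source.filename] + source.headers
    return any(os.path.getmtime(dep) > obj_mtime for dep in deps)
```

Under this rule, touching a listed header marks the .c file for recompilation even though the .c file itself is unchanged, which is exactly the behavior plain filename-based dependency checking misses.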
The 'sources' list in Extension instances can be simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, changes had to be made not only to command.build_ext and command.build_clib but also to all the ccompiler (sub)classes, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; it would IMO be better to move this function (with a new name?) into ccompiler. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 17:05 Message: Logged In: YES user_id=6380 Why is this priority 7?????? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:28:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:28:12 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 20:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Eric Huss (ehuss) Assigned to: Neil Schemenauer (nascheme) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that.
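For context, the constant in question is used like this (a sketch against the Unix-only resource module; the choice of RLIMIT_NOFILE is arbitrary):

```python
import resource

# RLIM_INFINITY is a sentinel meaning "no limit".  On some platforms it
# compares equal to -1, which is harmless: only the bit pattern matters
# when the value is passed back to setrlimit().
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard == resource.RLIM_INFINITY:
    print("no hard limit on open files")
else:
    print("hard limit:", hard)
```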
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:28 Message: Logged In: YES user_id=35752 Checked in as resource.c 2.23. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:02 Message: Logged In: YES user_id=6380 Comments: (1) RLIM_INFINITY is used unconditionally elsewhere in the module, so the #ifdef is unnecessary. (2) The extra #if/#endif around the closing curly is ugly. I'd avoid this by moving the corresponding opening curly outside the first block. (3) resource.RLIM_INFINITY is -1 on my system too. But does that matter? This is just a symbolic constant to be used to set limits to infinity, and if it happens to be -1, who cares? It's got 32 1-bits, which is what counts. So I'd accept it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module. I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 20:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 05:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:35:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:35:26 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing the tell() results. It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
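The check being debated can be paraphrased as follows, using io.BytesIO as a stand-in for the underlying file object (this is a sketch of the pattern described in the initial comment, not the literal gzip.py code):

```python
import io

# The EOF test gzip.GzipFile._read performs before each read
# (paraphrased): remember the offset, seek to the end, compare the
# tell() results, then restore the position.
def at_eof(fileobj):
    pos = fileobj.tell()
    fileobj.seek(0, 2)        # whence=2: seek relative to end of file
    end = fileobj.tell()
    fileobj.seek(pos)         # restore the read position
    return pos == end

buf = io.BytesIO(b"some uncompressed bytes")
print(at_eof(buf))   # False: nothing consumed yet
buf.read()
print(at_eof(buf))   # True: offset now equals the file size
```

Note that this pattern only works on seekable objects, which is exactly why Todd's GzipStream idea (below) has to override _read for sockets and other non-seekable streams.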
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:01:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:01:15 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. 
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-24 15:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 14:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. 
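The interface the pure-Python module mirrors is the standard time.strptime() call: parse a string against a format, yielding a struct_time. A minimal usage sketch of that interface:

```python
import time

# Parse a date string against an explicit format; the result is a
# struct_time whose fields can be read by attribute.
parsed = time.strptime("25 Mar 2002", "%d %b %Y")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)   # -> 2002 3 25
```

The portability problem PEP 42 complains about is that the C-library strptime() behind this call varies across platforms; a pure-Python version pins the behavior down.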
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:13:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:13:29 -0800 Subject: [Patches] [ python-Patches-522587 ] Fixes pydoc http/ftp URL matching Message-ID: Patches item #522587, was opened at 2002-02-25 18:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Neil Schemenauer (nascheme) Summary: Fixes pydoc http/ftp URL matching Initial Comment: The current URL matching pattern used by pydoc only excludes whitespace. My patch also excludes the following characters: ' & " - excludes the quotes in: < & > - As stated in RFC-1738: """The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text""" We don't want to include the delimiters as part of the URL. And including unescaped "<" in an attribute value is not legal markup. Also, remove the word boundary requirement for http/ftp URIs because otherwise the "/" would not be included in the following URL: "http://www.python.org/" Attached is the patch and some simple test code. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 23:13 Message: Logged In: YES user_id=35752 Fixed in pydoc 1.60. I dropped the trailing \b. Instead of restricting the characters in the URL I changed the code to properly quote it. 
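An illustrative pattern in the spirit of Brian's description: match http/ftp URLs but stop at whitespace, quotes, and the <...> delimiters from RFC 1738. (This is a sketch only; the pattern actually committed in pydoc 1.60 differs, since Neil chose to quote the URL rather than restrict its characters.)

```python
import re

# Stop the URL at whitespace, single/double quotes, and the RFC 1738
# free-text delimiters "<" and ">".  No trailing \b, so a final "/" is
# kept -- the problem the original trailing word boundary caused.
url_pat = re.compile(r'\b((?:http|ftp)://[^\s\'"<>]+)')

text = 'See <http://www.python.org/> and "ftp://ftp.python.org/pub"'
print(url_pat.findall(text))
```

Here the trailing slash of the first URL and the closing quote/bracket delimiters are handled as the patch intends: delimiters excluded, path characters kept.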
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:15:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:15:22 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. 
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 23:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 23:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 22:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Mon Mar 25 01:21:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 17:21:36 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 03:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Todd Warner (icode) Date: 2002-03-24 20:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 17:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 01:30:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 17:30:50 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 03:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Todd Warner (icode) Date: 2002-03-24 20:30 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. 
it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-24 20:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 17:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 03:33:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 19:33:44 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 03:33 Message: Logged In: YES user_id=35752 Why would it be more efficient? Assuming the OS is not implemented by a silly person, a seek just updates an offset in the in-memory file descriptor structure. Regarding your GzipStream, it sounds like making it part of gzip.py would be okay. ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-25 01:30 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). 
The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-25 01:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 04:00:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 20:00:19 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51 Message: Logged In: YES user_id=35752 Did you read what I wrote? 
220 cranky ESMTP Postfix (Debian/GNU) HELO localhost.localdomain 250 cranky MAIL FROM: 250 Ok RCPT TO: DATA 450 : Helo command rejected: Host not found 554 Error: no valid recipients Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the patch proposed fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and set status to Accepted. 
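The local_hostname argument Neil describes ended up in the SMTP constructor; when supplied it is used verbatim in the HELO/EHLO greeting instead of the name derived via socket.getfqdn(). A sketch (the mail host below is illustrative only, and no connection is attempted here):

```python
import smtplib
import socket

# Without local_hostname, smtplib derives the greeting name like this:
print("default HELO name would be:", socket.getfqdn())

# With the fix, a caller on a NAT'd or tunneled host can supply an
# address literal instead, as RFC 2821 suggests:
# server = smtplib.SMTP("mail.example.com", local_hostname="[192.0.2.1]")

# The parameter is part of the constructor signature:
print("local_hostname" in smtplib.SMTP.__init__.__code__.co_varnames)
```

This keeps the RFC-friendly default while letting clients behind NAT or tunnels avoid sending a bogus or privacy-leaking name.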
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 04:35:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 20:35:32 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. 
I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have an __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 06:42:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 22:42:34 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 12:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-25 01:42 Message: Logged In: YES user_id=31435 I'm rejecting the patch based on Jason Tishler's comments in: http://mail.python.org/pipermail/python-dev/2002-March/021675.html Please work with Jason to find a better solution. If you and Jason can't find a better one, and Jason goes along with this patch, we can reopen it. In the meantime, you motivated me to get rid of the old __ksr__ cruft. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 16:28 Message: Logged In: YES user_id=31435 I'm afraid I agree with Martin here: the crusty old historical examples you dug up are exactly why we avoid doing similar stuff now. Nobody understands why that code is there anymore, and it will never go away. For example, I happen to know that KSR went bankrupt in 1994, and anything keying off __ksr__ has been worse than useless since then. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. 
In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-22 15:03 Message: Logged In: YES user_id=21627 -1. 
Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Workarounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Mon Mar 25 09:03:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 01:03:05 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 12:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch allows specifying that C header files are part of the source files for dependency checking. The 'sources' list in Extension instances can contain simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, not only did changes to command.build_ext and command.build_clib have to be made; all the ccompiler (sub)classes had to be changed as well, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; IMO it would be better to move this function (with a new name?) 
into ccompiler. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-03-25 10:03 Message: Logged In: YES user_id=11105 Fred requested it this way: http://mail.python.org/pipermail/distutils-sig/2002-March/002806.html ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 23:05 Message: Logged In: YES user_id=6380 Why is this priority 7?????? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:27:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:27:49 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure whether it broke them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library). ---------------------------------------------------------------------- >Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-25 13:27 Message: Logged In: YES user_id=88611 I have rebuilt the patch against CVS - --enable-shared instead of --enable-shared-python - sets rpath on Linux and Tru64 too - I didn't change the SOVERSION stuff. I think we should come to a conclusion about versioning first. BTW: am I correct that make install should create the symlink .sl -> .sl.1.0 when we use versioning? - this patch may break BeOS and DgUX. I think someone with access to these platforms should test it (he should use --enable-shared, as this patch changes the default behavior to --disable-shared for all platforms). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:41 Message: Logged In: YES user_id=21627 The API version is maintained in modsupport.h:API_VERSION. I'm personally not concerned about breakage of the API during the development of a new release. Absolutely no breakage should occur in maintenance releases. After all, a maintenance release will replace pythonxy.dll on Windows with no protection against API breakage; thus, it is a bug if the API changes in a maintenance release. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 18:14 Message: Logged In: YES user_id=10327 This is exactly the problem -- if today's libpython23.so replaces last week's libpython23.so, then everything I built during the last week is going to break if the ABI changes. That's why I think that incorporating the version number from api.tex is a good idea -- call me an optimist, but I think that any change will be documented. ;-) This kind of problem is NOT pretty. 
I went through it a few years ago when the GNU libc transitioned to versioned linking. It managed to cause a LOT of almost-intractable incompatibilities during that time, and I don't care at all to repeat that experience with Python. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 18:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 16:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need a rpath on Linux. 
- IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on platform), and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of Python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I have included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatibile with other makes (not only GNU). I'll try to learn libttool and and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who wrote created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:40:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:40:55 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None >Status: Closed >Resolution: Rejected Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotspot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-25 07:40 Message: Logged In: YES user_id=33168 I'm rejecting this patch because it would take a lot of work to get it to the point where it would be good enough for inclusion. Now to answer Tim's questions. Tabs vs. spaces: depends on the day. I use both emacs & vi; emacs does convert to spaces, but I must have screwed something up. 0xffff was only a hack to avoid dealing with line numbers > 2**16. I was going for bang for the buck. I agree it would be best to remove this limitation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-24 00:29 Message: Logged In: YES user_id=31435 Neal, do you have your editor set to insert spaces instead of tabs, and to consider "a tab" to be four spaces? 
Guido wrote this file using hard tabs considered as 8-space gimmicks, and the after-patch code is kinda gruesome due to the mixture of indentation styles. Second, why do you think a hard-coded 0xffff is something interesting for line numbers? Or are you just giving up when line numbers are >= 2**16? The code is mysterious here and needs a comment. It's probably not good to leave the code in a state where adjacent SET_LINENOs are collapsed if and only if the line numbers "aren't big" (then code using line numbers can't guess whether they are or aren't collapsed without duplicating the same lumpy logic). Third, c_lnotab is extremely delicate, historically subject to miserable rare bugs. If you've read the long comment block explaining it near the top of this file, I'd appreciate an argument (in code comments more than here ) for why just mucking with the last pair in a sequence of offset pairs can't break the subtle correctness property explained in the comment block. Finally, it's definitely worth tracking down why test_inspect fails: that test is difficult to understand, but the bottom line is that it's provoking an exception traceback and asserting that the computed line numbers correspond to the actual lines that are failing. The failing case provokes a three-frame traceback, and 2 of the 3 line numbers are wrong after the patch (the first is off by 1, and the third is off by 3; the frame in the middle gets the right line number). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled... 
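The offset-to-line mapping Tim describes still exists in today's code objects (c_lnotab/co_lnotab then, co_lines() now), and the decoded form can be inspected with the dis module. A small illustrative sketch, not the patch under discussion:

```python
import dis

def sample():
    a = 1
    b = 2
    return a + b

# findlinestarts() yields (bytecode offset, source line) pairs -- the
# decoded form of the delta-encoded table discussed above.  Collapsing
# adjacent SET_LINENOs corresponds to never emitting two entries for
# the same offset.  (Recent interpreters may report None for synthetic
# instructions, so those entries are filtered out here.)
starts = list(dis.findlinestarts(sample.__code__))
lines = [line for _offset, line in starts if line is not None]
print(lines == sorted(lines) and len(lines) >= 3)
```

For a straight-line function like this, the line numbers come out strictly increasing, which is exactly the invariant the c_lnotab comment block in compile.c is protecting.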
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:47:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:47:41 -0800 Subject: [Patches] [ python-Patches-505826 ] demo warning for expressions w/no effect Message-ID: Patches item #505826, was opened at 2002-01-19 14:24 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505826&group_id=5470 Category: Parser/Compiler Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: demo warning for expressions w/no effect Initial Comment: This patch is not meant to be applied as is. It is for discussion purposes. It modifies the compiler to warn about statements that have no effect. It does a printf() when it determines an expression has no effect. The sample definition is: a POP_TOP preceded by a BINARY_* or a LOAD_* operation. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-25 07:47 Message: Logged In: YES user_id=33168 I don't think this is useful any more. 
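The check this patch prototyped inside the compiler can be approximated from pure Python with the ast module: flag expression statements whose value is a bare name, constant, or binary operation. A rough sketch (the helper name is mine; real linters such as pyflakes do this more carefully, e.g. excluding docstrings):

```python
import ast

# Node types whose value, as a bare statement, has no effect --
# roughly the "POP_TOP preceded by BINARY_*/LOAD_*" pattern above.
# Note: a docstring is also an Expr(Constant), so a real tool would
# skip the first statement of a module, class, or function body.
NO_EFFECT = (ast.Name, ast.Constant, ast.BinOp, ast.Compare)

def find_no_effect(source: str):
    """Return the line numbers of expression statements with no effect."""
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.Expr) and isinstance(node.value, NO_EFFECT)]

print(find_no_effect("x = 1\nx + 1\nprint(x)\n"))  # flags line 2 only
```

The call on line 3 is not flagged because a call expression may have side effects, which is the same distinction Neal's bytecode heuristic was drawing.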
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505826&group_id=5470 From noreply@sourceforge.net Mon Mar 25 13:23:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 05:23:00 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 14:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 14:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. The escape processing in Unicode literals is incorrect. 
For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Mon Mar 25 14:01:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 06:01:36 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) >Assigned to: Martin v. Löwis (loewis) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure whether it broke them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 15:01 Message: Logged In: YES user_id=21627 I think the remaining issues are only shallow: Few users will care about --enable-shared on BeOS and DG/UX; those will hopefully contribute patches. Likewise, for .sl libraries - I don't know HP-UX shared linking well enough to determine whether it supports library versions. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Mon Mar 25 15:23:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 07:23:31 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have an FQDN and the mail server requires an FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not an FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). 
By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch.
RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
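[Editor's note: Neil's "local_hostname" fallback described above can be sketched roughly as below. choose_helo_name is an illustrative helper, not smtplib's actual code; the real mechanism is the local_hostname argument on smtplib.SMTP.__init__.]

```python
import socket

def choose_helo_name(local_hostname=None):
    """Pick the name to announce with HELO/EHLO.

    An explicitly supplied local_hostname wins; otherwise fall back to
    the machine's (possibly unqualified) name via socket.getfqdn(),
    mirroring the fallback smtplib uses.
    """
    if local_hostname is not None:
        return local_hostname
    return socket.getfqdn()

# An explicit name skips any DNS lookup entirely.
print(choose_helo_name("client.example.org"))  # -> client.example.org
```

Passing local_hostname explicitly is exactly how a script on a machine without a FQDN avoids the strict-HELO rejections Eduardo describes.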
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 16:00:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 08:00:19 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. 
I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. 
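[Editor's note: the address-literal format Barry describes ("[192.168.1.2]", per RFC 2821 section 4.1.3) can be sketched as below. domain_literal is an illustrative helper, not part of the proposed patch.]

```python
import socket

def domain_literal(ip):
    """Wrap an IPv4 address in square brackets, the RFC 2821
    address-literal form a client may send when it has no FQDN."""
    socket.inet_aton(ip)  # raises OSError if ip is not a valid IPv4 address
    return "[%s]" % ip

print(domain_literal("192.168.1.2"))  # -> [192.168.1.2]
```

In practice the octets would come from the client's side of the connected socket, e.g. sock.getsockname()[0] after connecting to the server.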
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 16:10:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 08:10:59 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-30 01:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail.
Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
----------------------------------------------------------------------
>Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 16:10
Message: Logged In: YES user_id=35752
There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chances of an email server rejecting email based on the domain name following the HELO verb are very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information.
It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 16:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 15:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? 
I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 04:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. 
If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 21:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however.
Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
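[Editor's note: Neil's claim earlier in this thread that socket.getfqdn() always returns something, while gethostname() reports only the kernel's idea of the host name, can be checked with a quick sketch; the printed values depend on the local resolver configuration.]

```python
import socket

# gethostname() returns the bare machine name as configured locally;
# getfqdn() additionally tries to qualify it via resolver lookups and
# falls back to the bare name when no better answer is available.
print(socket.gethostname())
print(socket.getfqdn())
```

This is why the thread treats getfqdn() as best-effort: on a machine with no reverse mapping, it may return an unqualified or misleading name, but never nothing.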
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 17:16:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 09:16:42 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16
Message: Logged In: YES user_id=6380
Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) Why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returns something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.)
Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain", but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10
Message: Logged In: YES user_id=35752
There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chances of an email server rejecting email based on the domain name following the HELO verb are very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned.
The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? 
I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. 
If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however.
Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, I'm reassigning it to Neil and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
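Neil's change above can be illustrated with a small sketch of the selection logic: an explicitly supplied local_hostname is used verbatim for HELO/EHLO, otherwise the library falls back to socket.getfqdn(). `helo_name()` here is a hypothetical stand-in for what SMTP.__init__ does with the new argument, not actual smtplib code.

```python
import socket

def helo_name(local_hostname=None):
    # Mirror of the patched behavior: an explicit local_hostname wins;
    # otherwise fall back to the (possibly unreliable) FQDN lookup.
    if local_hostname is not None:
        return local_hostname
    return socket.getfqdn()
```

With the patch applied, a caller that knows its own name can then write something like `smtplib.SMTP('mail.example.com', local_hostname='client.example.com')` — the hostnames here are placeholders.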
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 17:31:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 09:31:10 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN, and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before. - It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 17:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case where the hostname makes any difference. Using localhost.localdomain will work fine on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection, then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever), things work just as well as if localhost.localdomain was used as the default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMTP.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 17:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returns something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) 
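Guido's question contrasts two socket calls: gethostname() simply reports the configured host name as-is, while getfqdn() additionally consults the resolver (DNS or /etc/hosts) to expand it to a fully qualified name — which is exactly where the fragility Neil complains about comes in. A quick way to compare the two on any machine (output is machine-dependent, so none is shown):

```python
import socket

short_name = socket.gethostname()  # configured host name, no resolver round trip
fqdn = socket.getfqdn()            # may consult DNS or /etc/hosts to qualify it
print(short_name, fqdn)
```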
Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 16:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chance of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information, though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 16:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2]. IOW, the IP address octets surrounded by square brackets. 
Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 15:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. 
:-)
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 17:41:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 09:41:45 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not?
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 18:04:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 10:04:07 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:04 Message: Logged In: YES user_id=12800 Hold on. We're conflating issues here. To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address) To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that. If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain. If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. And if that's the case, then the client likely has bigger problems. My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 12:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case were the hostname makes any difference. Using localhost.localdomain will work find on 99.99% of mail servers. 
For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used a default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it's relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMPT.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returms something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. 
socket.getfqdn() is a hack, IMHO. It doesn't really matter though. The chances of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() aways returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. 
It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer.

OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 10:23

Message:
Logged In: YES user_id=6380

Sorry, but what's a domain literal?

I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it.

I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-)

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-24 23:00

Message:
Logged In: YES user_id=12800

Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting:

[HELO and EHLO] are used to identify the SMTP client to the SMTP server.
The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system.

Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right.

If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 16:51

Message:
Logged In: YES user_id=35752

Did you read what I wrote?

220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients

Bring it up again in another few years and we will change the default.
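Barry's suggested "more RFC-compliant" check (use the discovered name only if it is genuinely qualified, otherwise craft an address literal) could be sketched as below. This is only an illustration, not code from the patch: the helper name helo_name and the dot-in-the-name heuristic are assumptions.

```python
def helo_name(fqdn, local_ip):
    """Pick a HELO/EHLO argument along the lines of RFC 2821:
    use the discovered name if it looks fully qualified, else
    fall back to an address literal such as [192.168.1.2]."""
    if '.' in fqdn:
        # The name contains a domain part; treat it as an FQDN.
        return fqdn
    # Bare hostname only: send the IP in square brackets instead.
    return '[%s]' % local_ip
```

In a real client the local IP would come from the already-open socket, e.g. sock.getsockname()[0].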
----------------------------------------------------------------------

Comment By: Eduardo Pérez (eperez)
Date: 2002-03-24 13:39

Message:
Logged In: YES user_id=60347

RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the kind of SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above the current approach doesn't work and this patch works successfully.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 10:37

Message:
Logged In: YES user_id=35752

I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work.

The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-24 07:06

Message:
Logged In: YES user_id=6380

Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting status to Accepted.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 20:42

Message:
Logged In: YES user_id=35752

This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
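Neil's new constructor argument survives in today's smtplib, so the override can be demonstrated without touching the network (the hostname client.example.org is made up; with no host argument the constructor does not open a connection):

```python
import smtplib

# local_hostname overrides the socket.getfqdn() guess used for
# HELO/EHLO; no host argument means nothing is connected yet.
client = smtplib.SMTP(local_hostname='client.example.org')
print(client.local_hostname)  # -> client.example.org
```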
----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-29 21:24

Message:
Logged In: YES user_id=6380

Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 18:41:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 10:41:54 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Eduardo Pérez (eperez)
Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment:
If the machine from which you are sending mail doesn't have an FQDN and the mail server requires an FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:

- It's something from another layer (DNS/IP), not from SMTP.
- It breaks when the name of the computer is not an FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT (the host that the server sees is not the one that sends the message), if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.

No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 13:41

Message:
Logged In: YES user_id=6380

I'm skeptical about the effectiveness of providing overrides through defaulted arguments; this is something the author of the program using smtplib must anticipate and give its user an option to override. (And because of that, I'm at best -0 on adding the local_hostname argument to the constructor, as Neil checked in.)

I now agree that leaking the fqdn isn't much of a privacy breach. I agree that fqdn w/domain literal fallback is the best compromise.

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-25 13:04

Message:
Logged In: YES user_id=12800

Hold on. We're conflating issues here.

To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address).

To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that.

If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain.
If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. And if that's the case, then the client likely has bigger problems.

My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 12:41

Message:
Logged In: YES user_id=6380

OK. So is socket.gethostname() better than socket.getfqdn() or not?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-25 12:31

Message:
Logged In: YES user_id=35752

So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case where the hostname makes any difference. Using localhost.localdomain will work fine on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used as the default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it relies on DNS. People always screw that up. :-)

Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMTP.helo().
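The difference Guido is asking about can be seen directly; the printed values are machine-dependent, so none are shown here:

```python
import socket

# What smtplib sends today: getfqdn() may consult DNS to expand the
# bare hostname into a fully qualified one, which is why Neil calls
# it a hack -- the result depends on resolver configuration.
fqdn = socket.getfqdn()

# Guido's alternative: the kernel's idea of the hostname, no DNS
# involved; root can fix it locally if it is wrong.
name = socket.gethostname()

print(fqdn, name)
```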
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 18:56:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 10:56:52 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Eduardo Pérez (eperez)
Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:56 Message: Logged In: YES user_id=12800 Cool, I will apply my patch and update the documentation. I'll leave the default argument as Neil implemented. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 13:41 Message: Logged In: YES user_id=6380 I'm skeptical about the effectiveness of providing overrides through defaulted arguments; this is something the author of the program using smtplib must anticipate and give its user an option to override. (And because of that, I'm at best -0 on adding the local_hostname argument to the constructor, as Neil checked in.) I now agree that leaking the fqdn isn't much of a provacy breach. I agree that fqdn w/domain literal fallback is the best compromise. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:04 Message: Logged In: YES user_id=12800 Hold on. We're conflating issues here. To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address) To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that. If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain. If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. 
And if that's the case, then the client likely has bigger problems. My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 12:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case were the hostname makes any difference. Using localhost.localdomain will work find on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used a default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it's relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMPT.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? 
(2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returms something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter though. The chances of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() aways returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. 
Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. 
I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51 Message: Logged In: YES user_id=35752 Did you read what I wrote? 220 cranky ESMTP Postfix (Debian/GNU) HELO localhost.localdomain 250 cranky MAIL FROM: 250 Ok RCPT TO: DATA 450 : Helo command rejected: Host not found 554 Error: no valid recipients Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the patch proposed fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided it is used as the local hostname for the HELO and EHLO verbs. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and set status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:07:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:07:12 -0800 Subject: [Patches] [ python-Patches-476814 ] foreign-platform newline support Message-ID: Patches item #476814, was opened at 2001-10-31 11:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Jack Jansen (jackjansen) Summary: foreign-platform newline support Initial Comment: This patch enables Python to interpret all known newline conventions, CR, LF or CRLF, on all platforms. 
This support is enabled by configuring with --with-universal-newlines (so by default it is off, and everything should behave as usual). With universal newline support enabled two things happen: - When importing or otherwise parsing .py files any newline convention is accepted. - Python code can pass a new "t" mode parameter to open() which reads files with any newline convention. "t" cannot be combined with any other mode flags like "w" or "+", for obvious reasons. File objects have a new attribute "newlines" which contains the type of newlines encountered in the file (or None when no newline has been seen, or "mixed" if there were various types of newlines). Also included is a test script which tests both file I/O and parsing. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 16:07 Message: Logged In: YES user_id=6380 Thanks! But there's no documentation. Could I twist your arm for a separate doc patch? I'm tempted to give this a +1, but I'd like to hear from MvL and MAL to see if they foresee any interaction with their PEP 262 implementation. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-03-13 17:44 Message: Logged In: YES user_id=45365 A new version of the patch. Main differences are that U is now the mode character to trigger universal newline input and --with-universal-newlines is default on. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-16 17:47 Message: Logged In: YES user_id=45365 This version of the patch addresses the bug in Py_UniversalNewlineFread and fixes up some minor details. Tim's other issues are addressed (at least: I think they are:-) in a forthcoming PEP. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-12-13 18:57 Message: Logged In: YES user_id=31435 Back to Jack -- and sorry for sitting on it so long. Clearly this isn't making it into 2.2 in the core. As I said on Python-Dev, I believe this needs a PEP: the design decisions are debatable, so *should* be debated outside the Mac community too. Note, though, that I can't stop you from adding it to the 2.2 Mac distribution (if you want it badly enough there). If a PEP won't be written, I suggest finding someone else to review it again; maybe Guido. Note that the patch needs doc changes too. The patch to regrtest.py doesn't belong here (I assume it just slipped in). There seems to be a lot of code in support of the f_newlinetypes member, and the value of that member isn't clear -- I can't imagine a good use for it (maybe it's a Mac thing?). The implementation of Py_UniversalNewlineFread appears incorrect to me: it reads n bytes *every* time around the outer loop, no matter how few characters are still required, and n doesn't change inside the loop. The business about the GIL may be due to the lack of docs: are, or are not, people supposed to release the GIL themselves around calls to these guys? It's not documented, and it appears your intent differed from my guess. Finally, it would be better to call ferror() after calling fread() instead of before it. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-11-14 10:13 Message: Logged In: YES user_id=45365 Here's a new version of the patch. To address your issues one by one: - get_line and Py_UniversalNewlineFgets are too difficult to integrate, at least, I don't see how I could do it. The storage management of get_line gets in the way. - The global lock comment I don't understand. The Universal... 
routines are replacements for fgets() and fread(), so have nothing to do with the interpreter lock. - The logic of all three routines (get_line too) has changed and I've put comments in. I hope this addresses some of the points. - If universal_newline is false for a certain PyFileObject we now immediately take a quick exit via fgets() or fread(). There's also a new test script that tests some more border cases (like lines longer than 100 characters, and a lone CR just before end of file). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-05 03:16 Message: Logged In: YES user_id=31435 It would be better if get_line just called Py_UniversalNewlineFgets (when appropriate) instead of duplicating its logic inline. Py_UniversalNewlineFgets and Py_UniversalNewlineFread should deal with releasing the global lock themselves -- the correct granularity for lock release/reacquire is around the C-level input routines (esp. for fread). The new routines never check for I/O errors! Why not? It seems essential. The new Fgets checks for EOF at the end of the loop instead of the top. This is surprising, and I stared a long time in vain trying to guess why. Setting newlinetypes |= NEWLINE_CR; immediately after seeing an '\r' would be as fast (instead of waiting to see EOF and then inferring the prior existence of '\r' indirectly from the state of the skipnextlf flag). Speaking of which, the fobj tests in the inner loop waste cycles. Set the local flag vrbls whether or not fobj is NULL. When you're *out* of the inner loop you can simply decline to store the new masks when fobj is NULL (and you're already doing the latter anyway). A test and branch inside the loop is much more expensive than or'ing in a flag bit inside the loop, ditto harder to understand. 
Floating the univ_newline test out of the loop (and duplicating the loop body, one way for univ_newline true and the other for it false) would also save a test and branch on every character. Doing fread one character at a time is very inefficient. Since you know you need to obtain n characters in the end, and that these transformations require reading at least n characters, you could very profitably read n characters in one gulp at the start, then switch to k at a time where k is the number of \r\n pairs seen since the last fread call. This is easier to code than it sounds. It would be fine by me if you included (and initialized) the new file-object fields all the time, whether or not universal newlines are configured. I'd rather waste a few bytes in a file object than see #ifdefs spread thru the code. I'll be damned if I can think of a quick way to do this stuff on Windows -- native Windows fgets() is still the only Windows handle we have on avoiding crushing thread overhead inside MS's C library. I'll think some more about it (the thrust still being to eliminate the 't' mode flag, as whined about on Python-Dev). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-31 12:38 Message: Logged In: YES user_id=6380 Tim, can you review this or pass it on to someone else who has time? Jack developed this patch after a discussion in which I was involved in some of the design, but I won't have time to look at it until December. 
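[Editor's sketch] For readers following the review, the translation the patch performs in C behaves roughly like this pure-Python sketch. It is illustrative only: the real code works incrementally on a FILE* with a skipnextlf flag rather than on a whole in-memory string, and the function name here is invented.

```python
def translate_newlines(data):
    # Map CR and CRLF to LF, recording which conventions were seen --
    # the same information the patch exposes as the file object's new
    # "newlines" attribute (None / one type / "mixed").
    seen = set()
    out = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == '\r':
            if i + 1 < n and data[i + 1] == '\n':
                seen.add('\r\n')
                i += 1  # swallow the LF half of the CRLF pair
            else:
                seen.add('\r')  # lone CR, e.g. just before EOF
            out.append('\n')
        else:
            if c == '\n':
                seen.add('\n')
            out.append(c)
        i += 1
    return ''.join(out), seen
```

With mixed input such as 'a\r\nb\rc\n', every line comes out LF-terminated and all three conventions are recorded.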
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:12:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:12:21 -0800 Subject: [Patches] [ python-Patches-534862 ] help asyncore recover from repr() probs Message-ID: Patches item #534862, was opened at 2002-03-25 15:12 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: help asyncore recover from repr() probs Initial Comment: I've had this patch in my copy of asyncore.py for quite a while. It works for me as a way to recover from repr() bogosities, though I'm unfamiliar enough with repr/str issues and asyncore to know if this is the right way to make it more bulletproof (or if it should even be made more bulletproof). 
Skip ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:33:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:33:08 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-11 19:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-25 13:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. 
Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 20:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:43:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:43:01 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. 
Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 21:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 21:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. 
Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 22:45:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 14:45:32 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-11 19:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. 
And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-25 14:45 Message: Logged In: YES user_id=357491 The point of the patch was to put an iterator interface onto fileinput without requiring a wrapper. Basically it was to update it so that if for some reason for loops start to require an iterator interface, it is already done. It was also to make sure that if an iterator was needed that it would have all the methods it could need. A side-effect is the need for one less object if you want an iterator since __iter__ just returns self. One possible desire of this is passing around the instance. If you pass the iterator as fileinput is now implemented you don't have access to the original instance and thus can't use any of its methods. If you pass the FileInput instance you would have to regenerate the iterator every time you wanted to use it after being passed. With this implementation you just pass the original instance since it can act as a FileInput instance or an iterator. I realize this is not downright needed, I am not arguing there. I am just saying that it is a nice feature to have that does not add any excessive feature to the language. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 13:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 13:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 20:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Tue Mar 26 18:42:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 10:42:31 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. 
Jeff ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 18:53:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 10:53:46 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. Jeff ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-26 13:53 Message: Logged In: YES user_id=33168 Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless. configure is generated, so configure.in will need the changes made to it. There seem to be many tests which fail, but perhaps shouldn't: fork1, locale, minidom, poll, pyexpat, sax, unicode_file? I'm also unsure of the benefit of adding contrib/{lib/include} to setup.py. This could be fine, but I don't know anything about distutils. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 19:08:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 11:08:27 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. Jeff ---------------------------------------------------------------------- >Comment By: Jeffrey Honig (jchonig) Date: 2002-03-26 14:08 Message: Logged In: YES user_id=96862 Re: configure.in vs configure: we don't use autoconf here so modifying configure.in doesn't help us. I should have copied the changes and submitted them, but then they aren't too hard to figure out.... Re: contrib{lib/include}: We install many of the packages that we install from the net (which we call contrib packages) into the /usr/contrib hierarchy. They won't be found by setup.py unless those paths are present. Re: regrtest.py: Apologies about the regrtest.py content, there are some tests in there that shouldn't be, ignore it for now, I'll submit an update later. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-26 13:53 Message: Logged In: YES user_id=33168 Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless. 
configure is generated, so configure.in will need the changes made to it. There seem to be many tests which fail, but perhaps shouldn't: fork1, locale, minidom, poll, pyexpat, sax, unicode_file? I'm also unsure of the benefit of adding contrib/{lib/include} to setup.py. This could be fine, but I don't know anything about distutils. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 20:31:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 12:31:17 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. 
And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-26 20:31 Message: Logged In: YES user_id=35752 I've checked in a modified version of this patch. Instead of FileInput.next calling FileInput.__getitem__ I've made __getitem__ call next. This keeps the common case of "for line in fileinput.input()" fast. See fileinput 1.9. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 22:45 Message: Logged In: YES user_id=357491 The point of the patch was to put an iterator interface onto fileinput without requiring a wrapper. Basically it was to update it so that if for some reason for loops start to require an iterator interface, it is already done. It was also to make sure that if an iterator was needed that it would have all the methods it could need. A side-effect is the need for one less object if you want an iterator since __iter__ just returns self. One possible desire of this is passing around the instance. If you pass the iterator as fileinput is now implemented you don't have access to the original instance and thus can't use any of its methods. If you pass the FileInput instance you would have to regenerate the iterator every time you wanted to use it after being passed. With this implementation you just pass the original instance since it can act as a FileInput instance or an iterator. I realize this is not downright needed, I am not arguing there. I am just saying that it is a nice feature to have that does not add any excessive feature to the language. 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 21:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 21:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. 
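[Editor's sketch] The shape of the accepted design for #516297 can be sketched as follows. This is an illustrative stand-in, not the actual fileinput source: per Neil's checked-in version, __getitem__ delegates to next() so plain iteration stays fast, and a __next__ alias is added here so the sketch also runs on modern Pythons.

```python
class FileInputSketch:
    def __init__(self, lines):
        self._lines = iter(lines)
        self._lineno = 0

    def __iter__(self):
        # The instance is its own iterator, so methods like nextfile()
        # stay reachable from the object you iterate over.
        return self

    def next(self):
        line = next(self._lines, None)
        if line is None:
            raise StopIteration
        self._lineno += 1
        return line

    __next__ = next  # modern spelling of the same protocol method

    def __getitem__(self, i):
        # Sequential access only, mirroring fileinput's historical contract.
        if i != self._lineno:
            raise RuntimeError('accessing lines out of order')
        try:
            return self.next()
        except StopIteration:
            raise IndexError('end of input reached')
```

Because __iter__ returns self, passing the object around gives the receiver both the iterator and the instance methods, which is Brett's argument above.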
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:21:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:21:43 -0800 Subject: [Patches] [ python-Patches-536117 ] Typo in turtle.py Message-ID: Patches item #536117, was opened at 2002-03-28 08:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536117&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536117&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:28:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:28:48 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with leading point of hidden files. For sample: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. 
Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performances without behaviour changes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:33:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:33:59 -0800 Subject: [Patches] [ python-Patches-536125 ] Typo in turtle.py Message-ID: Patches item #536125, was opened at 2002-03-28 08:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:38:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:38:39 -0800 Subject: [Patches] [ python-Patches-536125 ] Typo in turtle.py Message-ID: Patches item #536125, was opened at 2002-03-28 08:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. 
This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- >Comment By: Sebastien Keim (s_keim) Date: 2002-03-28 08:38 Message: Logged In: YES user_id=498191 I made a small mistake. This patch is the same as patch number 536117. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 From noreply@sourceforge.net Thu Mar 28 08:02:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Mar 2002 00:02:48 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 02:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with the leading point of hidden files. For example: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performance without behaviour changes. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-28 03:02 Message: Logged In: YES user_id=31435 I expect this change has scant chance of being accepted. The idea that a leading dot means "hidden" is an arbitrary convention of the ls utility, and your desire to call a .name file "pure name" instead of "pure extension" seems arbitrary too. 
The behavior of splitext is perfectly predictable as-is across platforms now (note the implication: if you intend to change the semantics for posixpath, you'll also have to sell that it should be changed for dospath.py, ntpath.py, macpath.py, os2emxpath.py, and riscospath.py). Note that the patched function splits, e.g., '/usr/local/tim.one/seven' into '/usr/local/tim' and '.one/seven'. I assume that's not the result you intended. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 From noreply@sourceforge.net Thu Mar 28 08:42:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Mar 2002 00:42:33 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with the leading point of hidden files. For example: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performance without behaviour changes. ---------------------------------------------------------------------- >Comment By: Sebastien Keim (s_keim) Date: 2002-03-28 09:42 Message: Logged In: YES user_id=498191 Oops, you're right. I thought that the for loop was only a leftover from the time when the string module was coded in Python. 
In fact it seems that things are a little more complex than I intended :( But if we replace: if i<1 or p[i-1]=='/': by: if i<0 or i
From noreply@sourceforge.net Sat Mar 30 08:58:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 00:58:37 -0800 Subject: [Patches] [ python-Patches-536908 ] missing #include guards/extern "C" Message-ID: Patches item #536908, was opened at 2002-03-29 22:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Abrahams (david_abrahams) Assigned to: Nobody/Anonymous (nobody) >Summary: missing #include guards/extern "C" Initial Comment: cvs server: Diffing . Index: cStringIO.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/cStringI O.h,v retrieving revision 2.15 diff -r2.15 cStringIO.h 2a3,5 > #ifdef __cplusplus > extern "C" { > #endif 130a134,136 > #ifdef __cplusplus > } > #endif Index: descrobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/descrobj ect.h,v retrieving revision 2.8 diff -r2.8 descrobject.h 1a2,6 > #ifndef Py_DESCROBJECT_H > #define Py_DESCROBJECT_H > #ifdef __cplusplus > extern "C" { > #endif 80a86,88 > #ifdef __cplusplus > } > #endif Index: iterobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/iterobje ct.h,v retrieving revision 1.3 diff -r1.3 iterobject.h 0a1,2 > #ifndef Py_ITEROBJECT_H > #define Py_ITEROBJECT_H 1a4,6 > #ifdef __cplusplus > extern "C" { > #endif 13a19,22 > #ifdef __cplusplus > } > #endif > #endif Py_ITEROBJECT_H ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-30 09:58 Message: Logged In: YES user_id=21627 Thanks for the patch, applied as cStringIO.h 2.16 descrobject.h 2.9 iterobject.h 1.4 *Please* use context diffs in the future. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 23:35 Message: Logged In: YES user_id=21627 Please attach the patch as a context (-c) or unified (-u) diff to this report. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 From noreply@sourceforge.net Sat Mar 30 09:01:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 01:01:24 -0800 Subject: [Patches] [ python-Patches-536908 ] missing #include guards/extern "C" Message-ID: Patches item #536908, was opened at 2002-03-29 22:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: David Abrahams (david_abrahams) Assigned to: Nobody/Anonymous (nobody) >Summary: missing #include guards/extern "C" Initial Comment: cvs server: Diffing . 
Index: cStringIO.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/cStringI O.h,v retrieving revision 2.15 diff -r2.15 cStringIO.h 2a3,5 > #ifdef __cplusplus > extern "C" { > #endif 130a134,136 > #ifdef __cplusplus > } > #endif Index: descrobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/descrobj ect.h,v retrieving revision 2.8 diff -r2.8 descrobject.h 1a2,6 > #ifndef Py_DESCROBJECT_H > #define Py_DESCROBJECT_H > #ifdef __cplusplus > extern "C" { > #endif 80a86,88 > #ifdef __cplusplus > } > #endif Index: iterobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/iterobje ct.h,v retrieving revision 1.3 diff -r1.3 iterobject.h 0a1,2 > #ifndef Py_ITEROBJECT_H > #define Py_ITEROBJECT_H 1a4,6 > #ifdef __cplusplus > extern "C" { > #endif 13a19,22 > #ifdef __cplusplus > } > #endif > #endif Py_ITEROBJECT_H ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-30 09:58 Message: Logged In: YES user_id=21627 Thanks for the patch, applied as cStringIO.h 2.16 descrobject.h 2.9 iterobject.h 1.4 *Please* use context diffs in the future. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 23:35 Message: Logged In: YES user_id=21627 Please attach the patch as a context (-c) or unified (-u) diff to this report. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:16:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:16:54 -0800 Subject: [Patches] [ python-Patches-536241 ] string.zfill and unicode Message-ID: Patches item #536241, was opened at 2002-03-28 14:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Fixed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: A.M. Kuchling (akuchling) Summary: string.zfill and unicode Initial Comment: This patch makes the function string.zfill work with unicode instances (and instances of str and unicode subclasses). Currently string.zfill(u"123", 10) results in "0000u'123'". With this patch the result is u'0000000123'. Should zfill be made a real str and unicode method? I noticed that a zfill implementation is available in unicodeobject.c, but commented out. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-30 12:16 Message: Logged In: YES user_id=89016 But Python could be compiled without unicode support (by undefining PY_USING_UNICODE), and string.zfill should work even in this case. What about making zfill a real str and unicode method? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-03-29 17:24 Message: Logged In: YES user_id=11375 Thanks for your patch! I've checked it into CVS, with two modifications. First, I removed the code to handle the case where Python doesn't have a unicode() built-in; there's no expectation that you can take the standard library for Python version N and use it with version N-1, so this code isn't needed. 
Second, I changed string.zfill() to take the str() and not the repr() when it gets a non-string object because that seems to make more sense. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:25:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:25:27 -0800 Subject: [Patches] [ python-Patches-536241 ] string.zfill and unicode Message-ID: Patches item #536241, was opened at 2002-03-28 13:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 Category: Library (Lib) Group: None >Status: Open Resolution: Fixed Priority: 5 Submitted By: Walter Dцrwald (doerwalter) Assigned to: A.M. Kuchling (akuchling) Summary: string.zfill and unicode Initial Comment: This patch makes the function string.zfill work with unicode instances (and instances of str and unicode subclasses). Currently string.zfill(u"123", 10) results in "0000u'123'". With this patch the result is u'0000000123'. Should zfill be made a real str und unicode method? I noticed that a zfill implementation is available in unicodeobject.c, but commented out. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:25 Message: Logged In: YES user_id=6656 Hah, I was going to say that but was distracted by IE wiping out the machine I'm sitting at. Re-opening. ---------------------------------------------------------------------- Comment By: Walter Dцrwald (doerwalter) Date: 2002-03-30 11:16 Message: Logged In: YES user_id=89016 But Python could be compiled without unicode support (by undefining PY_USING_UNICODE), and string.zfill should work even in this case. What about making zfill a real str and unicode method? 
---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-03-29 16:24 Message: Logged In: YES user_id=11375 Thanks for your patch! I've checked it into CVS, with two modifications. First, I removed the code to handle the case where Python doesn't have a unicode() built-in; there's no expection that you can take the standard library for Python version N and use it with version N-1, so this code isn't needed. Second, I changed string.zfill() to take the str() and not the repr() when it gets a non-string object because that seems to make more sense. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:27:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:27:10 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 14:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch suppresses the restriction that global and local dictionaries do not access overloaded __getitem__ and __setitem__ if passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict to make sure this object exists and to suppress the need for the derived class to take care of this implementation dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set : if explicitely passed a local dict, a new local dict is not generated. 
This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved). If this cannot be done because of backward-compatibility problems, then an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version. If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: Currently, the plain test suite passes (with a modification of test_desctut, which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:27 Message: Logged In: YES user_id=6656 And there's precisely no way it's going into 2.2.x. 
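The kind of behaviour this patch was after can be illustrated in today's Python 3, where eval() and exec() do consult a dict subclass's own lookup hooks for the locals mapping (global lookups remain special-cased). This shows the modern interpreter's behaviour, not the 2002 patch itself; Defaulting is a made-up example class:

```python
# A dict subclass whose lookup hook supplies values for missing names.
# dict.__getitem__ calls __missing__ when a key is absent.
class Defaulting(dict):
    def __missing__(self, key):
        return 42

# Name lookup inside eval() tries the locals mapping first; because
# Defaulting is not a plain dict, its __missing__ hook is honoured.
value = eval("completely_unbound_name", {}, Defaulting())
```

So an otherwise-unbound name resolves through the subclass's hook instead of raising NameError.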
---------------------------------------------------------------------- Comment By: Cesar Douady (douady) Date: 2002-03-30 00:08 Message: Logged In: YES user_id=428521 to install this patch from python revision 2.2, follow these steps : - get the python.diff file from this page - cd Python-2.2 - run "patch -p1 Patches item #534304, was opened at 2002-03-24 13:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:27 Message: Logged In: YES user_id=6656 Not going into 2.2.x. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-25 13:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. 
The escape processing in Unicode literals is incorrect. For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 31 04:12:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 20:12:28 -0800 Subject: [Patches] [ python-Patches-536278 ] force gzip to open files with 'b' Message-ID: Patches item #536278, was opened at 2002-03-28 09:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536278&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: force gzip to open files with 'b' Initial Comment: It doesn't make sense that the gzip module should try to open a file in text mode. The attached patch forces a 'b' into the file open mode if it wasn't given. I also modified the test slightly to try and tickle this code, but I can't test it very effectively, because I don't do Windows... 
:-) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-30 23:12 Message: Logged In: YES user_id=31435 I suggest fixing this via changing the test to if mode and 'b' not in mode: Then mode=None and mode='' will be left alone (as Neal says, the code already does the right thing for those). ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-28 10:04 Message: Logged In: YES user_id=33168 There is a problem (sorry, I have an evil mind). :-) If '' is passed as the mode, before the patch, this would have been converted to 'rb'. After the patch, mode will become 'b' and that will raise an exception: >>> open('/dev/null', 'b') IOError: [Errno 22] Invalid argument: b If you add an (and mode) condition and that should be fine. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536278&group_id=5470 From noreply@sourceforge.net Sun Mar 31 07:11:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 23:11:35 -0800 Subject: [Patches] [ python-Patches-536909 ] pymalloc for types and other cleanups Message-ID: Patches item #536909, was opened at 2002-03-29 16:11 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536909&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: pymalloc for types and other cleanups Initial Comment: This patch changes typeobject to use pymalloc for managing the memory of subclassable types. It also fixes a bug that caused an interpreter built without GC to crash. Testing this patch was a bitch. There are three knobs related to MM now (with-cycle-gc, with-pymalloc, and PYMALLOC_DEBUG). 
I think I found different bugs when testing with each possible combination. There's one bit of ugliness in this patch. Extension module writers have to use _PyMalloc_Del to initialize the tp_free pointer. There should be a "public" function for that. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-31 02:11 Message: Logged In: YES user_id=31435 Neil, I appreciate the work! I'm afraid I screwed you at the same time. How do you want to proceed? I think "the plan" now is that we go back to the PyObject_XXX interface, and when pymalloc is enabled map most flavors of "free memory" ({Py{Mem, Object}_{Del, DEL, Free, FREE}) to the pymalloc free. You're not required to work on this, but if you've got some spare energy I could sure use the help. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-29 18:09 Message: Logged In: YES user_id=35752 I'm counting on Tim to finish PyMem_NukeIt. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 17:47 Message: Logged In: YES user_id=21627 I see another memory allocation family here: What function should objects allocated through PyType_GenericAlloc be released with? If you change the behaviour of PyType_GenericAlloc, all types in extensions written for 2.2 that use PyType_GenericAlloc will break, since they will still have PyObject_Del in their tp_free slot. I believe "families" should always be complete, so along with PyType_GenericAlloc goes PyType_GenericFree. If you want it fully backwards compatible, you need to introduce PyType_PyMallocAlloc... 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536909&group_id=5470 
From noreply@sourceforge.net Sun Mar 31 16:16:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 31 Mar 2002 08:16:08 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 22:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: SUZUKI Hisao (suzuki_hisao) Date: 2002-04-01 01:16 Message: Logged In: YES user_id=495142 Thank you for your review. Now 1. and 3. are fixed, and 2. is improved. (4. is not true.) ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-30 20:27 Message: Logged In: YES user_id=6656 Not going into 2.2.x. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 22:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. 
When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. The escape processing in Unicode literals is incorrect. For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 31 21:10:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 31 Mar 2002 13:10:30 -0800 Subject: [Patches] [ python-Patches-452110 ] socketmodule ssl: server & thread Message-ID: Patches item #452110, was opened at 2001-08-17 08:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452110&group_id=5470 Category: Library (Lib) Group: None >Status: Deleted Resolution: None Priority: 5 Submitted By: Jozef Hatala (jhatala) Assigned to: Jeremy Hylton (jhylton) Summary: socketmodule ssl: server & thread Initial Comment: Simple enhancement to the SSL support in module socket : - support for writing SSL servers (as well as clients) - Py_*_ALLOW_THREADS arround blocking calls to openssl - rsa temp key to work with older export netscape - renamed attribute server to peer This patch 
allows for powerful application servers like the following one to be accessed with "netscape https://localhost:1443/":

    from socket import *
    p = socket(AF_INET, SOCK_STREAM)
    p.bind(('localhost', 1443))
    p.listen(1)
    while 1:
        s, a = p.accept()
        c = sslserver(s, 'server.key', 'server.crt')
        print "They said:", c.read()
        c.write('HTTP/1.0 200 OK\r\n')
        c.write('Content-Type: text/plain\r\n\r\n** Hi! **')
        c.close()

TODO: a kind of makefile() on the ssl object, like on a socket, would be welcome.

Have fun,
jh

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-10-22 06:51

Message:
Logged In: YES
user_id=163326

I don't think it is a good idea to add this. Python's builtin client-side SSL support is already pretty weak. This patch would add a minimal SSL server implementation, but it shares some of the same weaknesses, like missing the ability to set the SSL method (version 2, version 3, version 2 or 3).

I'd recommend not adding any more SSL features at this point, but for Python 2.2 only keeping the existing client-side functionality and fixing any remaining bugs there.

I'm working on something that would hopefully be better in the long run: an SSL API that the various Python SSL modules (m2crypto, POW, pyOpenSSL) can implement; Python would then use one of these third-party modules for https, smtp/tls etc. Sort of a plugin ability for an SSL module. If you add stuff to the broken SSL API now, you'll either have to carry it around for a long time or, if my proposal gets implemented and accepted, the workarounds will be clunkier.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 19:10

Message:
Logged In: YES
user_id=6380

Time to look at this again?
----------------------------------------------------------------------

Comment By: Jozef Hatala (jhatala)
Date: 2001-10-17 07:43

Message:
Logged In: YES
user_id=300564

This patch, now against Python 2.2a3, contains:
- SSL server support (SSL_accept) [as before]
additionally:
- allow threads around getaddrinfo & Co.
- more verbose exception messages (for failures in ssl() and sslserver())
- methods recv and send on the ssl object, as equivalents of read and write
- a makefile method on the ssl object (a look-alike; does no dup!)
- a client/server test (depends on os.fork())

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-10-16 09:05

Message:
Logged In: YES
user_id=31392

If you can provide test cases, I'll provide documentation. But hurry: if we don't get this done this week, we may miss Python 2.2.

----------------------------------------------------------------------

Comment By: Jozef Hatala (jhatala)
Date: 2001-10-16 03:21

Message:
Logged In: YES
user_id=300564

I'll submit a simple test with certificates and an enhanced patch for 2.2a2 (it does not patch cleanly any more) soon (this week) [time and inet access issues]. I haven't written any doc. There was none for ssl; I know that is no excuse... Does someone want to volunteer?

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-10-11 09:13

Message:
Logged In: YES
user_id=31392

Jozef, are you going to contribute tests and documentation?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-08-18 23:17

Message:
Logged In: YES
user_id=6380

Nice, but where's the documentation? (Thanks for the docstrings, though!) And the test suite?
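[Editorial note for readers of this archive: server-side TLS eventually landed in the standard ssl module, which also addresses Gerhard's objection about selecting the protocol method. A minimal modern sketch of what the patch's sslserver(sock, 'server.key', 'server.crt') corresponds to today; the helper name make_tls_server and its parameters are illustrative, not from the patch:]

```python
import socket
import ssl

def make_tls_server(host, port, certfile, keyfile):
    """Rough modern equivalent of the patch's sslserver() object:
    a listening socket whose accepted connections speak TLS."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    srv = socket.create_server((host, port))
    # wrap_socket(..., server_side=True) runs the SSL_accept()
    # handshake on each connection returned by accept()
    return ctx.wrap_socket(srv, server_side=True)
```

[The protocol version and ciphers are configured on the SSLContext, which is the "set the SSL method" knob the 2001 patch lacked.]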
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452110&group_id=5470

From noreply@sourceforge.net Sun Mar 31 23:12:23 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 31 Mar 2002 15:12:23 -0800
Subject: [Patches] [ python-Patches-537536 ] bug 535444 super() broken w/classmethods
Message-ID:

Patches item #537536, was opened at 2002-03-31 23:12
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=537536&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Phillip J. Eby (pje)
Assigned to: Nobody/Anonymous (nobody)
Summary: bug 535444 super() broken w/classmethods

Initial Comment:
This patch fixes bug #535444. It is against the current CVS version of Python, and addresses the problem by adding a 'starttype' variable to 'super_getattro', which works the same as 'starttype' in the pure-Python version of super in the descriptor tutorial. This variable is then passed to the descriptor __get__ function, a la 'descriptor.__get__(self.__obj__, starttype)'.

This patch does not correct the pure-Python version of 'super' in the descriptor tutorial; I don't know where that file is or how to submit a patch for it. It also does not include a regression test for the bug, as I do not know which test script would be the appropriate place for it. Thanks.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=537536&group_id=5470
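[Editorial note: the 'starttype' logic described in the patch can be sketched in pure Python, along the lines of the descriptor tutorial's Super class. This is an illustrative reconstruction, not the C code from the patch; the point is that the attribute found along the MRO is bound via __get__(obj, starttype), so a classmethod receives the most derived class rather than the wrong one.]

```python
class Super(object):
    """Sketch of super() with the fix described above (illustrative)."""

    def __init__(self, type_, obj=None):
        self.__type__ = type_
        self.__obj__ = obj

    def __getattr__(self, attr):
        if isinstance(self.__obj__, self.__type__):
            starttype = self.__obj__.__class__   # instance-bound super
        else:
            starttype = self.__obj__             # class-bound super
        mro = iter(starttype.__mro__)
        # Skip ahead past self.__type__ in the MRO...
        for cls in mro:
            if cls is self.__type__:
                break
        # ...then look up the attribute in the remaining classes.
        # The fix: pass starttype (not self.__type__) to __get__,
        # so a classmethod binds to the most derived class.
        for cls in mro:
            if attr in cls.__dict__:
                x = cls.__dict__[attr]
                if hasattr(x, "__get__"):
                    x = x.__get__(self.__obj__, starttype)
                return x
        raise AttributeError(attr)
```

[With a classmethod tag() defined on A and B(A) inheriting it, Super(B, B()).tag() now sees cls == B, matching the builtin super after the fix.]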