From arigo at tunes.org Sat Nov 1 07:46:28 2003 From: arigo at tunes.org (Armin Rigo) Date: Sat Nov 1 07:50:30 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mad7h72sr.fsf@starship.python.net> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> Message-ID: <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Hello Michael, On Fri, Oct 31, 2003 at 05:08:36PM +0000, Michael Hudson wrote: > > be getting a 12-25% decrease in memory use for the base object, > > though. > > More than that in the good cases. Something I forgot was that you'd > probably have to knock variable length types on the head. Why? Armin From nas-python at python.ca Sat Nov 1 22:36:57 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 1 22:35:25 2003 Subject: [Python-Dev] Deprecate the buffer object? In-Reply-To: <0b8e01c39f34$1d31e1f0$0500a8c0@eden> References: <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> <0b8e01c39f34$1d31e1f0$0500a8c0@eden> Message-ID: <20031102033657.GA8137@mems-exchange.org> On Fri, Oct 31, 2003 at 09:21:06AM +1100, Mark Hammond wrote: > Thus, my preference is to fix the buffer object by fixing the interface as > much as possible. > > Here is a sketch of a solution, incorporating both Neil and Greg's ideas: > > * Type object gets a new flag - TP_HAS_BUFFER_INFO, corresponding to a new > 'getbufferinfoproc' slot in the PyBufferProcs structure (note - a function > pointer, not static flags as Neil suggested) > > * New function 'getbufferinfoproc' returns a bitmask - Py_BUFFER_FIXED is > one (and currently the only) flag that can be returned. What does this flag mean? 
To my mind, there are several different types of memory buffers and the buffer interface does not distinguish between all of them. Is the size and position of the buffer fixed? Is the buffer immutable (it may be read-only through the buffer object but writable via some other mechanism)? The first question can be avoided by using Greg's idea of always refreshing the size and position. The second question cannot be answered using the current interface. I suppose if the buffer is immutable then it is implied that its size and position are fixed. > * New buffer functions PyObject_AsFixedCharBuffer, etc. These check the new > flag (and a type lacking TP_HAS_BUFFER_INFO is assumed to *not* be fixed) > > * Buffer object keeps a reference to the existing object (as it does now). > Its getbufferinfoproc delegates to the underlying object. > > * Buffer object *never* keeps a pointer to the buffer - only to the object. > Functions like tp_hash always re-fetch the buffer on demand. The buffer > returned by the buffer object is then guaranteed to be as reliable as the > underlying object. (This may be a semantic issue with hash(), but > conceptually seems fine. Potential solution here - add Py_BUFFER_READONLY > as a buffer flag, then hash() semantics could do the right thing) You can't use the base object's hash if the buffer has an explicit size or offset. Neil From nas-python at python.ca Sat Nov 1 22:49:24 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 1 22:47:47 2003 Subject: [Python-Dev] Deprecate the buffer object? 
In-Reply-To: <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> References: <087001c39e73$70333e60$0500a8c0@eden> <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> Message-ID: <20031102034924.GB8137@mems-exchange.org> On Thu, Oct 30, 2003 at 03:30:18PM +1300, Greg Ewing wrote: > That's completely different from what I had in mind, which was: > > (1) Keep a reference to the base object in the buffer object, and > > (2) Use the buffer API to fetch a fresh pointer from the > base object each time it's needed. I've just uploaded a (rough) patch that implements your idea. http://www.python.org/sf/832058 Neil From greg at electricrain.com Sun Nov 2 01:20:50 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 01:20:56 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <3FA06DC5.70407@ocf.berkeley.edu> References: <338366A6D2E2CA4C9DAEAE652E12A1DED6B3F8@au3010avexu1.global.avaya.com> <3FA06DC5.70407@ocf.berkeley.edu> Message-ID: <20031102062050.GA5805@zot.electricrain.com> > >>How about re-engineering the interpreter to make it more MP > >>friendly? (This is probably a bigger task than a Masters thesis.) > >>The current interpreter serializes on the global interpreter lock > >>(GIL) and blocks everything. ... > I will still consider this, though. > > -Brett If you take this on there is no doubt you'll receive many-a-beer from people on this list! :) From greg at electricrain.com Sun Nov 2 04:25:17 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 04:25:22 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed Message-ID: <20031102092517.GB5805@zot.electricrain.com> I just committed the fixes necessary for test_bsddb.py to complete without deadlocking (yay!). 
It should remove all possibility of a bsddb deadlock in single threaded applications as well as allow for multiple iterator/generator objects to operate properly on a database at once and make the _DBWithCursor __iter__ implementation more efficient by not asking for the values from the db since it only returns the keys. I believe there are still race conditions that could lead to a deadlock in the bsddb interface due to a current lack of locking around its internal open|closed DBCursor management. I'm opening a SF bug to track that. A test case to prove the theory is needed. Let me know if you see any problems. I'm sorry about allowing the deadlock to be committed in the first place. I routinely run the large bsddb test suite when doing bsddb development but test_bsddb.py contained additional coverage for the recent iterator interface that is not present in the large test suite; now I run both. - Greg From greg at electricrain.com Sun Nov 2 05:00:06 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 05:00:12 2003 Subject: [Python-Dev] Re: test_bsddb blocks testing popitem - reason In-Reply-To: <200310281112.21162.aleaxit@yahoo.com> References: <200310251232.55044.aleaxit@yahoo.com> <200310271125.16879.aleaxit@yahoo.com> <20031027215648.GM3929@zot.electricrain.com> <200310281112.21162.aleaxit@yahoo.com> Message-ID: <20031102100006.GA17328@zot.electricrain.com> On Tue, Oct 28, 2003 at 11:12:21AM +0100, Alex Martelli wrote: > On Monday 27 October 2003 10:56 pm, Gregory P. Smith wrote: > > What about the behaviour of multiple iterators for the same dict being > > used at once (either interleaved or by multiple threads; it shouldn't > > matter)? I expect that works fine in python. > > If the dict is not being modified, or if the only modifications on it are > assigning different values for already-existing keys, multiple iterators > on the same unchanging dict do work fine in one or more threads. 
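A minimal sketch of the behaviour described above, assuming CPython's usual dict semantics: interleaved iterators over an unchanging dict are fine, while changing the set of keys during iteration is detected and raises RuntimeError rather than crashing.

```python
d = {"a": 1, "b": 2, "c": 3}

# Interleaved iterators over an unchanging dict work fine, even if
# values (not keys) are reassigned along the way.
it1, it2 = iter(d), iter(d)
seen = [next(it1), next(it2), next(it1)]
assert all(k in d for k in seen)

# Changing the *set of keys* while iterating is not supported;
# CPython raises RuntimeError instead of misbehaving silently.
it = iter(d)
next(it)
d["new"] = 4
changed_detected = False
try:
    next(it)
except RuntimeError:
    changed_detected = True
assert changed_detected
```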
> But note that iterators only "read" the dict, don't change it. If any > change to the set of keys in the dict happens, all bets are off. ... > > This is something the _DBWithCursor iteration interface does not currently > > support due to its use of a single DBCursor internally. > > > > _DBWithCursor is currently written such that the cursor is never closed > > once created. This leaves tons of potential for deadlock even in single > > threaded apps. Reworking _DBWithCursor into a _DBThatUsesCursorsSafely > > such that each iterator creates its own cursor in an internal pool > > and other non cursor methods that would write to the db destroy all > > cursors after saving their current() position so that the iterators can > > reopen+reposition them is a solution. > > Woof. I think I understand what you're saying. However, writing to a > dict (in the sense of changing the sets of keys) while one is iterating > on the dict is NOT supported in Python -- basically "undefined behavior" > (which does NOT include possibilities of crashes and deadlocks, though). > So, maybe, we could get away with something a bit less rich here? I just implemented and committed something about that rich. I believe I could simplify it: have __iter__() and iteritems() return if their cursor was closed out from underneath them instead of the current attempt to reopen a cursor, reposition themselves, and keep going [which could still have unpredictable results since a db modification could rearrange the keys in some types of databases]. > So, maybe I _should_ just fix popitem that way and see if all tests pass? > I dunno -- it feels a bit like fixing the symptoms and leaving some deep > underlying problems intact... My commit fixed the deadlock problem for the single threaded case and wrote a test case to prove it. I opened a SF bug to track fixing the deadlock possibilities in the multithreaded case (and a memory leak i believe i added). 
-g From aleaxit at yahoo.com Sun Nov 2 06:11:33 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 2 06:11:40 2003 Subject: [Python-Dev] Re: test_bsddb blocks testing popitem - reason In-Reply-To: <20031102100006.GA17328@zot.electricrain.com> References: <200310251232.55044.aleaxit@yahoo.com> <200310281112.21162.aleaxit@yahoo.com> <20031102100006.GA17328@zot.electricrain.com> Message-ID: <200311021211.33462.aleaxit@yahoo.com> On Sunday 02 November 2003 11:00 am, Gregory P. Smith wrote: ... > > So, maybe, we could get away with something a bit less rich here? > > I just implemented and committed something about that rich. Super! I've just updated, built, and re-run all tests (on 2.4), and they all go smoothly. > My commit fixed the deadlock problem for the single threaded case and > wrote a test case to prove it. I opened a SF bug to track fixing the > deadlock possibilities in the multithreaded case (and a memory leak i > believe i added). OK, I understand this fix isn't the be-all end-all, but still, it makes things much better than they were before. *THANKS*! 
Alex From skip at manatee.mojam.com Sun Nov 2 08:00:59 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 2 08:01:09 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311021300.hA2D0x8Y004595@manatee.mojam.com> Bug/Patch Summary ----------------- 548 open / 4296 total bugs (+48) 190 open / 2438 total patches (-1) New Bugs -------- python 2.3.2 make test segfault (2003-10-26) http://python.org/sf/830573 httplib.HTTPConnection._send_request header parsing bug (2003-10-27) http://python.org/sf/831271 Solaris term.h needs curses.h (2003-10-27) http://python.org/sf/831574 httplib hardcodes Accept-Encoding (2003-10-28) http://python.org/sf/831747 Docstring for pyclbr.readmodule() is incorrect (2003-10-28) http://python.org/sf/831969 C++ extensions using SWIG and MinGW (2003-10-28) http://python.org/sf/832159 Build fails in ossaudiodev.c with missing macros (2003-10-29) http://python.org/sf/832236 Wrong reference for specific minidom methods (2003-10-29) http://python.org/sf/832251 Bad Security Advice in CGI Documentation (2003-10-29) http://python.org/sf/832515 Inconsitent line numbering in traceback (2003-10-29) http://python.org/sf/832535 Please link modules with shared lib (2003-10-29) http://python.org/sf/832799 urllib.urlencode doesn't work for output from cgi.parse_qs (2003-10-30) http://python.org/sf/833405 Incorrect priority 'in' and '==' (2003-10-31) http://python.org/sf/833905 Ctrl+key combos stop working in IDLE (2003-10-31) http://python.org/sf/833957 Mouse wheel crashes program (2003-11-01) http://python.org/sf/834351 python and lithuanian locales (2003-11-02) http://python.org/sf/834452 simple bsddb interface potential for deadlock with threads (2003-11-02) http://python.org/sf/834461 New Patches ----------- deprecate or fix buffer object (2003-10-28) http://python.org/sf/832058 Implementation PEP 322: Reverse Iteration (2003-11-01) http://python.org/sf/834422 Closed Bugs ----------- test_signal hangs -- signal 
broken on OpenBSD? (2002-04-26) http://python.org/sf/549081 urllib2 and proxy (2003-01-02) http://python.org/sf/661042 os.popen with mode "rb" fails on Unix (2003-03-13) http://python.org/sf/703198 Test failures on Linux, Python 2.3b1 tarball (2003-04-26) http://python.org/sf/728051 Memory leak on open() only in 2.3? (2003-08-15) http://python.org/sf/789402 urllib.urlopen for https doesn't always provide readlines (2003-08-20) http://python.org/sf/792101 gc.get_referrers() is inherently dangerous (2003-08-23) http://python.org/sf/793822 dis.disassemble_string() broken (2003-09-23) http://python.org/sf/811294 int ("ffffffd3", 16) gives error (2003-09-24) http://python.org/sf/811898 Email.message example missing arg (2003-10-03) http://python.org/sf/817178 httplib.SSLFile lacks readlines() method (2003-10-07) http://python.org/sf/819510 Package Manager Scrolling Behavior (2003-10-15) http://python.org/sf/824430 dict.__init__ doesn't call subclass's __setitem__. (2003-10-16) http://python.org/sf/824854 wrong error message of islice indexing (2003-10-20) http://python.org/sf/827190 ctime is not creation time (2003-10-21) http://python.org/sf/827902 setattr(obj, BADNAME, value) does not raises exception (2003-10-24) http://python.org/sf/829458 python-mode.el: py-b-of-def-or-class looks inside strings (2003-10-25) http://python.org/sf/830347 Closed Patches -------------- Add isxxx() methods to string objects (2002-05-30) http://python.org/sf/562501 Enhanced file constructor (2002-09-11) http://python.org/sf/608182 Experimental Inno Setup Win32 installer (2002-10-24) http://python.org/sf/628301 terminal type option subnegotiation in telnetlib (2003-04-17) http://python.org/sf/723364 Allows os.forkpty to work on more platforms (Solaris!) 
(2003-05-04) http://python.org/sf/732401 fix problem in about dialog (2003-07-21) http://python.org/sf/775057 pydoc's usage should use basename (2003-08-08) http://python.org/sf/785689 termios module on IRIX (2003-08-11) http://python.org/sf/787189 ignore "b" and "t" mode modifiers in posix_popen (2003-08-13) http://python.org/sf/788404 POP3 over SSL support for poplib (2003-08-19) http://python.org/sf/791706 [_ssl.c] SSL_write() called with -1 as size (2003-09-10) http://python.org/sf/803998 socket.ssl should check certificates (2003-09-22) http://python.org/sf/810754 sprout more file operations in SSLFile, fixes 792101 (2003-10-04) http://python.org/sf/817854 let's get rid of cyclic object comparison (2003-10-17) http://python.org/sf/825639 Add list.copysort() (2003-10-17) http://python.org/sf/825814 itertoolsmodule.c: islice error messages (827190) (2003-10-25) http://python.org/sf/830070 python-mode.el: (py-point 'bod) doesn't quite work (2003-10-25) http://python.org/sf/830341 From martin at v.loewis.de Sun Nov 2 14:05:55 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sun Nov 2 14:06:11 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031101124628.GA26463@vicky.ecs.soton.ac.uk> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Message-ID: Armin Rigo writes: > On Fri, Oct 31, 2003 at 05:08:36PM +0000, Michael Hudson wrote: > > > be getting a 12-25% decrease in memory use for the base object, > > > though. > > > > More than that in the good cases. 
Something I forgot was that you'd > > probably have to knock variable length types on the head. > > Why? Assuming "to knock on the head" means "to put an end to": If you put all objects of the same type into a pool, you really want all objects to have the same size inside a pool. With that assumption, garbage objects can be reallocated without causing fragmentation. If objects in a pool have different sizes, it is not possible to have an efficient reallocation strategy. Of course, you could try to make a compacting garbage collector, but that would break the current programming model even more (as object references would stop being pointers). Regards, Martin From martin at v.loewis.de Sun Nov 2 14:10:33 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sun Nov 2 14:10:47 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary In-Reply-To: <200311021300.hA2D0x8Y004595@manatee.mojam.com> References: <200311021300.hA2D0x8Y004595@manatee.mojam.com> Message-ID: Skip Montanaro writes: > Bug/Patch Summary > ----------------- > > 548 open / 4296 total bugs (+48) > 190 open / 2438 total patches (-1) How do you compute the deltas? On Oct 26, in http://mail.python.org/pipermail/python-dev/2003-October/039559.html you write 547 open / 4276 total bugs (+42) 205 open / 2432 total patches (+7) Regards, Martin From aleaxit at yahoo.com Sun Nov 2 17:19:42 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 2 17:19:51 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes Message-ID: <200311022319.42725.aleaxit@yahoo.com> 1. Shouldn't class UserString.UserString inherit from basestring? After all, basestring exists specifically in order to encourage typetests of the form isinstance(x, basestring) -- wouldn't it be better if such tests could also catch "user-tweaked strings" derived from UserString ... ? 2. 
If we do want to encourage such typetest idioms, it might be a good idea to provide some other such abstract basetypes for the purpose. For example, I see quite a few cases of isinstance(x, (int,long,gmpy.mpz)) in my code -- and that, despite the fact that I'm not enamoured of typetesting as a general idea and that I'm quite aware that this kind of check could miss some other kind of user-coded "integeroid number". If there was an abstract basetype, say "baseinteger", from which int and long derived, I'd be happy to tweak gmpy to make mpz subclass it (in 2.4 and later versions of Python only, of course) and allow such typetests to happen more smoothly, faster and with more generality too. 3. And perhaps baseinteger (and float and complex) should all subclass yet another basetype, say "basenumber"? Why not? I admit that right now I have no use cases where I _do_ want to accept complex numbers as well as int, long, float, and gmpy thingies (so, maybe there should be a more specific "basereal" keeping complex out...?), but apart from this detail such an abstract basetype would be similarly useful (in practice I would use it since I do not expect complex in my apps anyway). 4. Furthermore, providing "basenumber" would let user-coded classes "flag" in a simple and direct way "I'm emulating numbers". This might well be useful _to Python itself_... Right now, I'm stuck for an answer to the bug that a user-coded class which exposes __mul__ but not __rmul__ happens to support its instances being multiplied by an integer on the right -- quite surprising to users! The problem is that this behavior is apparently expected, though not documented, when the user-coded class is trying to simulate a _sequence_ rather than a number. So, I can't just take the peculiar "accidental commutativity with integers only" away. IF a user class could flag itself as "numeroid" by inheriting basenumber, THEN the "accidental commutativity" COULD be easily removed at least for such classes. 5. 
in fact, now that we fill in type descriptor slots based on user-coded classes' special methods, I suspect this isn't the only such issue. While "flagging" (inheriting one of the abstract basetypes) would be entirely optional for user-coded classes, it would at least provide a way to _explicitly disambiguate_ what it is that the user-coded class IS trying to emulate, if the user wants to. 6. of course, for that to be any use, the various basetypes should not be "ambiguously" multiply inheritable from. Right now, it isn't so...: >>> class x(basestring, int): pass ... >>> isinstance(x(), int) True >>> isinstance(x(), basestring) True ...does anybody see any problem if, in 2.4, we take away the ability to multiply inherit from basestring AND also from another builtin type which does not in turn inherit from basestring...? I have the impression that right now this is working "sort of accidentally", rather than by design. 7. one might of course think of other perhaps-useful abstract basetypes, such as e.g. basesequence or basemapping -- right now the new forthcoming built-in 'reverse' is trying to avoid "accidentally working" on mappings by featuretesting for (e.g.) has_key, but if the user could optionally subclass either of these abstract basetypes (but not both at once, see [6]:-), that might ease reverse's task in some cases. Why, such abstract basetypes might even make operator.isMappingType useful again -- right now, of course: >>> operator.isMappingType([]) True and therefore there isn't much point in that function:-). But I think that points 1-6 may be enough to discuss for the moment (and I brace myself for the flames of the antitypetesters -- why, if I hadn't matured this idea myself I might well be one of the flamers:-) so I have no concrete proposals sub [7] -- yet. ...just a sec... Ok, ready -- fire away! 
Alex From python at rcn.com Sun Nov 2 17:52:26 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 2 17:53:28 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <200311022319.42725.aleaxit@yahoo.com> Message-ID: <002601c3a193$fcad8300$e841fea9@oemcomputer> > 1. Shouldn't class UserString.UserString inherit from basestring? The functionality of UserString has been subsumed by inheriting from str. So, its main purpose now is to keep old code working which means that it is probably not wise to suddenly convert it from a classic class to a new-style class. > 3. And perhaps baseinteger (and float and complex) should all subclass yet > another basetype, say "basenumber"? Why not? I admit that right now > I have no use cases where I _do_ want to accept complex numbers as > well as int, long, float, and gmpy thingies (so, maybe there should be > a > more specific "basereal" keeping complex out...?), but apart from this > detail such an abstract basetype would be similarly useful (in > practice > I would use it since I do not expect complex in my apps anyway). At one time, I also requested an abstract numeric inheritance hierarchy with real=union(int,float,long) and numbers=union(real,complex). However, much time has passed and the need has never risen again. > ...does anybody see any problem if, in 2.4, we take away the ability to > multiply inherit from basestring AND also from another builtin type which > does not in turn inherit from basestring. I would rather leave this open than introduce code to prevent it. My sense is that blocking it would introduce complexity in coding, documentation, understanding, and debugging while offering near zero payoff. > right now the new > forthcoming built-in 'reverse' is trying to avoid "accidentally > working" > on mappings by featuretesting for (e.g.) 
has_key, but if the user > could optionally subclass either of these abstract basetypes (but not > both at once, see [6]:-), that might ease reverse's task in some > cases. In the C code, the actual test is for PySequence_Check() which seems to do a good job of finding non-mapping objects implementing __getitem__. > Why, such abstract basetypes might even make operator.isMappingType > useful again -- right now, of course: > >>> operator.isMappingType([]) > True > and therefore there isn't much point in that function:-). In the meantime, I would like to remove that function from the operator module. It is broken. > > ...just a sec... > > > Ok, ready -- fire away! So, 1.5.2 wasn't good enough for you. Perhaps *this* change will be to your liking. Fry type checking dog, fry! Raymond From arigo at tunes.org Sun Nov 2 18:35:16 2003 From: arigo at tunes.org (Armin Rigo) Date: Sun Nov 2 18:39:24 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Message-ID: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Hello Martin, On Sun, Nov 02, 2003 at 08:05:55PM +0100, Martin v. L?wis wrote: > > > More than that in the good cases. Something I forgot was that you'd > > > probably have to knock variable length types on the head. > > > > Why? > > Assuming "to knock on the head" means "to put an end to": > > If you put all objects of the same type into a pool, you really want > all objects to have the same side, inside a pool. With that > assumption, garbage objects can be reallocated without causing > fragmentation. 
If objects in a pool have different sizes, it is not > possible to have an efficient reallocation strategy. "Not easy" would have been more appropriate. It is still basically what malloc() does. One way would be to use Python's current memory allocator, by adapting it to sort objects into pools not only according to size but also according to type. What seems to me like a good solution would be to use one relatively large "arena" per type and Python's memory allocator to subdivide each arena. If each arena starts at a pointer address which is properly aligned, then *(p&MASK) gives you the type of any object, and possibly even without much cache-miss overhead because there are not so many arenas in total (probably only 1-2 per type in common cases, and arenas can be large). A bientot, Armin. From martin at v.loewis.de Sun Nov 2 19:18:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Nov 2 19:19:02 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <3FA59EED.1020900@v.loewis.de> Armin Rigo wrote: >>If you put all objects of the same type into a pool, you really want >>all objects to have the same side, inside a pool. With that >>assumption, garbage objects can be reallocated without causing >>fragmentation. If objects in a pool have different sizes, it is not >>possible to have an efficient reallocation strategy. > > > "Not easy" would have been more appropriate. It is still basically what > malloc() does. That's why I said "efficient". 
What malloc basically does is not efficient. It gets worse if, at reallocation time, you are not only bound by size, but also by type. E.g. if you have deallocated a tuple of 10 elements, and then reallocate a tuple of 6, the wasted space can only hold a tuple of 1 element, nothing else. > One way would be to use Python's current memory allocator, by adapting it to > sort objects into pools not only according to size but also according to type. > What seems to me like a good solution would be to use one relatively large > "arena" per type and Python's memory allocator to subdivide each arena. If > each arena starts at a pointer address which is properly aligned, then > *(p&MASK) gives you the type of any object, and possibly even without much > cache-miss overhead because there are not so many arenas in total (probably > only 1-2 per type in common cases, and arenas can be large). So where do you put strings with 100,000 elements (characters)? Or any other object that exceeds an arena in size? Regards, Martin From bmr at austin.rr.com Sun Nov 2 21:27:14 2003 From: bmr at austin.rr.com (Brian Rzycki) Date: Sun Nov 2 21:27:18 2003 Subject: [Python-Dev] new language ideas Message-ID: <3C7C23AC-0DA5-11D8-8CA1-00039376D608@austin.rr.com> Hi all, I've been tinkering with a bit of a pet project on and off for some time now. I'm basically trying to adapt the python style/syntax for a language that is a bit better suited as a classical systems programming language. To be able to program with Python at the high level and something very pythonesque at the lower level is very appealing. :) Well, when thinking about this, I've come up with a few ideas I think might benefit Python as well. Please forgive me if these are repeats, I've never seen anything related to this in the PEPs or on the list. I'm just tossing these out for Python's benefit... /me dons asbestos long-johns... Multiline comments -------------------------- #BEGIN ... #END Everything in between is ignored. 
It would be very useful when debugging decent sized blocks of code. I know certain editors can auto-comment blocks, but it can be difficult to un-auto-comment said block. The same smart editors could colorize the block accordingly, minimizing readability issues. __doc__ variable ------------------------ docstrings are really a special case of programmer documentation. It'd be a lot nicer if there were some way to isolate certain portions of information in the docstring. Most contain a short description as well as the expected things (I won't say types on this list) ;). docstrings could be aliased through the dictionary __doc__. The exact semantics are a bit fuzzy right now, but I wanted to toss out the idea for public scrutiny. Here's an example: def f(x): "does nothing, really." return(x) In this case, __doc__.desc would equal the docstring. This would allow for backward compatibility and allow for extension. Think author, webpage, and version at the global scope and pre/post conditions, dynamically created information about a function/class. bit access of integers ---------------------------- Like strings, we can use [] to index into python integers. It'd be a nice way to set/read individual bits of a given integer. For example: x = 5 x[0] = 0 print x (prints 4) The details of how to index (I was assuming big-endian in this example) are open to discussion. This would make bit-banging in python be even easier than C (not to mention easier to read). This assumes we want Python to be good at bit-banging. ;) alternative base notation --------------------------------- Python inherited C's notation for numbers of non-decimal bases. I propose another with simpler syntax: number_base. An example: x = 24b_16 y = 1001_2 z = 96zz_36 The range for this notation would be 2 to 36 for the base. This allows for the entire alphabet plus numbers to be used as numerical placeholders. I'd be happy if _2, _8, _16 were the only ones implemented because those are the most commonly used. 
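The proposed NUMBER_BASE literals can be emulated today with int()'s explicit-base argument; a tiny helper (the name parse_based is made up for illustration, not a real or proposed API):

```python
def parse_based(literal):
    """Parse a NUMBER_BASE literal such as '24b_16' or '1001_2'.

    The notation is the proposal's, not real Python syntax; int()
    with an explicit base (2..36) does the actual work.
    """
    digits, base = literal.rsplit("_", 1)
    return int(digits, int(base))

print(parse_based("24b_16"))   # 587, i.e. 0x24b
print(parse_based("1001_2"))   # 9
print(parse_based("96zz_36"))  # a base-36 value
```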
It would be nice to treat it almost as if it were a call to a radix() function. I think the notation has a nice look to it and I think makes it easy to read. So that's it for now. Let me know what you think. -Brian Rzycki From tim.one at comcast.net Sun Nov 2 21:48:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Nov 2 21:48:47 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <3FA59EED.1020900@v.loewis.de> Message-ID: [Martin v. L?wis, on schemes to segregate object memory by type, with the type pointer shared at a calculated address] > ... > So where do you put strings with 100,000 elements (characters)? Or any > other object that exceeds an arena in size? You allocate enough extra memory so that there's room to stick a type pointer at the calculated address; or, IOW, it becomes a one-object pool but of unusually large size. Some bytes may be lost at the start of the allocated region to allow planting a type pointer at a pool-aligned address; but by assumption the object is "very large", so the wastage can be small in percentage terms. That said, the current pymalloc is relentless about speeding alloc/free of exactly-the-same-size small blocks -- there's not much code that could be reused in a type-segregated scheme (the debug pymalloc wrapper is a different story -- it can wrap any malloc/free). From DavidA at ActiveState.com Sun Nov 2 22:36:46 2003 From: DavidA at ActiveState.com (David Ascher) Date: Sun Nov 2 22:30:46 2003 Subject: [Python-Dev] OT: programming language creator or serial killer? Message-ID: <3FA5CD4E.3020805@ActiveState.com> A fun online quiz IMO (flash): http://www.malevole.com/mv/misc/killerquiz/ I got 3/10 =) From python at rcn.com Sun Nov 2 22:50:58 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 2 22:52:01 2003 Subject: [Python-Dev] OT: programming language creator or serial killer? 
In-Reply-To: <3FA5CD4E.3020805@ActiveState.com>
Message-ID: <000301c3a1bd$b14e5ea0$e841fea9@oemcomputer>

[David Ascher]
> A fun online quiz IMO (flash):
>
> http://www.malevole.com/mv/misc/killerquiz/
>
> I got 3/10 =)

That was a great link. I got 8/10.

Raymond

From pje at telecommunity.com Sun Nov 2 22:54:32 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Nov 2 22:53:33 2003
Subject: [Python-Dev] new language ideas
In-Reply-To: <3C7C23AC-0DA5-11D8-8CA1-00039376D608@austin.rr.com>
Message-ID: <5.1.0.14.0.20031102224600.021c2ec0@mail.telecommunity.com>

At 08:27 PM 11/2/03 -0600, Brian Rzycki wrote:
>Multiline comments
>--------------------------
>#BEGIN
>..
>#END
>
>Everything in between is ignored. It would be very useful when debugging
>decent sized blocks of code. I know certain editors can auto-comment
>blocks, but it can be difficult to un-auto-comment said block. The same
>smart editors could colorize the block accordingly, minimizing readability
>issues.

Just triple quote. I usually use """ for actual strings in my programs,
and if I need to comment out a block I use '''.

>bit access of integers
>----------------------------
>Like strings, we can use [] to index into python integers. It'd be a nice
>way to set/read individual bits of a given integer. For example:
>
>x = 5
>x[0] = 0
>print x
>(prints 4)
>
>The details of how to index (I was assuming big-endian in this example)
>are open to discussion. This would make bit-banging in python be even
>easier than C (not to mention easier to read). This assumes we want
>Python to be good at bit-banging. ;)

Integers are immutable. What you want is a bit array; you could write one
of your own in Python easily enough, or C if you need higher performance.
Or maybe you could supply a patch for the Python 'array' module to support
a bit type.

>alternative base notation
>---------------------------------
>Python inherited C's notation for numbers of non-decimal bases.
I propose >another with simpler syntax: number_base. An example: > >x = 24b_16 >y = 1001_2 >z = 96zz_36 > >The range for this notation would be 2 to 36 for the base. This allows >for the entire alphabet plus numbers to be used as numerical >placeholders. I'd be happy if _2, _8, _16 were the only ones implemented >because those are the most commonly used. Python already implements 8 and 16, using 0 and 0x prefixes. Presumably, you're therefore requesting an 0b or some such. Note that you can already do this like so: >>> print int("100100",2) 36 However, if I were using bit strings a lot, I'd probably convert them to integers or longs in hex form, just to keep the program more compact. From fincher.8 at osu.edu Mon Nov 3 00:34:13 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sun Nov 2 23:35:57 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mad7h72sr.fsf@starship.python.net> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> Message-ID: <200311030034.13193.fincher.8@osu.edu> On Friday 31 October 2003 12:08 pm, Michael Hudson wrote: > More than that in the good cases. Something I forgot was that you'd > probably have to knock variable length types on the head. That's something I've always wondered about -- what exactly is a "variable length type" and why are they special? From what I gather, they're types (long, str, and tuple are the main ones I know of) whose struct is actually of variable size -- rather than contain a pointer to a variable-size thing, they contain the variable-size thing themselves. What do we gain from them? (if there's some documentation I overlooked, feel free to point me to it.) 
Thanks, Jeremy From martin at v.loewis.de Sun Nov 2 23:56:27 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Nov 2 23:56:44 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <200311030034.13193.fincher.8@osu.edu> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <200311030034.13193.fincher.8@osu.edu> Message-ID: <3FA5DFFB.8030004@v.loewis.de> Jeremy Fincher wrote: > That's something I've always wondered about -- what exactly is a > "variable length type" and why are they special? From what I gather, > they're types (long, str, and tuple are the main ones I know of) whose > struct is actually of variable size -- rather than contain a pointer > to a variable-size thing, they contain the variable-size thing > themselves. Correct. Examples include strings and tuples, but not lists and dictionaries. > What do we gain from them? Speed, by saving an extra allocation upon creation; also some speed by saving an indirection upon access. It only works if the number of items in the object is not going to change over the lifetime of the object - in particular, for immutable objects. There is actually an exception to this rule: If you own the only reference to the object, you can afford to change its size (available for strings only). Regards, Martin From aleaxit at yahoo.com Mon Nov 3 02:54:52 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 02:54:59 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <002601c3a193$fcad8300$e841fea9@oemcomputer> References: <002601c3a193$fcad8300$e841fea9@oemcomputer> Message-ID: <200311030854.52823.aleaxit@yahoo.com> On Sunday 02 November 2003 11:52 pm, Raymond Hettinger wrote: > > 1. Shouldn't class UserString.UserString inherit from basestring? 
> > The functionality of UserString has been subsumed by inheriting from > str. So, its main purpose now is to keep old code working which means > that it is probably not wise to suddenly convert it from a classic class > to a new-style class. OK, I guess. The implementation doesn't offer all that much extra convenience when compared to inheriting str, anyway -- no "factoring out" a la DictMixin, for example. Presumably there's little demand. > At one time, I also requested an abstract numeric inheritance hierarchy > with real=union(int,float,long) and numbers=union(real,complex). > However, much time has passed and the need has never risen again. I guess I just play too much with numbers...;-). > > multiply inherit from basestring AND also from another builtin type > which > > does not in turn inherit from basestring. > > I would rather leave this open than introduce code to prevent it. My > sense is that blocking it would introduce complexity in coding, > documentation, understanding, and debugging while offering near zero > payoff. The payoff would be just in avoiding confusion. I don't see what complexity there could be in making each base* abstracttype incompatible with the others -- guess I'm missing something...? > In the C code, the actual test is for PySequence_Check() which seems to > do a good job of finding non-mapping objects implementing __getitem__. Unless I'm mistaken, that's exactly operator.isSequenceType(), and: >>> import operator, UserDict >>> operator.isSequenceType(UserDict.UserDict()) True ...wouldn't it be NICE to let the user help code needing to disambiguate sequences from mappings by inheriting basesequence or basemapping...? > operator.isMappingType ... > In the meantime, I would like to remove that function from the operator > module. It is broken. Yes, but isn't isSequenceType pretty iffy too...? 
Alex

From python at rcn.com Mon Nov 3 03:16:06 2003
From: python at rcn.com (Raymond Hettinger)
Date: Mon Nov 3 03:18:00 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
In-Reply-To: <200311030854.52823.aleaxit@yahoo.com>
Message-ID: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>

[Alex]
> > > multiply inherit from basestring AND also from another builtin type
> > which
> > > does not in turn inherit from basestring.

[Raymond]
> > I would rather leave this open than introduce code to prevent it. My
> > sense is that blocking it would introduce complexity in coding,
> > documentation, understanding, and debugging while offering near zero
> > payoff.

[Alex]
> The payoff would be just in avoiding confusion. I don't see what
> complexity there could be in making each base* abstracttype
> incompatible with the others -- guess I'm missing something...?

More rules to remember: Thing X doesn't work with thing Y, but W, which is
like X, never got taken care of.

More docs to read and write: You would document that the combination is
illegal and explain why, right?

More code to implement the check for prohibited combinations.

Payoff: only when someone multiply inherits from an abstract builtin type
and another builtin type. Does anyone other than you, me, Armin, and Tim
even use multiple inheritance? This basically never comes up unless we're
spending an evening creating toy problems just to push the features to the
limits.

Put another way: Is this a real-world problem for anyone outside python
blackbelts who already know better? Answer: Probably not.

[Raymond]
> > operator.isMappingType
> ...
> > In the meantime, I would like to remove that function from the operator
> > module. It is broken.

[Alex]
> Yes, but isn't isSequenceType pretty iffy too...?

Nope.
>>> import operator
>>> map(operator.isSequenceType, [(), [], 'ab', u'ab', {}, 1])
[True, True, True, True, False, False]
>>> map(operator.isMappingType, [(), [], 'ab', u'ab', {}, 1])
[True, True, True, True, True, False]

The first is 100% correct. The second has four false positives.

For user-defined classes implementing __getitem__, neither function can
distinguish between a mapping and a sequence. This is the best they can do.

Raymond Hettinger

From aleaxit at yahoo.com Mon Nov 3 03:40:02 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 03:40:12 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
In-Reply-To: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>
References: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>
Message-ID: <200311030940.02592.aleaxit@yahoo.com>

On Monday 03 November 2003 09:16 am, Raymond Hettinger wrote:
...
> type and another builtin type. Does anyone other than you, me, Armin,
> and Tim even use multiple inheritance? This basically never comes up

I think you're not very acquainted with people coming from C++ or Eiffel...

> > Yes, but isn't isSequenceType pretty iffy too...?
>
> Nope.
>
> >>> import operator
> >>> map(operator.isSequenceType, [(), [], 'ab', u'ab', {}, 1])
>
> [True, True, True, True, False, False]
>
> >>> map(operator.isMappingType, [(), [], 'ab', u'ab', {}, 1])
>
> [True, True, True, True, True, False]
>
> The first is 100% correct.
> The second has four false positives.

Right: isSequenceType works on built-ins, isMappingType doesn't.

> For user-defined classes implementing __getitem__, neither function can
> distinguish between a mapping and a sequence. This is the best they can
> do.

OK -- so, if we had basesequence and basemapping, the user COULD help make
the distinction totally reliable (if multiply inheriting from both was
allowed, the user could also make a totally unusable muddle of course:-).
Alex

From anthony at interlink.com.au Mon Nov 3 03:48:18 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Nov 3 03:52:06 2003
Subject: [Python-Dev] bsddb test case deadlocks fixed
In-Reply-To: <20031102092517.GB5805@zot.electricrain.com>
Message-ID: <200311030848.hA38mItM008890@localhost.localdomain>

From what I understand, these fixes aren't just fixes to the test suite,
but also fixes for real problems with the bsddb code itself. In that case,
should they be added to the 2.3 branch? I'd be a solid +1 on this for 2.3.3.

Anyone else?

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.

From aleaxit at yahoo.com Mon Nov 3 03:54:24 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 03:54:30 2003
Subject: [Python-Dev] bsddb test case deadlocks fixed
In-Reply-To: <200311030848.hA38mItM008890@localhost.localdomain>
References: <200311030848.hA38mItM008890@localhost.localdomain>
Message-ID: <200311030954.24191.aleaxit@yahoo.com>

On Monday 03 November 2003 09:48 am, Anthony Baxter wrote:
> From what I understand, these fixes aren't just fixes to the test suite,
> but also fixes for real problems with the bsddb code itself. In that case,
> should they be added to the 2.3 branch? I'd be a solid +1 on this for 2.3.3.
>
> Anyone else?

Anything that makes bsddb less flaky on 2.3.* gets a big hearty
enthusiastic +1 from me too.
Alex From mwh at python.net Mon Nov 3 06:35:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 06:35:08 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Sun, 2 Nov 2003 23:35:16 +0000") References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <2mu15l65xy.fsf@starship.python.net> Armin Rigo writes: > Hello Martin, > > On Sun, Nov 02, 2003 at 08:05:55PM +0100, Martin v. L?wis wrote: >> > > More than that in the good cases. Something I forgot was that you'd >> > > probably have to knock variable length types on the head. >> > >> > Why? >> >> Assuming "to knock on the head" means "to put an end to": >> >> If you put all objects of the same type into a pool, you really want >> all objects to have the same side, inside a pool. With that >> assumption, garbage objects can be reallocated without causing >> fragmentation. If objects in a pool have different sizes, it is not >> possible to have an efficient reallocation strategy. > > "Not easy" would have been more appropriate. It is still basically what > malloc() does. Well, yeah, but as Tim said pymalloc gets its wins from assuming that each allocation is the same size. You could combine my idea with some other allocation scheme, certainly, but given the relative paucity of variable length types and the reduction in allocator overhead using something like pymalloc gives us, I think it might just be easier to not do them any more. 
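The address arithmetic behind *(p&MASK) is tiny, and easy to sketch
numerically (the pool size and addresses below are made up for
illustration):

```python
POOL_SIZE = 4096             # illustrative power-of-two pool size
MASK = ~(POOL_SIZE - 1)      # clears the offset-within-pool bits

def pool_base(addr):
    # Every object in a pool shares the type pointer stored in the
    # pool header, which sits at the pool's base address.
    return addr & MASK

# two hypothetical objects allocated from the same (aligned) pool
a = 0x7f3a2000 + 64
b = 0x7f3a2000 + 512
assert pool_base(a) == pool_base(b) == 0x7f3a2000
```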
Of course, I don't see myself having any time to play with this idea any time soon, and it's probably not really beefy enough to get a masters thesis from, so maybe we'll never know. > One way would be to use Python's current memory allocator, by > adapting it to sort objects into pools not only according to size > but also according to type. That's pretty much what I was suggesting. > What seems to me like a good solution would be to use one relatively > large "arena" per type and Python's memory allocator to subdivide > each arena. If each arena starts at a pointer address which is > properly aligned, then *(p&MASK) gives you the type of any object, > and possibly even without much cache-miss overhead because there are > not so many arenas in total (probably only 1-2 per type in common > cases, and arenas can be large). Hmm, maybe. I'm not going to make guesses about that one :-) Cheers, mwh -- ... Windows proponents tell you that it will solve things that your Unix system people keep telling you are hard. The Unix people are right: they are hard, and Windows does not solve them, ... -- Tim Bradshaw, comp.lang.lisp From mwh at python.net Mon Nov 3 07:14:31 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 07:14:35 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes In-Reply-To: <200311022319.42725.aleaxit@yahoo.com> (Alex Martelli's message of "Sun, 2 Nov 2003 23:19:42 +0100") References: <200311022319.42725.aleaxit@yahoo.com> Message-ID: <2m8ymx6448.fsf@starship.python.net> Alex Martelli writes: > 1. Shouldn't class UserString.UserString inherit from basestring? After all, > basestring exists specifically in order to encourage typetests of the form > isinstance(x, basestring) -- wouldn't it be better if such tests could > also catch "user-tweaked strings" derived from UserString ... ? > > 2. 
If we do want to encourage such typetest idioms, it might be a good idea
> to provide some other such abstract basetypes for the purpose.

I'd really rather not. I think this is a slippery slope I want to stay
right at the top of. Doing different things depending on which protocol a
function argument happens to implement is icky, even if it's sometimes
extremely convenient. I don't think we should make it easier.

> 4. Furthermore, providing "basenumber" would let user-coded classes "flag"
> in a simple and direct way "I'm emulating numbers". This might well be
> useful _to Python itself_...
> Right now, I'm stuck for an answer to the bug that a user-coded class
> which exposes __mul__ but not __rmul__ happens to support its instances
> being multiplied by an integer on the right -- quite surprising to users!
> The problem is that this behavior is apparently expected, though not
> documented, when the user-coded class is trying to simulate a _sequence_
> rather than a number. So, I can't just take the peculiar "accidental
> commutativity with integers only" away.
> IF a user class could flag itself as "numeroid" by inheriting basenumber,
> THEN the "accidental commutativity" COULD be easily removed at least
> for such classes.

This is just a bug, albeit a subtle and hard-to-fix one. And, as a paid-up
member of the anti-operator-overloading-bigot camp, I'll just say: a) if
your user-coded class is so unlike a number as to not be multipliable by
an int, why are you overloading '*'? and b) if Python had different
operators for sequence repetition and multiplying numbers, the relevant
bug would be much easier to fix...

> 7. one might of course think of other perhaps-useful abstract basetypes,
> such as e.g. basesequence or basemapping -- right now the new
> forthcoming built-in 'reverse' is trying to avoid "accidentally working"
> on mappings by featuretesting for (e.g.)
has_key, but if the user
> could optionally subclass either of these abstract basetypes (but not
> both at once, see [6]:-), that might ease reverse's task in some cases.

Well, I (unsurprisingly, given the above) think this problem again comes
from using the same notation for two different things (mappings and
sequences). Or looking at it another way, it comes from ancient misdesigns
in the C API that it's now essentially impossible to fix (that sq_item is
an intargfunc, roughly). I don't think we should try to cover up these
misfeatures with another.

Cheers,
mwh

--
The meaning of "brunch" is as yet undefined. -- Simon Booth, ucam.chat

From aleaxit at yahoo.com Mon Nov 3 07:47:10 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 07:47:22 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
Message-ID: <200311031347.10995.aleaxit@yahoo.com>

I made a few bugfix check-ins to the 2.3 maintenance branch this weekend
and Michael Hudson commented that he thinks that so doing is a bad idea,
that bug fixes should filter from the 2.4 trunk to the 2.3 branch and not
the other way around. Is this indeed the policy (have I missed some
guidelines about it)?

I guess for this round of fixes I will find the time to forward-port them
to the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 is going to
come well before 2.4 releases, the other way 'round wouldn't be quite so
sure:-), but what about the future? Should fixes applicable to both 2.3.*
and 2.4 be made [a] always to both trunk and branch, [b] always to the
trunk but to the branch only once one comes around to that, [c] always to
the branch but to the trunk only once one comes around to that, ...?

Oh, incidentally, if it matters -- most were docs issues, including as
"docs" also some changes to comments that previously were misleading or
ambiguous.
I guess that my problem is that I think of 2.3.* fixes as things that will
be useful to "the general Python-using public" pretty soon, with 2.4 far
off in the future, so that it appears to me that trying to make 2.3.* as
well fixed as possible has higher priority. But if that conflicts with
policy, I will of course change anyway.

Thanks,

Alex

From mcherm at mcherm.com Mon Nov 3 08:35:57 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 08:36:07 2003
Subject: [Python-Dev] new language ideas
Message-ID: <1067866557.3fa659bd77a4d@mcherm.com>

Brian Rzycki writes:
> Multiline comments

Already got it. Triple quoting.

> __doc__ variable

Just making __doc__ a dictionary instead of a string doesn't achieve
anything *unless* there is a fairly standard set of expected keys in this
dictionary. (This is documentation, so the list of standard keys doesn't
have to be universal, but without a common set of keys you can expect to
encounter, the only thing you can really do is to print out the entire
contents of the dictionary, and if all you can do is print it you might as
well just be using a string.) You write:

> Think author,
> webpage, and version at the global scope and pre/post conditions,
> dynamically created information about a function/class.

which is an interesting-sounding list, but if I saw a PEP which proposed
making __doc__ a dictionary which *didn't* specify just what the "common"
keys would be and what they would contain, then I'd be -1 on it. And if it
*did* specify, I imagine there would be far more controversy than you
expect.

> bit access of integers
> ----------------------------
> Like strings, we can use [] to index into python integers.

Hmm... very interesting, actually. But on reflection, I think we're better
off leaving integers as *numbers* and having a *separate* type for
bitmasks. This separate type could even be written in Python (I doubt the
speed of a C implementation would be worthwhile...
the real advantage of the type would be ease of use, not performance).
Clearly it would have a convert-to-integer feature (perhaps one which
would let you specify whether you wanted signed or unsigned, and what
width, and what endian-ness, etc.).

> alternative base notation
> ---------------------------------
> Python inherited C's notation for numbers of non-decimal bases. I
> propose another with simpler syntax: number_base.

Definite -1 from me. Several reasons. Here's a number in hex: b4a0_16

Oh wait... sorry, that's not a number, that's an identifier.

Another reason is that it's just not something that is done all that
frequently. Another reason is that we already have TWO syntaxes for doing
numbers in different bases: There's the 0x prefix for hex and the 0 prefix
for octal (but if I had my way we'd dump that... who uses octal?). And
there's the int('number', base) syntax, which has just a few more
characters than your solution and is IMHO more readable.

Even if I'm shooting most of these down, don't give up... you're certainly
injecting a little creative thought into the process. Sometimes that stirs
up really exciting ideas.

-- Michael Chermside

From mcherm at mcherm.com Mon Nov 3 08:55:12 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 08:55:25 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
Message-ID: <1067867712.3fa65e4084e79@mcherm.com>

Alex muses on basestring:
> 2. If we do want to encourage such typetest idioms, it might be a good idea
> to provide some other such abstract basetypes for the purpose.
[...]
> If there was an abstract basetype, say "baseinteger", from which int and
> long derived,

Great idea... I think there should be a single type from which all
built-in integer-like types inherit, and which user-designed types can
inherit if they want to behave like integers. I think that type should be
called "int".
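That opt-in already works today by subclassing int directly; a quick
sketch (the Celsius class is made up purely for illustration):

```python
class Celsius(int):
    # An int subclass: isinstance(x, int) checks accept it, and
    # arithmetic falls back to plain int behaviour.
    def __str__(self):
        return "%d degC" % int(self)

t = Celsius(21)
assert isinstance(t, int)
assert t + 4 == 25
print(t)   # prints: 21 degC
```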
Once the int/long distinction is completely gone, this will be quite
clean; the only confusion now is that the int/long distinction isn't yet
completely hidden.

> 4. Furthermore, providing "basenumber" would let user-coded classes "flag"
> in a simple and direct way "I'm emulating numbers".

Okay, that sounds like it might be useful, at least to those people who
work with weird varieties of numbers. But I can't think how. Normally, I
figure that if you overload addition, multiplication, subtraction, and
perhaps a few other such operators, then you're trying to emulate numbers
(that or you're abusing operator overloading, and I have no real sympathy
for you). What use cases do you have for "basenumber" (I don't mean
examples of classes that would inherit from basenumber, I mean examples
where that inheritance would make a difference)?

> IF a user class could flag itself as "numeroid" by inheriting basenumber,
> THEN the "accidental commutativity" COULD be easily removed at least
> for such classes.

Okay, that's one use case. Any others? 'cause I'm coming up blank.

> ...does anybody see any problem if, in 2.4, we take away the ability to
> multiply inherit from basestring AND also from another builtin type which
> does not in turn inherit from basestring...?

I do! I personally wouldn't try to create the class "perlnum" which
inherits from basestring and also basenumber and which tries to magically
know which is desired and convert back and forth on demand. But I'm sure
*someone* out there is just dying to write such a class. Why prevent them?
Not that I'd ever USE such a monstrosity, but I just don't see the
ADVANTAGE in providing the programmer with a straitjacket by typechecking
them (at the language level) to prevent uses outside of those envisioned
by the language implementers. It sounds decidedly non-pythonic to me.
-- Michael Chermside From mwh at python.net Mon Nov 3 08:58:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 08:58:10 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> (Alex Martelli's message of "Mon, 3 Nov 2003 13:47:10 +0100") References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <2mznfd4kr6.fsf@starship.python.net> Alex Martelli writes: > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is > a bad idea, that bug fixes should filter from the 2.4 trunk to the > 2.3 branch and not the other way around. Is this indeed the policy > (have I missed some guidelines about it)? Well, it's more practice than policy. I guess the (my...) thinking was that the trunk gets more testing, so it's a proving ground for fixes. It also depends on who's going to be release monkey for the next point release. The branch is to a certain extent "theirs" and they should get to decide how things work. I'm not sure who's got the hat at the moment (Anthony?). > I guess for this round of fixes I will find the time to forward-port > them to the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 > is going to come well before 2.4 releases, the other way 'round > wouldn't be quite so sure:-), but what about the future? Should > fixes applicable to both 2.3.* and 2.4 be made [a] always to both > trunk and branch, [b] always to the trunk but to the branch only > once one comes around to that, [c] always to the branch but to the > trunk only once one comes around to that, ...? My order of preference were I to be 2.3.3 monkey would be [a], then [b]. > Oh, incidentally, if it matters -- most were docs issues, including > as "docs" also some changes to comments that previously were > misleading or ambiguous. 
> > I guess that my problem is that I think of 2.3.* fixes as things > that will be useful to "the general Python-using public" pretty > soon, with 2.4 far off in the future, so that it appears to me that > trying to make 2.3.* as well fixed as possible has higher priority. > But if that conflicts with policy, I will of course change anyway. Maybe a decision could be made now and the conclusions written down somewhere? My habits are to do all work in the trunk checkout and then backport, but I could adapt if the decision went the other way. Sometimes it's not clear whether a fix is applicable to the branch, for one thing. Cheers, mwh -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery From aahz at pythoncraft.com Mon Nov 3 09:01:23 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 09:01:26 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <20031103140123.GA14146@panix.com> On Mon, Nov 03, 2003, Alex Martelli wrote: > > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is a > bad idea, that bug fixes should filter from the 2.4 trunk to the 2.3 > branch and not the other way around. Is this indeed the policy (have > I missed some guidelines about it)? PEP 6: As individual patches get contributed to the feature release fork, each patch contributor is requested to consider whether the patch is a bug fix suitable for inclusion in a patch release. If the patch is considered suitable, the patch contributor will mail the SourceForge patch (bug fix?) number to the maintainers' mailing list. 
That seems clear enough to me, though it could probably stand some updating for using appropriate vocabulary and matching current practice. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From cpr at emsoftware.com Mon Nov 3 09:44:27 2003 From: cpr at emsoftware.com (Chris Ryland) Date: Mon Nov 3 09:45:05 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python Message-ID: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> Michael Hudson wrote: > Remove the ob_type field from all PyObjects. Make pymalloc mandatory, > make it use type specific pools and store a pointer to the type object > at the start of each pool. > > So instead of > p->ob_type > it's > *(p&MASK) > > I think having each type in its own pools would also let you lose the > gc_next & gc_prev fields. > > Combined with a non-refcount GC, you could hammer sizeof(PyIntObject) > down to sizeof(long)! Yes, this is a variant of an implementation technique used in early Lisp and Lisp-like language systems with types (e.g., Harvard's EL-1) back in the early 70's (at least--that's when I first encountered it). In those systems, you'd use the "page #" (higher-order bits) of a pointer to reference a type table. Good idea, but perhaps less effective these days where memory isn't quite so dear. (Back then, a large system was a PDP-10 with 256K 36-bit words, or around 1MB.) Cheers! --Chris Ryland / Em Software, Inc. 
/ www.emsoftware.com From mwh at python.net Mon Nov 3 09:52:28 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 09:52:33 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> (Chris Ryland's message of "Mon, 3 Nov 2003 09:44:27 -0500") References: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> Message-ID: <2mvfq14i8j.fsf@starship.python.net> Chris Ryland writes: > Michael Hudson wrote: >> Remove the ob_type field from all PyObjects. Make pymalloc mandatory, >> make it use type specific pools and store a pointer to the type object >> at the start of each pool. >> >> So instead of >> p->ob_type >> it's >> *(p&MASK) >> >> I think having each type in its own pools would also let you lose the >> gc_next & gc_prev fields. >> >> Combined with a non-refcount GC, you could hammer sizeof(PyIntObject) >> down to sizeof(long)! > > Yes, this is a variant of an implementation technique used in early > Lisp and Lisp-like language systems with types (e.g., Harvard's EL-1) > back in the early 70's (at least--that's when I first encountered > it). In those systems, you'd use the "page #" (higher-order bits) of a > pointer to reference a type table. Heh, that's interesting to know. Nothing new under the sun & all that. > Good idea, but perhaps less effective these days where memory isn't > quite so dear. (Back then, a large system was a PDP-10 with 256K > 36-bit words, or around 1MB.) Cache memory is still expensive: if we can get more PyObjects into each cache line, we still win (at least, that's what I was thinking). Also, for say small tuples, the overhead of gc fields, refcount and type pointer is really frightening. Yes, memory is cheap, but using 3 or so times as much as we need to is still excessive. Cheers, mwh -- If trees could scream, would we be so cavalier about cutting them down? We might, if they screamed all the time, for no good reason. 
-- Jack Handey

From arigo at tunes.org Mon Nov 3 09:58:34 2003
From: arigo at tunes.org (Armin Rigo)
Date: Mon Nov 3 10:02:29 2003
Subject: [Python-Dev] Looking for master thesis ideas involving Python
In-Reply-To: <3FA59EED.1020900@v.loewis.de>
References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA59EED.1020900@v.loewis.de>
Message-ID: <20031103145834.GA22719@vicky.ecs.soton.ac.uk>

Hello Martin,

On Mon, Nov 03, 2003 at 01:18:53AM +0100, "Martin v. Löwis" wrote:
> >"Not easy" would have been more appropriate. It is still basically what
> >malloc() does.
>
> That's why I said "efficient". What malloc basically does is not
> efficient. It gets worse if, at reallocation time, you are not only
> bound by size, but also by type. E.g. if you have deallocated a tuple of
> 10 elements, and then reallocate a tuple of 6, the wasted space can only
> hold a tuple of 1 element, nothing else.

That's why we have a custom allocator in Python, to minimize this kind of
impact by subdividing arenas into pools of objects grouped by size. I
admit that adding the type constraint adds burden to the allocator,
though.

> So where do you put strings with 100,000 elements (characters)? Or any
> other object that exceeds an arena in size?

These ones are not a problem, because objects and arenas can be larger
than the MASK. You get to the start of the arena by masking bits away
from the address of the *beginning* of the object. An arena can be of any
size as long as all the objects it contains start in the first MASK
bytes. For a very large object, the arena would contain only this object,
which would then start at the beginning of the arena.
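Concretely, the lookup could be modelled in Python along these lines (a toy sketch with invented sizes and names; the real thing would of course live in C inside pymalloc):

```python
ARENA_SIZE = 1 << 16          # invented arena size/alignment
ARENA_MASK = ARENA_SIZE - 1

arenas = {}                   # arena base address -> type of its objects

def new_arena(base, typename):
    # Arenas must start on an ARENA_SIZE boundary for masking to work.
    assert base & ARENA_MASK == 0
    arenas[base] = typename
    return base

def alloc(base, offset):
    # An object may be any size, but it must *start* within the first
    # ARENA_SIZE bytes of its arena.
    assert 0 < offset < ARENA_SIZE
    return base + offset      # the object's "address"

def type_of(addr):
    # The replacement for p->ob_type: mask the low bits off the
    # object's address and read the type stored at the arena's start.
    return arenas[addr & ~ARENA_MASK]
```

In this model a 100,000-character string is indeed no problem: it simply gets an arena of its own, with the object starting at the arena's base.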
I'm more concerned about medium-sized objects, e.g. the ones whose size
is 51% of MASK. At the moment I don't see a good solution for these.

A bientot,

Armin.

From anthony at interlink.com.au Mon Nov 3 10:01:56 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Nov 3 10:05:40 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: <2mznfd4kr6.fsf@starship.python.net>
Message-ID: <200311031501.hA3F1uH0016389@localhost.localdomain>

>>> Michael Hudson wrote
> Well, it's more practice than policy. I guess the (my...) thinking
> was that the trunk gets more testing, so it's a proving ground for
> fixes.
>
> It also depends on who's going to be release monkey for the next point
> release. The branch is to a certain extent "theirs" and they should
> get to decide how things work. I'm not sure who's got the hat at the
> moment (Anthony?).

Unless someone desperately wants it, I'm happy to keep on doing it. What
I'd prefer:

- Apply to trunk first (assuming, of course, that the patch isn't
  something that's only needed on the branch - at this point in time, I
  can't see that happening, as release23-maint and the trunk haven't
  diverged far enough yet)
- Mark (in checkin message) if the patch is a bugfix candidate
- If you're comfortable that the patch is a non-controversial bugfix,
  then commit it to the branch as well, AFTER you have run the unittests
  on the branch to make sure it still works

What makes for a controversial vs non-controversial patch? There's a
couple of things I think are important to bear in mind:

- Functionality changes are controversial. Unless there's been a
  discussion and agreement (or BDFL fiat <wink>) on python-dev, it
  shouldn't go in.
- Major changes just near a release are going to be controversial, as it
  makes the life of the release-monkey-of-the-moment more painful.
At the end of the day, if you're not sure your patch should go to the branch, then mark it so in the checkin message, and someone (me, mwh, someone else willing to look into it) can make a judgment call. On the other hand, no-one's going to jump up and down screaming if you do check something in that probably shouldn't have gone in - we can always just revert it if necessary. I reserve the right to jump up and down if someone checks something in when I'm in the middle of a release and the branch is frozen, though . Also, if you're checking something into the branch, please try and make it obvious that the change is a backport or whatever. Something like Backport of is good. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From jeremy at alum.mit.edu Mon Nov 3 10:36:33 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon Nov 3 10:39:32 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <1067873793.19568.27.camel@localhost.localdomain> On Mon, 2003-11-03 at 07:47, Alex Martelli wrote: > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and > Michael Hudson commented that he thinks that so doing is a bad idea, that bug > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way > around. Is this indeed the policy (have I missed some guidelines about it)? It is customary to fix things on the trunk first, then backport to branches where it is needed. People who maintain branches often watch the trunk to look for things that need to be backported. As far as I know, no one watches the branches to look for things to port to the trunk. It may get lost if it's only on a branch. The best thing to do is your option [a]: Fix it in both places at once. Then there's nothing to be forgotten when time for a release rolls around. 
Jeremy From skip at pobox.com Mon Nov 3 10:42:31 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 3 10:42:38 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/bsddb __init__.py, 1.11, 1.12 In-Reply-To: References: Message-ID: <16294.30567.537151.106168@montanaro.dyndns.org> greg> import UserDict greg> class _iter_mixin(UserDict.DictMixin): greg> def __iter__(self): greg> try: ... Should _iter_mixin inherit from dict, or is there a backward compatibility issue? Skip From arigo at tunes.org Mon Nov 3 10:55:36 2003 From: arigo at tunes.org (Armin Rigo) Date: Mon Nov 3 10:59:29 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mu15l65xy.fsf@starship.python.net> References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <2mu15l65xy.fsf@starship.python.net> Message-ID: <20031103155536.GA29074@vicky.ecs.soton.ac.uk> Hello Michael, On Mon, Nov 03, 2003 at 11:35:05AM +0000, Michael Hudson wrote: > > "Not easy" would have been more appropriate. It is still basically what > > malloc() does. > > Well, yeah, but as Tim said pymalloc gets its wins from assuming that > each allocation is the same size. You could combine my idea with some > other allocation scheme, certainly, but given the relative paucity of > variable length types and the reduction in allocator overhead using > something like pymalloc gives us, I think it might just be easier to > not do them any more. Of course, I don't see myself having any time > to play with this idea any time soon, and it's probably not really > beefy enough to get a masters thesis from, so maybe we'll never know. Ok. 
I expect it to be much easier to experiment with with PyPy anyway.

Armin

From mwh at python.net Mon Nov 3 11:00:58 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 3 11:01:02 2003
Subject: [Python-Dev] Looking for master thesis ideas involving Python
In-Reply-To: <20031103155536.GA29074@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Mon, 3 Nov 2003 15:55:36 +0000")
References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <2mu15l65xy.fsf@starship.python.net> <20031103155536.GA29074@vicky.ecs.soton.ac.uk>
Message-ID: <2mr80p4f2d.fsf@starship.python.net>

Armin Rigo writes:

> Hello Michael,
>
> On Mon, Nov 03, 2003 at 11:35:05AM +0000, Michael Hudson wrote:
>> > "Not easy" would have been more appropriate. It is still basically what
>> > malloc() does.
>>
>> Well, yeah, but as Tim said pymalloc gets its wins from assuming that
>> each allocation is the same size. You could combine my idea with some
>> other allocation scheme, certainly, but given the relative paucity of
>> variable length types and the reduction in allocator overhead using
>> something like pymalloc gives us, I think it might just be easier to
>> not do them any more. Of course, I don't see myself having any time
>> to play with this idea any time soon, and it's probably not really
>> beefy enough to get a masters thesis from, so maybe we'll never know.
>
> Ok. I expect it to be much easier to experiment with with PyPy anyway.

This had occurred to me too :-)

Cheers,
mwh

-- Never meddle in the affairs of NT. It is slow to boot and quick to crash.
-- Stephen Harris -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From aleaxit at yahoo.com Mon Nov 3 11:02:54 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 11:03:04 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <1067867712.3fa65e4084e79@mcherm.com> References: <1067867712.3fa65e4084e79@mcherm.com> Message-ID: <200311031702.54774.aleaxit@yahoo.com> On Monday 03 November 2003 02:55 pm, Michael Chermside wrote: > Alex muses on basestring: > > 2. If we do want to encourage such typetest idioms, it might be a good > > idea to provide some other such abstract basetypes for the purpose. > > [...] > > > If there was an abstract basetype, say "baseinteger", from which int > > and long derived, > > Great idea... I think there should be single type from which all built-in > integer-like types inherit, and which user-designed types can inherit > if they want to behave like integers. I think that type should be called > "int". Once the int/long distinction is completely gone, this will be Unfortunately, unless int is made an abstract type, that doesn't help at all to "type-flag" user-coded types (be they C-coded or Python-coded): they want to tell "whoever it may concern" that they're intended to be usable as integers, but not uselessly carry around an instance of int for the purpose (and need to contort their own layout, if C-coded, for that). Abstract basetypes such as basestring are useful only to "flag" types as (intending to conform to) some concept: they don't carry implementation. Specifically, basestring has no other use except supporting isinstance (and, I guess, issubclass in some cases:-). Concrete types such as int carry more baggage (and provide more uses). I'm not sure whether it makes sense to have basestring in Python, but I assume it must -- it's a recent addition, not "legacy", so why would it have been accepted if it made no sense? 
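By analogy, a hypothetical basenumber would be nothing but such a flag. In pure Python terms (illustrative only; no such type exists, and the names here are invented):

```python
class basenumber(object):
    # Hypothetical abstract basetype: carries no implementation and no
    # layout; it exists only so isinstance() can ask "is this meant to
    # be a number?"
    __slots__ = ()

class Fraction(basenumber):
    # A user-coded type "flagging" itself as a number by inheritance,
    # at no cost to its own layout or behaviour.
    def __init__(self, num, den):
        self.num = num
        self.den = den

def accepts_a_number(x):
    # The simple typetest a library could then perform, instead of
    # probing with x + 0 and catching TypeError.
    return isinstance(x, (int, float, basenumber))
```

The base class does nothing at all; its whole value is giving the isinstance test something to ask.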
So, a user-coded type can flag itself as stringlike, if it wishes,
without carrying any baggage due to that. Why is intlike so drastically
different?

> quite clean, the only confusion now is that the int/long distinction isn't
> yet completely hidden.
>
> > 4. Furthermore, providing "basenumber" would let user-coded classes
> > "flag" in a simple and direct way "I'm emulating numbers".
>
> Okay, that sounds like it might be useful, at least to those people who
> work with weird varieties of numbers. But I can't think how. Normally,

By allowing a simple test for "is X supposed to be a number", just like
isinstance(X, basestring) allows an equally simple test for "is X
supposed to be a string". For example, such tests as imaplib.py's

    isinstance(date_time, (int, float))

(I'm not sure why long is omitted here) would simplify to

    isinstance(date_time, basenumber)

There aren't many such checks in the standard library, because overall it
doesn't do much with numbers (while it does work a lot with strings).
But, the categories of use cases aren't very different: either one is
asserting that X is-a [something], a la "assert isinstance(X,...", or one
is checking whether X is-a [something] (i.e. X is allowed to be either a
"something", or not, and there is different behavior in either case).

> I figure that if you overload addition, multiplication, subtraction, and
> perhaps a few other such operators, then you're trying to emulate numbers
> (that or you're abusing operator overloading, and I have no real sympathy

All these operators are defined, in various branches of maths, for things
that are very different from "a number". Surely you're not claiming that
Numeric is "abusing operator overloading" by allowing users to code a+b,
a*b, a-b etc where a and b are multi-dimensional arrays? The ability to
use such notation, which is fully natural in the application areas those
users come from, is important to many users.

> for you).
What use cases do you have for "basenumber" (I don't mean
> examples of classes that would inherit from basenumber, I mean examples
> where that inheritance would make a difference)?

Let me offer just a couple of use cases, one per kind. For example,

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            return self.__class__([ x*other for x in self.items ])
        else:
            # etc etc, various other multiplication cases

right now, that (class, actually) attribute KnownNumberTypes starts out
"knowing" about int, long, float, gmpy.mpz, etc, and may require user
customization (e.g. by subclassing) if any other "kind of (scalar)
number" needs to be supported; besides, the isinstance check must walk
linearly down the tuple of known number types each time. (I originally
had quite a different test structure:

    try: other + 0
    except TypeError:    # other is not a number
        # various other multiplication cases
    else:                # other is a number, so...
        return self.__class__([ x*other for x in self.items ])

but the performance for typical benchmarks improved with the isinstance
test, so, reluctantly, that's what I changed to). If an abstract basetype
'basenumber' caught many useful cases, I'd put it right at the start of
the KnownNumberTypes tuple, omit all subclasses thereof from it, get
better performance, AND be able to document very simply what the user
must do to ensure his own custom type is known to me as "a number".

That's a case where I need to accept both numbers and non-numbers and do
different things. As for "checking it's a number" I find it quite OK to
do it by trying X+0 and letting the exception, if any, propagate -- just
as "checking if it's a string" could proceed by doing X+''. But maybe I'm
just old-fashioned in this acceptance -- particularly if one thinks of
C-coded extensions, checking for a basetype might be far handier.
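For the record, the first sketch above fleshes out to something like this (Vector and the tuple's contents are invented for illustration; under 2.3 the tuple would also list long, gmpy.mpz, and friends):

```python
class Vector:
    # A hypothetical basenumber would collapse this tuple to a single
    # entry, and user numeric types could join it just by inheriting.
    KnownNumberTypes = (int, float)

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            # scalar case: multiply elementwise
            return self.__class__([x * other for x in self.items])
        # etc etc, various other multiplication cases
        return NotImplemented
```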
E.g., in Python/bltinmodule.c, function builtin_sum uses C-coded
typechecking to single out strings as an error case:

    /* reject string values for 'start' parameter */
    if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
        PyErr_SetString(PyExc_TypeError,
            "sum() can't sum strings [use ''.join(seq) instea

[etc]. Now, what builtin_sum really "wants" to do is to accept numbers,
only -- it's _documented_ as being meant for "numbers": it uses +, NOT
+=, so its performance on sequences, matrix and array-ish things, etc, is
not going to be good. But -- it can't easily _test_ whether something "is
a number". If we had a PyBaseNumber_Type to use here, it would be smooth,
easy, and fast to check for it.

> > IF a user class could flag itself as "numeroid" by inheriting
> > basenumber, THEN the "accidental commutativity" COULD be easily removed
> > at least for such classes.
>
> Okay, that's one use case. Any others? 'cause I'm coming up blank.

I see a few other cases in the standard library which want to treat
"numbers" in some specific way different from other types (often
forgetting longs:-), e.g. Lib/plat-mac/plistlib.py has one. In gmpy, I
would often like some operations to be able to accept "a number", perhaps
by letting it try to transform itself into a float as a worst case (so
complex numbers would fail there), but I definitely do NOT want to accept
non-number objects which "happen to be able to return a value from
float(x)", such as strings. In all such cases of wanting to check if
something "is a number", an abstract basetype might be handy, smooth,
fast.

> > ...does anybody see any problem if, in 2.4, we take away the ability to
> > multiply inherit from basestring AND also from another builtin type which
> > does not in turn inherit from basestring...?
>
> I do! I personally wouldn't try to create the class "perlnum" which
> inherits from basestring and also basenumber and which tries to magically
> know which is desired and convert back and forth on demand.
But I'm > sure *someone* out there is just dying to write such a class. Why > prevent them? Not that I'd every USE such a monstrocity, but just don't > see the ADVANTAGE in providing the programmer with a straightjacket by > typechecking them (at the language level) to prevent uses outside of > those envisioned by the language implementers. It sounds decidedly > non-pythonic to me. How would it be different from saying that if something is a mapping it cannot also be a sequence (and vice versa) and trying to distinguish between the two cases (and, currently, failing for user-coded types because there IS no way to reliably flag them one way or another)? The purpose of the hypothetical abstract basetypes is to let the user optionally flag types in an unambiguous way. Types that aren't flagged would presumably keep muddling through like today, for backwards compatibility. But allowing the use of multiple basetypes only seems mean to introduce ambiguity again and it seems to me that it would have no added value, while providing (at least) a warning for it would help prevent user mistakes. Alex From aleaxit at yahoo.com Mon Nov 3 11:38:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 11:38:36 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031501.hA3F1uH0016389@localhost.localdomain> References: <200311031501.hA3F1uH0016389@localhost.localdomain> Message-ID: <200311031738.23373.aleaxit@yahoo.com> On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: > >>> Michael Hudson wrote > > > > Well, it's more practice than policy. I guess the (my...) thinking > > was that the trunk gets more testing, so it's a proving ground for > > fixes. > > > > It also depends on who's going to be release monkey for the next point > > release. The branch is to a certain extent "theirs" and they should > > get to decide how things work. I'm not sure who's got the hat at the > > moment (Anthony?). 
> > Unless someone desperately wants it, I'm happy to keep on doing it. What And a *big THANKS!* for this -- from us all, I'm sure. > I'd prefer: > > - Apply to trunk first (assuming, of course, that the patch isn't > something that's only needed on the branch - at this point in time, I can't > see that happening, as release23-maint and the trunk haven't diverged far > enough yet) No, but there may be some cases. E.g., one of the doc fix I proposed (but didn't commit) is to the reference manual, documenting that list comprehensions currently (2.3) "leak" control variables, but code should not rely on that since it will be fixed in the future. That doc fix would not make much sense in 2.4, assuming the leakage will be fixed then, as it is currently predicted it will be. > - Mark (in checkin message) if the patch is a bugfix candidate > - If you're comfortable that the patch is a non-controversial bugfix, > then commit it to the branch as well, AFTER you have run the unittests on > the branch to make sure it still works) [nod] yes -- makes a lot of sense. > What makes for a controversial vs non-controversial patch? There's a couple > of things I think are important to bear in mind: > > - Functionality changes are controversial. Unless there's been a > discussion and agreement (or BDFL fiat ) on python-dev, it shouldn't Surely the BDFL could afford a better car than _that_?!-) > go in. - Major changes just near a release are going to be controversial, > as it makes the life of the release-monkey-of-the-moment more painful. Good point. > At the end of the day, if you're not sure your patch should go to the > branch, then mark it so in the checkin message, and someone (me, mwh, > someone else willing to look into it) can make a judgment call. OK. > On the other hand, no-one's going to jump up and down screaming if you do > check something in that probably shouldn't have gone in - we can always > just revert it if necessary. 
I reserve the right to jump up and down if > someone checks something in when I'm in the middle of a release and the > branch is frozen, though . Makes sense. > Also, if you're checking something into the branch, please try and make it > obvious that the change is a backport or whatever. Something like > Backport of > is good. Unfortunately I didn't do that for my check-ins this weekend ('cause they weren't backports...:-) but sure, I will try and clarify that in the future. As soon as I can make time, I'll "forward-port" to the 2.4 trunk the fixes I had made only to the 2.3 maintenance branch. Alex From pje at telecommunity.com Mon Nov 3 11:38:52 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 11:39:03 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mu15l65xy.fsf@starship.python.net> References: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <5.1.1.6.0.20031103113450.03470270@telecommunity.com> At 11:35 AM 11/3/03 +0000, Michael Hudson wrote: >Armin Rigo writes: > > > What seems to me like a good solution would be to use one relatively > > large "arena" per type and Python's memory allocator to subdivide > > each arena. If each arena starts at a pointer address which is > > properly aligned, then *(p&MASK) gives you the type of any object, > > and possibly even without much cache-miss overhead because there are > > not so many arenas in total (probably only 1-2 per type in common > > cases, and arenas can be large). > >Hmm, maybe. 
I'm not going to make guesses about that one :-) You guys do realize that this scheme would make it impossible to change an object's type, right? Unless of course you have some way to "search and replace" all references to an object. And if you were to say, "well, we'll only use this trick for non-heap types", my question would be, how's the code doing *(p&MASK) going to know how *not* to do that? If heap types have a different layout, how can you inherit from a builtin type in pure Python? And so on. From pje at telecommunity.com Mon Nov 3 11:44:12 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 11:44:19 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <200311031702.54774.aleaxit@yahoo.com> References: <1067867712.3fa65e4084e79@mcherm.com> <1067867712.3fa65e4084e79@mcherm.com> Message-ID: <5.1.1.6.0.20031103114125.024e7b10@telecommunity.com> At 05:02 PM 11/3/03 +0100, Alex Martelli wrote: >Let me offer just a couple of use cases, one per kind. For example, > >def __mul__(self, other): > if isinstance(other, self.KnownNumberTypes): > return self.__class__([ x*other for x in self.items ]) > else: > # etc etc, various other multiplication cases > >right now, that (class, actually) attribute KnownNumberTypes starts out >"knowing" about int, long, float, gmpy.mpz, etc, and may require user >customization (e.g by subclassing) if any other "kind of (scalar) number" >needs to be supported; besides, the isinstance check must walk linearly >down the tuple of known number types each time. (I originally had >quite a different test structure: > try: other + 0 > except TypeError: # other is not a number > # various other multiplication cases > else: > # other is a number, so... > return self.__class__([ x*other for x in self.items ]) >but the performance for typical benchmarks improved with the isinstance >test, so, reluctantly, that's what I changed to). 
If an abstract basetype >'basenumber' caught many useful cases, I'd put it right at the start of >the KnownNumberTypes tuple, omit all subclasses thereof from it, get >better performance, AND be able to document very simply what the user >must do to ensure his own custom type is known to me as "a number". This is the sort of thing that just begs for open generic functions with multiple dispatch, though. Even object adaptation doesn't easily generalize to operations better expressed as f(x,y) than x.f(y). From sjoerd at acm.org Mon Nov 3 11:50:25 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Mon Nov 3 11:50:40 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: <3FA68751.3060306@acm.org> Alex Martelli wrote: > On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: >> - Functionality changes are controversial. Unless there's been a >>discussion and agreement (or BDFL fiat ) on python-dev, it shouldn't > > > Surely the BDFL could afford a better car than _that_?!-) It *is* the car he used to drive in Amsterdam... -- Sjoerd Mullender From skip at pobox.com Mon Nov 3 12:12:23 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 3 12:12:38 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: <16294.35959.394535.409224@montanaro.dyndns.org> Alex> No, but there may be some cases. E.g., one of the doc fix I Alex> proposed (but didn't commit) is to the reference manual, Alex> documenting that list comprehensions currently (2.3) "leak" Alex> control variables, but code should not rely on that since it will Alex> be fixed in the future. 
That doc fix would not make much sense in
Alex> 2.4, assuming the leakage will be fixed then, as it is currently
Alex> predicted it will be.

Sure, but the documentation should reflect the current implementation.
It's the job of the people who change the list comprehension
implementation to also correct the documentation to be in sync with their
changes to the code.

Skip

From guido at python.org Mon Nov 3 12:43:06 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 3 12:43:18 2003
Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes
In-Reply-To: Your message of "Sun, 02 Nov 2003 23:19:42 +0100." <200311022319.42725.aleaxit@yahoo.com>
References: <200311022319.42725.aleaxit@yahoo.com>
Message-ID: <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com>

> 1. Shouldn't class UserString.UserString inherit from basestring?
> After all, basestring exists specifically in order to encourage
> typetests of the form isinstance(x, basestring) -- wouldn't it be
> better if such tests could also catch "user-tweaked strings"
> derived from UserString ... ?

I wish I had time for this thread today, but it doesn't look like it. I
just wish to express that we shouldn't lightly mess with this. I added
basestr specifically to support some code that was interested in testing
whether something was one of the *builtin* string types (or a subclass
thereof). But I don't recall details and won't be able to dig them up
today.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mcherm at mcherm.com Mon Nov 3 13:51:23 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 13:51:29 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
Message-ID: <1067885483.3fa6a3ab94efd@mcherm.com>

I (Michael Chermside) wrote:
> Great idea... I think there should be a single type from which all built-in
> integer-like types inherit, and which user-designed types can inherit
> if they want to behave like integers. I think that type should be called
> "int".

Alex replies:
> Unfortunately, unless int is made an abstract type, that doesn't help at
> all to "type-flag" user-coded types (be they C-coded or Python-coded):
> they want to tell "whoever it may concern" that they're intended to be
> usable as integers, but not uselessly carry around an instance of int for
> the purpose (and need to contort their own layout, if C-coded, for that).

Valid point. Of course, we've reduced the use cases to those which want
to emulate integers and ALSO don't want the layout of ints. It seems like
a small number of situations, but there ARE some, and it IS a valid
point.

> Abstract basetypes such as basestring are useful only to "flag" types as
> (intending to conform to) some concept: they don't carry implementation.

Well, yes, but Python strives very hard to not NEED to know what type an
object is before operating on it. As long as it supports the operations
that are used, it's "good enough". It's an ideal, not a universal rule,
and there are plenty of small exceptions, but to introduce a system of
basetypes seems inappropriate.

On the other hand, string and unicode need a common base class because
they are a special case. Really, there are two things going on... the
need to process arbitrary collections of bytes, and the need to process
arbitrary collections of characters. The whole thing is thrown into
confusion because "string" is used for storing characters, particularly
when the characters are expected to be ascii. This is for historical
reasons, performance reasons, out of ignorance, because "string" is
easier to type and u"" is more annoying... lots of reasons both good and
bad. But since lots of string objects contain character data just like
unicode objects, we need a type label for dealing with "character data",
and that can't be either "unicode" or "string". I don't see any such
issue in numbers (although the int/long flaw is somewhat similar, but
that's being healed).
> Surely you're not claiming that > Numeric is "abusing operator overloading" by allowing users to code > a+b, a*b, a-b etc where a and b are multi-dimensional arrays? The > ability to use such notation, which is fully natural in the application areas > those users come from, is important to many users. Um... no, I didn't mean to claim that. When I wrote it, I was thinking "okay, you'd only use these operations (sensibly) on something which had an algebra... ie, a number." But that was wrong... matrices have an algebra, but they're NOT numbers. I wrote: > What use cases do you have for "basenumber" (I don't mean > examples of classes that would inherit from basenumber, I mean examples > where that inheritance would make a difference)? Alex responded with actual examples, and I'll have to take the time to read them properly before I can respond meaningfully. (But THANKS for giving specific examples... it always helps me reason about abstract ideas (like "are baseclasses wise for numbers") when I have a few concrete examples to check myself against as I go.) Let this be a warning to me... be careful of getting in an argument with Alex, since he'll swamp me with far more well-reasoned arguments and examples than I have time to _read_, much less respond to. -- Michael Chermside From raymond.hettinger at verizon.net Mon Nov 3 14:00:59 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Mon Nov 3 14:01:43 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> The pep has been through several rounds of public comment on comp.lang.python. As a result, the proposal has evolved away from several methods called iter_backwards() and into a simple builtin function called reversed(). Other simplifications emerged as well. The improved pep is at: www.python.org/sf/pep-0322.html Thanks to many posts by Alex, the only issue of significance is avoiding having a new builtin. 
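For reference, the proposed builtin amounts to roughly this pure-Python sketch (illustrative only; argument checking and the actual C implementation are omitted):

```python
def reversed_sketch(seq):
    # Yield the items of a sequence from last to first, lazily and
    # without copying the sequence -- the behaviour the PEP asks for.
    for i in range(len(seq) - 1, -1, -1):
        yield seq[i]
```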
My strong feeling is that the essential simplicity and utility of the function would be lost if it got tucked away in some other namespace. The flipside is our common desire to keep the builtin namespace as compact as possible. So, I would like to solicit your thoughts and judgments on whether the PEP merits a new builtin. The proposal and remaining issue are both so simply stated that it was difficult to keep the newsgroup discussion focused. The posts immediately veered towards developing exotic ways to attach the function to other namespaces. Instead of repeating that discussion, hopefully we can just decide whether to accept the pep. Thank you, Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031103/52b9b197/attachment.html From wtrenker at shaw.ca Mon Nov 3 07:10:10 2003 From: wtrenker at shaw.ca (William Trenker) Date: Mon Nov 3 14:14:22 2003 Subject: [Python-Dev] new language ideas In-Reply-To: <1067866557.3fa659bd77a4d@mcherm.com> References: <1067866557.3fa659bd77a4d@mcherm.com> Message-ID: <20031103121010.2bbad5e9.wtrenker@shaw.ca> Michael Chermside wrote: > Just making __doc__ a dictionary instead of a string doesn't achieve > anything *unless* there is a fairly standard set of expected keys > in this dictionary. Here's a couple of possibilities: - the Dublin Core (DC), or some sub-set. DC been quite widely accepted (eg: Zope). - keys to support version control and CVS integration. (I'm not a CVS expert so this might be off the wall.) Something like this might be integrated nicely with docutils and other automation tools. Regards, Bill From mwh at python.net Mon Nov 3 14:59:40 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 14:59:46 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <5.1.1.6.0.20031103113450.03470270@telecommunity.com> (Phillip J. 
Eby's message of "Mon, 03 Nov 2003 11:38:52 -0500") References: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <5.1.1.6.0.20031103113450.03470270@telecommunity.com> Message-ID: <2mism1440j.fsf@starship.python.net> "Phillip J. Eby" writes: > At 11:35 AM 11/3/03 +0000, Michael Hudson wrote: >>Armin Rigo writes: >> >> > What seems to me like a good solution would be to use one relatively >> > large "arena" per type and Python's memory allocator to subdivide >> > each arena. If each arena starts at a pointer address which is >> > properly aligned, then *(p&MASK) gives you the type of any object, >> > and possibly even without much cache-miss overhead because there are >> > not so many arenas in total (probably only 1-2 per type in common >> > cases, and arenas can be large). >> >>Hmm, maybe. I'm not going to make guesses about that one :-) > > You guys do realize that this scheme would make it impossible to > change an object's type, right? Unless of course you have some way to > "search and replace" all references to an object. I'd got this far... > And if you were to say, "well, we'll only use this trick for non-heap > types", my question would be, how's the code doing *(p&MASK) going to > know how *not* to do that? If heap types have a different layout, how > can you inherit from a builtin type in pure Python? And so on. ... but somehow this point had escaped me. Well, you could do something like having a cookie[1] at the start of heap type pools that says "the pointer to the type object is actually at *(p-4)" but that's pretty sick (and puts branches in every type access). Darn. 
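The arena scheme being discussed can be modeled in a few lines of Python, purely as a toy (the C version would do pointer arithmetic; ARENA_SIZE here is an invented stand-in for whatever alignment the allocator guarantees): objects of one type live in one aligned arena, so masking an object's address yields a key identifying its type — the C-level *(p & MASK) lookup becomes a dict access.

```python
ARENA_SIZE = 1 << 20          # assumed arena granularity (hypothetical)
MASK = ~(ARENA_SIZE - 1)      # clears the low bits of an address

arena_of_type = {}            # arena base address -> type name

def place(addr, type_name):
    """Pretend an object of type_name was allocated at addr."""
    arena_of_type[addr & MASK] = type_name

def type_of(addr):
    """Recover the type from the address alone, with no per-object header."""
    return arena_of_type[addr & MASK]
```

This also makes Phillip's objection concrete: once the type is a function of the address, you cannot change an object's type in place without moving the object.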
Oh well, I suspected my idea had to have some large problem, it just took longer than I expected for someone to spot it :-) Cheers, mwh [1] e.g. NULL... -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From aahz at pythoncraft.com Mon Nov 3 15:15:57 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 15:16:04 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> References: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> Message-ID: <20031103201557.GB2397@panix.com> On Mon, Nov 03, 2003, Raymond Hettinger wrote: > > The pep has been through several rounds of public comment on > comp.lang.python. As a result, the proposal has evolved away from > several methods called iter_backwards() and into a simple builtin > function called reversed(). Other simplifications emerged as well. The > improved pep is at: > > www.python.org/sf/pep-0322.html > > Thanks to many posts by Alex, the only issue of significance is avoiding > having a new builtin. My strong feeling is that the essential simplicity > and utility of the function would be lost if it got tucked away in some > other namespace. The flipside is our common desire to keep the builtin > namespace as compact as possible. I'm -1 until the PEP includes this issue, then my vote changes to -0. (I.e., I generally agree with Alex about the builtin issue, but not strongly enough to actively oppose this PEP as long as it's properly documented.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From python at rcn.com Mon Nov 3 16:20:53 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 3 16:21:03 2003 Subject: FW: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <001101c3a250$5d35dbc0$e841fea9@oemcomputer> > > The pep has been through several rounds of public comment on > > comp.lang.python.
As a result, the proposal has evolved away from > > several methods called iter_backwards() and into a simple builtin > > function called reversed(). Other simplifications emerged as well. The > > improved pep is at: > > > > www.python.org/sf/pep-0322.html [Aahz] > Oops? That URL don't work. Drat! http://www.python.org/peps/pep-0322.html Raymond Hettinger From python at rcn.com Mon Nov 3 16:33:49 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 3 16:33:55 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <20031103201557.GB2397@panix.com> Message-ID: <001201c3a252$2b7965a0$e841fea9@oemcomputer> [Aahz] > I'm -1 until the PEP includes this issue, then my vote changes to -0. > > (I.e., I generally agree with Alex about the builtin issue, but not > strongly enough to actively oppose this PEP as long as it's properly > documented.) Okay, added a section to document the chief issue. BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin. Raymond Hettinger From aleaxit at yahoo.com Mon Nov 3 17:24:14 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 17:37:30 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <3FA68751.3060306@acm.org> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> <3FA68751.3060306@acm.org> Message-ID: <200311032324.14570.aleaxit@yahoo.com> On Monday 03 November 2003 17:50, Sjoerd Mullender wrote: > Alex Martelli wrote: > > On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: > >> -
Functionality changes are controversial. Unless there's been a > >>discussion and agreement (or BDFL fiat ) on python-dev, it > >> shouldn't > > > > Surely the BDFL could afford a better car than _that_?!-) > > It *is* the car he used to drive in Amsterdam... Ah, NOW I finally understand the occasional acrimony...! I will point out that although I _am_ Italian, and did use to work for mech CAD giant think3, our CAD programs were NOT used by FIAT (by Pininfarina, yes, but then their industrial designs are widely used by firms all over the world) -- I drive a Honda car, and back when I was a biker I drove a Honda bike (and my NON-motor bike is an Atala:-). Better...?-) Alex From aleaxit at yahoo.com Mon Nov 3 17:37:16 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 17:37:34 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <001201c3a252$2b7965a0$e841fea9@oemcomputer> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> Message-ID: <200311032337.16147.aleaxit@yahoo.com> On Monday 03 November 2003 22:33, Raymond Hettinger wrote: > [Aahz] > > > I'm -1 until the PEP includes this issue, then my vote changes to -0. > > > > (I.e., I generally agree with Alex about the builtin issue, but not > > strongly enough to actively oppose this PEP as long as it's properly > > documented.) > > Okay, added a section to document the chief issue. > BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin. Uh, did I? OK maybe I did. But what about "revrange" (which I'd LOVE to incarnate as an iterator-returning irange with an optional reverse= argument) -- was that knocked out of contention? I claimed that just revrange would be too specialized BUT irange would be JUST RIGHT... Alex From martin at v.loewis.de Mon Nov 3 17:43:22 2003 From: martin at v.loewis.de (Martin v. 
Löwis) Date: Mon Nov 3 17:43:37 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is > a bad idea, that bug fixes should filter from the 2.4 trunk to the > 2.3 branch and not the other way around. Is this indeed the policy > (have I missed some guidelines about it)? At least that's the policy I was following, indicating backports with "backported to 2.3" in the checkin message. > I guess for this round of fixes I will find the time to forward-port them to > the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 is going to come > well before 2.4 releases, the other way 'round wouldn't be quite so sure:-), > but what about the future? Should fixes applicable to both 2.3.* and 2.4 > be made [a] always to both trunk and branch, I prefer to do them on both the trunk and the branch simultaneously. Having them on the branch simplifies the life of the release manager, and having them on the trunk gives them at least some testing. I had to back out both patches occasionally, but this is not a big problem unless a release of the branch is imminent. > Oh, incidentally, if it matters -- most were docs issues, including > as "docs" also some changes to comments that previously were > misleading or ambiguous. It does matter. For doc changes, any kind of improvement is acceptable (IMO), as there is no risk of breaking existing applications. > I guess that my problem is that I think of 2.3.* fixes as things > that will be useful to "the general Python-using public" pretty > soon, with 2.4 far off in the future, so that it appears to me that
> But if that conflicts with policy, I will of course change anyway. If you don't forward-port your changes, nobody will. So you satisfy the general public now, with a view of taking corrections away from them in the future. Regards, Martin From martin at v.loewis.de Mon Nov 3 17:47:12 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Mon Nov 3 17:48:13 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > No, but there may be some cases. E.g., one of the doc fixes I proposed (but > didn't commit) is to the reference manual, documenting that list > comprehensions currently (2.3) "leak" control variables, but code should not > rely on that since it will be fixed in the future. That doc fix would not > make much sense in 2.4, assuming the leakage will be fixed then, as > it is currently predicted it will be. Don't trust predictions. If the patch is formally correct now, apply it now. If you then find it is not needed a week from now, back it out. Alternatively, put the patch on SF, wait for a week, and then apply it to branch only. Regards, Martin From guido at python.org Mon Nov 3 18:26:32 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 3 18:26:39 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Mon, 03 Nov 2003 23:37:16 +0100." <200311032337.16147.aleaxit@yahoo.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> Message-ID: <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> > > BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin.
But what about "revrange" (which I'd LOVE > to incarnate as an iterator-returning irange with an optional reverse= > argument) -- was that knocked out of contention? I claimed that just > revrange would be too specialized BUT irange would be JUST RIGHT... This surprised me a bit too. The majority of Raymond's examples in the PEP (when I last saw it a week ago) were reverse numeric ranges, usually of the form revrange(n) -- which we currently have to spell as range(n-1, -1, -1) (I think :-) and which the new proposal would turn into reversed(range(n)). According to Raymond, a built-in that would do just that only drew (a small number of) negative responses in the newsgroup. Such a thing would face zero opposition if it was part of itertools: itertools.revrange([start, ] stop[, step]) makes total sense to me... --Guido van Rossum (home page: http://www.python.org/~guido/) From tdelaney at avaya.com Mon Nov 3 19:02:14 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Mon Nov 3 19:02:23 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > BTW, when we do come around to PEP 318, I would suggest the 'as' > clause on a class statement as the best way to specify a metaclass. I just realised what has been bugging me about the idea of def foop() as staticmethod: and it applies equally well to class Newstyle as type: Basically, it completely changes the semantics associated with 'as' in Python - which are to give something a different name (technically, to rebind the object to a different name). OTOH, the first case above means 'create this (function) object, call this decorator, and bind the name to the new object'. So instead of taking an existing object (with an existing name) and rebinding it to a new name, it is creating an object, doing something to it and binding it to a name. 
A definite deviation from the current 'as' semantics, but understandable. However, the second case above is doing something completely different. It is creating a new object (a class) and binding it to a name. As a side effect, it is changing the metaclass of the object. The 'as' in this case has nothing whatsoever to do with binding the object name, but a name in the object's namespace. I suppose you could make the argument that the metaclass has to act as a decorator (like in the function def above) and set the __metaclass__ attribute, but that would mean that existing metaclasses couldn't work. It would also mean you were defining the semantics at an implementation level. I'm worried that I'm being too picky here, because I *like* the way the above reads. I'm just worried about overloading 'as' with too many essentially unrelated meanings. Tim Delaney From pje at telecommunity.com Mon Nov 3 20:09:55 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 20:09:00 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global .avaya.com> Message-ID: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> At 11:02 AM 11/4/03 +1100, Delaney, Timothy C (Timothy) wrote: > > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > > > BTW, when we do come around to PEP 318, I would suggest the 'as' > > clause on a class statement as the best way to specify a metaclass. > >I just realised what has been bugging me about the idea of > > def foop() as staticmethod: > >and it applies equally well to > > class Newstyle as type: > >Basically, it completely changes the semantics associated with 'as' in >Python - which are to give something a different name (technically, to >rebind the object to a different name). > >OTOH, the first case above means 'create this (function) object, call this >decorator, and bind the name to the new object'. 
So instead of taking an >existing object (with an existing name) and rebinding it to a new name, it >is creating an object, doing something to it and binding it to a name. A >definite deviation from the current 'as' semantics, but understandable. > >However, the second case above is doing something completely different. It >is creating a new object (a class) and binding it to a name. As a side >effect, it is changing the metaclass of the object. The 'as' in this case >has nothing whatsoever to do with binding the object name, but a name in >the object's namespace. > >I suppose you could make the argument that the metaclass has to act as a >decorator (like in the function def above) and set the __metaclass__ >attribute, but that would mean that existing metaclasses couldn't work. It >would also mean you were defining the semantics at an implementation level. > >I'm worried that I'm being too picky here, because I *like* the way the >above reads. I'm just worried about overloading 'as' with too many >essentially unrelated meanings. Well, there's always 'is'... def foop() is staticmethod: class Newstyle is type: Interestingly, this usage is rather similar to Eiffel, which IIRC introduces code suites with 'is', although I think without the modifier. I'm not all that enthused about the metaclass usage, mainly because there's already an okay syntax (__metaclass__) for it. I'd rather that class decorators (if added) were decorators in the same way as function decorators. Why? Because I think that correct, combinable class decorators are probably easier for most people to write than correct, combinable metaclasses, and they are more easily combined than metaclasses are. From greg at electricrain.com Mon Nov 3 20:23:10 2003 From: greg at electricrain.com (Gregory P. 
Smith) Date: Mon Nov 3 20:23:17 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311030954.24191.aleaxit@yahoo.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> Message-ID: <20031104012310.GC17328@zot.electricrain.com> On Mon, Nov 03, 2003 at 09:54:24AM +0100, Alex Martelli wrote: > On Monday 03 November 2003 09:48 am, Anthony Baxter wrote: > > From what I understand, these fixes aren't just fixes to the test suite, > > but also to fix real problems with the bsddb code itself. In that case, > > should it be added to the 23 branch? I'd be a solid +1 on this for 2.3.3. > > > > Anyone else? > > Anything that makes bsddb less flaky on 2.3.* gets a big hearty enthusiastic > +1 from me too. > > Alex There are no deadlock problems in the current 2.3.2 bsddb module as it does not have thread support enabled (meaning it is likely to crash if someone uses it from multiple threads at once). The recent changes to bsddb have been to enable thread support and fix some singlethreaded deadlocks that thread support introduced due to the BerkeleyDB's internal locking. There is still the potential for multithreaded bsddb compatibility interface use to deadlock. This bug tracks the issue: http://sourceforge.net/tracker/?func=detail&aid=834461&group_id=5470&atid=105470 Net effect on release23-branch if we did this today: + multithreaded bsddb use now allowed (instead of crashes or corruption) - multithreaded bsddb use could deadlock depending on how it is used. (anything that creates a cursor internally including many of the inherited DictMixin dictionary methods could cause it) From jeremy at alum.mit.edu Tue Nov 4 01:10:12 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Nov 4 01:13:16 2003 Subject: [Python-Dev] XXX undetected error (why=3) Message-ID: <1067926212.19568.47.camel@localhost.localdomain> I've been seeing these problems sporadically over the last several months.
Fred and I tracked one of them down to a bug in pyexpat.c. I noticed several more running the Zope 3 test suite today. I'd like to change the code for the check to call Py_FatalError() instead of printing a message to stderr. The check is only enabled during a debug build. I'd be much happier debugging this from a core dump than trying to figure out what happened to cause the message to be printed. Any objections? Jeremy From aleaxit at yahoo.com Tue Nov 4 03:12:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:12:30 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <20031104012310.GC17328@zot.electricrain.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> Message-ID: <200311040912.23213.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:23 am, Gregory P. Smith wrote: ... > There are no deadlock problems in the current 2.3.2 bsddb module as > it does not have thread support enabled (meaning is likely to crash if > someone uses it from multiple threads at once). Ah! Shows you how much I understood of your patch -- I hadn't grasped this! > Net effect on release23-branch if we did this today: > > + multithreaded bsddb use now allowed (instead of crashes or corruption) Generally, extending functionality (as opposed to: fixing bugs or clarifying docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb isn't thread-safe in 2.3 counts as "a bug", or rather as functionality deliberately kept limited, to avoid e.g such bugs as the one you've just removed, and other possibilities you mention: > - multithreaded bsddb use could deadlock depending on how it is used. I think that just having the 2.3.* docs explicitly mention the lack of thread-safety might then perhaps be better than backporting the changes. 
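Alex's suggestion to document the lack of thread-safety implies the usual user-side workaround: serialize your own access with a lock. A generic sketch of that pattern (the class name is hypothetical and this is not the bsddb API, just the shape of the workaround):

```python
import threading

class SerializedMapping:
    """Wrap a non-thread-safe mapping (such as the 2.3 bsddb objects)
    so that every access happens under a single lock."""
    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._db[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._db[key] = value

    def keys(self):
        with self._lock:
            return list(self._db)
```

One coarse lock trades concurrency for safety, which is exactly the trade-off the deadlock discussion above is about.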
Alex From aleaxit at yahoo.com Tue Nov 4 03:24:13 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:24:18 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> Message-ID: <200311040924.13894.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:09 am, Phillip J. Eby wrote: ... > I'm not all that enthused about the metaclass usage, mainly because there's > already an okay syntax (__metaclass__) for it. I'd rather that class Hmmm -- why is: class Foo: __metaclass__ = MetaFoo ... "ok", compared to e.g.: class Foo is MetaFoo: ... while, again for example, def foo(): ... foo = staticmethod(foo) is presumably deemed "not ok" compared to e.g.: def foo() is staticmethod: ... ??? Both cases of current syntax do the job (perhaps not elegantly but they do) and in both cases a new syntax would increase elegance. > decorators (if added) were decorators in the same way as function > decorators. Why? Because I think that correct, combinable class > decorators are probably easier for most people to write than correct, > combinable metaclasses, and they are more easily combined than metaclasses > are. Combinable metaclasses may not be trivial to write, but with multiple inheritance it will often be feasible (except, presumably, when implied layout or __new__ have conflicting requirements). Of course, not having use cases of either custom metaclasses or class decorators in production use, the discussion does risk being a bit abstract. Did you have any specific use case in mind? 
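The two mechanisms Alex and Phillip are weighing can be put side by side in runnable form. This is a sketch in modern syntax (in 2.3 the metaclass would be spelled via __metaclass__; the 'tagged' attribute is an invented example of a decoration, not anything from the thread):

```python
class MetaFoo(type):
    """Metaclass route: intercepts class creation itself."""
    def __new__(mcls, name, bases, ns):
        ns['tagged'] = True
        return super().__new__(mcls, name, bases, ns)

class Foo(metaclass=MetaFoo):
    pass

def tag(cls):
    """Class-decorator route: receives an already-built class."""
    cls.tagged = True
    return cls

# One plausible reading of the proposed "class Bar ... as tag" syntax:
Bar = tag(type('Bar', (), {}))
```

The decorator only sees the finished class, which is why Phillip argues decorators are easier to write and combine; the metaclass, by contrast, also becomes type(Foo) and so affects layout and inheritance.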
Alex From aleaxit at yahoo.com Tue Nov 4 03:56:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:56:29 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> Message-ID: <200311040956.23759.aleaxit@yahoo.com> On Tuesday 04 November 2003 01:02 am, Delaney, Timothy C (Timothy) wrote: > > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > > > BTW, when we do come around to PEP 318, I would suggest the 'as' > > clause on a class statement as the best way to specify a metaclass. > > I just realised what has been bugging me about the idea of > > def foop() as staticmethod: > > and it applies equally well to > > class Newstyle as type: > > Basically, it completely changes the semantics associated with 'as' in > Python - which are to give something a different name (technically, to > rebind the object to a different name). Yes, that's what the 'as' clause means in from and import statements, of course. > OTOH, the first case above means 'create this (function) object, call this > decorator, and bind the name to the new object'. So instead of taking an > existing object (with an existing name) and rebinding it to a new name, it > is creating an object, doing something to it and binding it to a name. A > definite deviation from the current 'as' semantics, but understandable. I'm not sure I follow. "import X as y" means basically y = __import__('X') (give or take a little:-). 'def foo() as staticmethod:' would mean instead foo = staticmethod(new.function(<code>, globals(), 'foo')) so what comes after the 'as' is a name to bind in the existing case, it's a callable to call in the new proposed syntax.
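Alex's desugaring can be made runnable today; types.FunctionType is the modern equivalent of the old new.function, and the function body here is an invented example (this illustrates the mechanics only, not proposed semantics):

```python
import types

# Build the function object explicitly, then pass it through the
# decorator named after 'as' -- the desugaring from the message.
module_code = compile("def foo():\n    return 'decorated'", "<sketch>", "exec")
foo_code = next(c for c in module_code.co_consts
                if isinstance(c, types.CodeType))
foo = staticmethod(types.FunctionType(foo_code, globals(), 'foo'))
```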
There is a binding in each case, and in each case something is called to obtain the object to bind; I think the distinction between new and existing object is spurious -- __import__ can perfectly well be creating a new object -- but the real distinction is that the name to bind is given after 'as' in the existing case, it's NOT so given in the new proposed one. > However, the second case above is doing something completely different. It Not at all -- it does: Newstyle = type('Newstyle', (), <dict>) where <dict> is built from the body of the 'class' statement, just like, above, <code> is built from the body of the 'def' statement. I find this rather close to the 'as staticmethod' case: that one calls staticmethod (the callable after the 'as') and binds the result to the name before the 'as', this one calls type (the callable after the 'as') and binds the result to the name before the 'as'. > is creating a new object (a class) and binding it to a name. As a side > effect, it is changing the metaclass of the object. The 'as' in this case "changing"? From what? It's _establishing_ the type of the name it's binding, just as (e.g.) staticmethod(...) is. I.e., stripping the syntax we have in today's Python:

>>> xx = type('xx', (), {'ba':23})
>>> type(xx)
<type 'type'>
>>> xx = staticmethod(lambda ba: 23)
>>> type(xx)
<type 'staticmethod'>

...so where's the "completely different" or the "changing" in one case and not the other...? > has nothing whatsoever to do with binding the object name, but a name in > the object's namespace. It has everything to do with determining the type of the object, just like e.g. staticmethod would. > I suppose you could make the argument that the metaclass has to act as a > decorator (like in the function def above) and set the __metaclass__ > attribute, but that would mean that existing metaclasses couldn't work. It > would also mean you were defining the semantics at an implementation level. I'm sure I've lost you completely here, sorry.

>>> class xx(object): pass
...
>>> xx.__metaclass__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: type object 'xx' has no attribute '__metaclass__'

why would a class created this way have to set '__metaclass__', again? A metaclass is the class object's type, and it's called to create the class object. If I do "xx = type('xx', (), {})" I get exactly the same result as with the above "class xx" statement -- no more, no less. "class" just gives me neat syntax to determine the 3 arguments with which the metaclass is called -- a string that's the classname, a tuple of bases, and a dictionary. That "__metaclass__ attribute" is just an optional hack which Python can use to determine _which_ metaclass to call (in alternative to others, even today) for a certain 'class' statement. > I'm worried that I'm being too picky here, because I *like* the way the > above reads. I'm just worried about overloading 'as' with too many > essentially unrelated meanings. I accept that in both 'def foo() as X' and 'class foo as X' the X in "as X" is very different from its role in 'import foo as X' -- in the import statement, X is just a name to which to bind an object, while in the def and class statements X would be a callable to call in order to get the object -- and the name to bind would be the one right after the def or class keywords instead. So maybe we should do as Phillip Eby suggests and use 'is' instead - that's slightly stretched too, because after "def foo() is staticmethod:" it would NOT be the case that 'foo is staticmethod' holds, but, rather, that isinstance(foo, staticmethod) [so we're saying "IS-A", not really "IS"]. But the def and class statements cases are SO close -- in both what comes after the 'is' (or 'as') is a callable anyway.
The debate is then just, should said callable be called with an already prepared (function or class) object, just to decorate it; or should it rather be called with the elementary "bricks" needed to build the object, so it can build it properly. Incidentally, it seems to me that it might not be a problem to overload e.g. staticmethod so it can be called with multiple arguments (same as new.function) and internally calls new.function itself, should there be any need for that (not that I can see any use case right now, just musing...). Alex From aleaxit at yahoo.com Tue Nov 4 04:04:38 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 04:04:48 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> Message-ID: <200311041004.38285.aleaxit@yahoo.com> On Tuesday 04 November 2003 12:26 am, Guido van Rossum wrote: > > > BTW, Alex said he was +1 on the idea, but only +0 on it being a > > > builtin. > > > > Uh, did I? OK maybe I did. But what about "revrange" (which I'd LOVE > > to incarnate as an iterator-returning irange with an optional reverse= > > argument) -- was that knocked out of contention? I claimed that just > > revrange would be too specialized BUT irange would be JUST RIGHT... > > This surprised me a bit too. The majority of Raymond's examples in > the PEP (when I last saw it a week ago) were reverse numeric ranges, > usually of the form revrange(n) -- which we currently have to spell as > range(n-1, -1, -1) (I think :-) and which the new proposal would turn > into reversed(range(n)). According to Raymond, a built-in that would > do just that only drew (a small number of) negative responses in the > newsgroup. 
> > Such a thing would face zero opposition if it was part of itertools: > itertools.revrange([start, ] stop[, step]) makes total sense to me... And what about irange with an optional reverse= argument? I did have (and write about on c.l.py) a case where I currently code:

    if godown:
        iseq = xrange(len(sq)-1, start-1, -1)
    else:
        iseq = xrange(start, len(sq), 1)
    for index in iseq:
        ...

and would be just delighted to be able to code, instead,

    for index in irange(start, len(sq), reverse=godown):
        ...

Even when the need to reverse can more easily be hardwired in the source (a more common case), would

    for index in irange(start, stop, reverse=True):

be really so much worse than

    for index in revrange(start, stop):

...? Alex From aleaxit at yahoo.com Tue Nov 4 05:02:22 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 05:02:30 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <1067885483.3fa6a3ab94efd@mcherm.com> References: <1067885483.3fa6a3ab94efd@mcherm.com> Message-ID: <200311041102.22873.aleaxit@yahoo.com> On Monday 03 November 2003 07:51 pm, Michael Chermside wrote: > I (Michael Chermside) wrote: > > Great idea... I think there should be a single type from which all built-in > > integer-like types inherit, and which user-designed types can inherit > > if they want to behave like integers. I think that type should be called > > "int". > > Alex replies: > > Unfortunately, unless int is made an abstract type, that doesn't help at > > all to "type-flag" user-coded types (be they C-coded or Python-coded): > > they want to tell "whoever it may concern" that they're intended to be > > usable as integers, but not uselessly carry around an instance of int for > > the purpose (and need to contort their own layout, if C-coded, for that). > > Valid point. Of course, we've reduced the use cases to those which want > to emulate integers and ALSO don't want the layout of ints.
It seems like > a small number of situations, but there ARE some, and it IS a valid point. If some code is happy with extending an existing concrete type there is of course no problem -- it just goes and does it. Sorry, I was taking that for granted. But, e.g., gmpy.mpz wants to keep "the integer" in the form that makes the underlying GMP library happy, and any similar wrapper over a library supplying some special implementations of integers (there are quite a few besides GMP) would be similar in this way. > > Abstract basetypes such as basestring are useful only to "flag" types as > > (intending to conform to) some concept: they don't carry implementation. > Well, yes, but Python strives very hard to not NEED to know what type > an object is before operating on it. As long as it supports the operations > that are used, it's "good enough". It's an ideal, not a universal rule, > and there are plenty of small exceptions, but to introduce a system > of basetypes seems inappropriate. It's a wonderful idea, and I generally crusade against typechecking, but I think there are enough "small exceptions" that some basestring-like abstract basetypes may be warranted (not necessarily "a system", mind you). Typechecking against an abstract type is quite different and less of a problem than doing so against a concrete type, btw -- exactly because it's not a big problem for a user-coded type to "flag" itself by inheriting from the abstract basetype in question, if need be... it doesn't carry the baggage that inheriting from a concrete type does. > On the other hand, string and unicode need a common base class because > they are a special case. Really, there are two things going on... the They're special to Python itself and its standard library because there is a lot more string-processing and processing of text going on there than any other kind.
I.e., the usefulness of basestring is more obvious because Python itself and the standard library are "keen" users of strings of all kinds;-). > both good and bad. But since lots of string objects contain character > data just like unicode objects, we need a type label for dealing > with "character data", and that can't be either "unicode" or "string". > > I don't see any such issue in numbers (although the int/long flaw > is somewhat similar, but that's being healed). But int/long, and float, have enough similarities AND differences too. Adding a Decimal or a Rational type (I hope both will eventually occur) will IMHO show that even more clearly. > > Surely you're not claiming that > > Numeric is "abusing operator overloading" by allowing users to code > > a+b, a*b, a-b etc where a and b are multi-dimensional arrays? The > > ability to use such notation, which is fully natural in the application > > areas those users come from, is important to many users. > Um... no, I didn't mean to claim that. When I wrote it, I was thinking > "okay, you'd only use these operations (sensibly) on something which > had an algebra... ie, a number." But that was wrong... matrices have > an algebra, but they're NOT numbers. Yes, we totally agree on this. > I wrote: > > What use cases do you have for "basenumber" (I don't mean > > examples of classes that would inherit from basenumber, I mean examples > > where that inheritance would make a difference)? > Alex responded with actual examples, and I'll have to take the time > to read them properly before I can respond meaningfully. (But THANKS > for giving specific examples... it always helps me reason about > abstract ideas (like "are baseclasses wise for numbers") when I have > a few concrete examples to check myself against as I go.) I hope I chose the examples well then...;-) > Let this be a warning to me...
be careful of getting in an argument > with Alex, since he'll swamp me with far more well-reasoned arguments > and examples than I have time to _read_, much less respond to. indeed...;-). Alex From aleaxit at yahoo.com Tue Nov 4 05:06:19 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 05:06:24 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes In-Reply-To: <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com> References: <200311022319.42725.aleaxit@yahoo.com> <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com> Message-ID: <200311041106.19798.aleaxit@yahoo.com> On Monday 03 November 2003 06:43 pm, Guido van Rossum wrote: > > 1. Shouldn't class UserString.UserString inherit from basestring? > > After all, basestring exists specifically in order to encourage > > typetests of the form isinstance(x, basestring) -- wouldn't it be > > better if such tests could also catch "user-tweaked strings" > > derived from UserString ... ? > > I wish I had time for this thread today, but it doesn't look like it. > I just wish to express that we shouldn't lightly mess with this. I Aye aye cap'n -- we'll just be squabbling and NOT messing until your say-so, anyway;-). > added basestr specifically to support some code that was interested in > testing whether something was one of the *builtin* string types (or a > subclass thereof). But I don't recall details and won't be able to > dig them up today. basestring usage has become rather widespread today, anyway; the specific reason it was introduced is interesting to know, but looking at how it's used e.g. in the std lib is probably more meaningful. Of course, we always look at string-ish things with more interest because we use SO many of them, of all kinds, in Python itself and its stdlib. But -- numbers may be very important too, to some subset of Python's users... _and_ in a secondary sense to Python itself in some cases. 
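The abstract "basenumber" idea Alex is arguing for did eventually materialize, in a different form, as the numbers ABCs of PEP 3141. A sketch of the "flagging without inheriting a concrete layout" point, with MPZ a hypothetical stand-in for a wrapper type like gmpy.mpz:

```python
import numbers

class MPZ:
    """Hypothetical wrapper keeping its value in a library-friendly form."""
    def __init__(self, value):
        self._v = int(value)
    def __int__(self):
        return self._v

# Flag the type as a number for isinstance() checks, without forcing it
# to carry an int instance around or contort its C-level layout:
numbers.Number.register(MPZ)

assert isinstance(MPZ(7), numbers.Number)
assert isinstance(3, numbers.Number)
assert not isinstance("3", numbers.Number)
```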
Alex From barry at python.org Tue Nov 4 07:44:12 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 07:44:19 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311040912.23213.aleaxit@yahoo.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> <200311040912.23213.aleaxit@yahoo.com> Message-ID: <1067949852.26825.3.camel@anthem> On Tue, 2003-11-04 at 03:12, Alex Martelli wrote: > Generally, extending functionality (as opposed to: fixing bugs or clarifying > docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb > isn't thread-safe in 2.3 counts as "a bug", or rather as functionality > deliberately kept limited, to avoid e.g such bugs as the one you've just > removed, and other possibilities you mention: > > > - multithreaded bsddb use could deadlock depending on how it is used. > > I think that just having the 2.3.* docs explicitly mention the lack of > thread-safety might then perhaps be better than backporting the changes. It's just the DB-API that's not thread-safe. The full blown BerkeleyDB API (a.k.a. bsddb3) should be fine. It sure is tempting to claim that the lack of DB-API thread-safety for BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg should make the final determination. If it isn't, then yes, the docs need to clearly state that's the case. 
-Barry From barry at python.org Tue Nov 4 07:46:40 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 07:46:45 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <200311040924.13894.aleaxit@yahoo.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> <200311040924.13894.aleaxit@yahoo.com> Message-ID: <1067949999.26825.6.camel@anthem> On Tue, 2003-11-04 at 03:24, Alex Martelli wrote: > class Foo is MetaFoo: > def foo() is staticmethod: My preference would be for metaclass specification to use "is" and for method decoration to use "as". They seem like different specializations that should have a different pronunciation. -Barry From anthony at interlink.com.au Tue Nov 4 07:55:15 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 07:59:02 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <1067949852.26825.3.camel@anthem> Message-ID: <200311041255.hA4CtF1O007177@localhost.localdomain> >>> Barry Warsaw wrote > It's just the DB-API that's not thread-safe. The full blown BerkeleyDB > API (a.k.a. bsddb3) should be fine. > > It sure is tempting to claim that the lack of DB-API thread-safety for > BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg > should make the final determination. If it isn't, then yes, the docs > need to clearly state that's the case. At the very least, the test suite should pass on the 23 branch. It currently hangs or crashes on many/most platforms I've tried it on. If this is because the test suite is doing multi-threaded things and that Just Won't Work, then the test suite should be fixed. Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From barry at python.org Tue Nov 4 08:34:43 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 08:34:55 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311041255.hA4CtF1O007177@localhost.localdomain> References: <200311041255.hA4CtF1O007177@localhost.localdomain> Message-ID: <1067952882.26825.37.camel@anthem> On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > At the very least, the test suite should pass on the 23 branch. It currently > hangs or crashes on many/most platforms I've tried it on. If this is because > the test suite is doing multi-threaded things and that Just Won't Work, then > the test suite should be fixed. Not for me. Works fine on RH9, except for a crash in test_re. -Barry ====================================================================== FAIL: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 409, in test_bug_418626 self.assertRaises(RuntimeError, re.search, '(a|b)*?c', 10000*'ab'+'cd') File "/home/barry/projects/python23/Lib/unittest.py", line 295, in failUnlessRaises raise self.failureException, excName AssertionError: RuntimeError ====================================================================== FAIL: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 418, in test_stack_overflow self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x') File "/home/barry/projects/python23/Lib/unittest.py", line 295, in failUnlessRaises raise self.failureException, excName AssertionError: RuntimeError ---------------------------------------------------------------------- From aleaxit at yahoo.com Tue Nov 4 08:38:32 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 08:38:38 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: 
<1067952882.26825.37.camel@anthem> References: <200311041255.hA4CtF1O007177@localhost.localdomain> <1067952882.26825.37.camel@anthem> Message-ID: <200311041438.32802.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:34 pm, Barry Warsaw wrote: > On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > > At the very least, the test suite should pass on the 23 branch. It > > currently hangs or crashes on many/most platforms I've tried it on. If > > this is because the test suite is doing multi-threaded things and that > > Just Won't Work, then the test suite should be fixed. > > Not for me. Works fine on RH9, except for a crash in test_re. Doesn't look like a crash but rather a failure I already discussed: > ====================================================================== > FAIL: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 409, in test_bug_418626 > self.assertRaises(RuntimeError, re.search, '(a|b)*?c', 10000*'ab'+'cd') > File "/home/barry/projects/python23/Lib/unittest.py", line 295, in > failUnlessRaises raise self.failureException, excName > AssertionError: RuntimeError The bug that this is testing for has gone away: the re engine doesn't stack overflow on this any more. The tests have been updated in 2.4 but not on the 2.3 branch. I mentioned that and asked whether I should just update the 2.3 tests, but apparently the concept is that this should rather be done by whoever fixed the bug, instead (or during the backport phase to prepare 2.3.3). Same, apparently, for the other test-failure you mention. 
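The mismatch Alex describes is easy to illustrate: the two patterns from the stale tests used to exhaust the C stack in the old recursive engine, which is exactly why the updated tests no longer expect RuntimeError. With the non-recursive engine they simply succeed:

```python
import re

# Formerly hit the recursion limit and raised RuntimeError; now just match:
m1 = re.match('(x)*', 50000 * 'x')
m2 = re.search('(a|b)*?c', 10000 * 'ab' + 'cd')

assert m1 is not None and m1.group(1) == 'x'
assert m2 is not None and m2.group(0).endswith('c')
```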
Alex From mwh at python.net Tue Nov 4 08:38:54 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 4 08:38:57 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <1067952882.26825.37.camel@anthem> (Barry Warsaw's message of "Tue, 04 Nov 2003 08:34:43 -0500") References: <200311041255.hA4CtF1O007177@localhost.localdomain> <1067952882.26825.37.camel@anthem> Message-ID: <2mwuag1cep.fsf@starship.python.net> Barry Warsaw writes: > On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > >> At the very least, the test suite should pass on the 23 branch. It currently >> hangs or crashes on many/most platforms I've tried it on. If this is because >> the test suite is doing multi-threaded things and that Just Won't Work, then >> the test suite should be fixed. > > Not for me. Works fine on RH9, except for a crash in test_re. That's because some naughty person backported the _sre recursion removal but not the test suite to match. Oi! Cheers, mwh -- I have gathered a posie of other men's flowers, and nothing but the thread that binds them is my own. -- Montaigne From anthony at interlink.com.au Tue Nov 4 08:43:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 08:47:07 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311041438.32802.aleaxit@yahoo.com> Message-ID: <200311041343.hA4DhHqm008168@localhost.localdomain> >>> Alex Martelli wrote > The bug that this is testing for has gone away: the re engine doesn't > stack overflow on this any more. The tests have been updated in 2.4 > but not on the 2.3 branch. I mentioned that and asked whether I should > just update the 2.3 tests, but apparently the concept is that this should > rather be done by whoever fixed the bug, instead (or during the backport > phase to prepare 2.3.3). Hm. I must have mis-spoken. If you see a bugfix that should go on the branch but hasn't, please feel completely free to do the backport. 
I have a mail folder with -checkins messages that need to be checked for backportage, but I only get to this periodically (and not at all in the last couple of weeks, alas). I do plan to clear this out sometime this week... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Nov 4 09:01:59 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 09:05:46 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch Message-ID: <200311041402.hA4E20dR015943@localhost.localdomain> I'm seeing a couple of warnings that I don't remember seeing at the time of the 2.3.2 release. Given what they are, it's possible that it's just a random thing (whether the id is < 0 or not). test_minidom /home/anthony/src/py/23maint/Lib/xml/dom/minidom.py:797: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up return "" % (self.tagName, id(self)) test_repr /home/anthony/src/py/23maint/Lib/test/test_repr.py:91: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up eq(r(i3), (""%id(i3))) Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll put it on the to-do list. Anthony From anthony at interlink.com.au Tue Nov 4 09:15:18 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 09:19:19 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <2mwuag1cep.fsf@starship.python.net> Message-ID: <200311041415.hA4EFIwe016470@localhost.localdomain> >>> Michael Hudson wrote > That's because some naughty person backported the _sre recursion > removal but not the test suite to match. Oi! Fixed. The test_re_groupref_exists is still disabled on 2.3 branch, because it still fails on 2.3 Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From Paul.Moore at atosorigin.com Tue Nov 4 09:41:11 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Tue Nov 4 09:41:57 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> From: Guido van Rossum [mailto:guido@python.org] > Such a thing would face zero opposition if it was part of itertools: > itertools.revrange([start, ] stop[, step]) makes total sense to me... I also like Alex's suggestion of itertools.irange([start,] stop[, step][,reverse=False]) I'd rather this than a revrange - that feels over-specialised, whereas an irange with a reverse keyword parameter seems natural. I'd still support this addition to itertools whether or not the reversed() builtin was implemented (although with irange, reversed() loses a lot of its use cases...) Paul. From mwh at python.net Tue Nov 4 10:08:57 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 4 10:09:05 2003 Subject: [Python-Dev] [gmane.comp.sysutils.autotools.announce] Autoconf 2.58 released Message-ID: <2msml4188m.fsf@starship.python.net> We want to be using this asap to get rid of the aclocal hacks, right? I suppose waiting a *few* days for a brown-paper-bag-release situation would be prudent. Cheers, mwh -------------- next part -------------- An embedded message was scrubbed... From: Akim Demaille Subject: Autoconf 2.58 released Date: Tue, 04 Nov 2003 15:57:52 +0100 Size: 6974 Url: http://mail.python.org/pipermail/python-dev/attachments/20031104/e5c3cc5d/attachment.mht -------------- next part -------------- -- This is the fixed point problem again; since all some implementors do is implement the compiler and libraries for compiler writing, the language becomes good at writing compilers and not much else! 
-- Brian Rogoff, comp.lang.functional From guido at python.org Tue Nov 4 10:34:58 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:35:06 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 14:41:11 GMT." <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> Message-ID: <200311041534.hA4FYwa26001@12-236-54-216.client.attbi.com> > From: Guido van Rossum [mailto:guido@python.org] > > Such a thing would face zero opposition if it was part of itertools: > > itertools.revrange([start, ] stop[, step]) makes total sense to me... [Paul Moore] > I also like Alex's suggestion of > itertools.irange([start,] stop[, step][,reverse=False]) > > I'd rather this than a revrange - that feels over-specialised, whereas > an irange with a reverse keyword parameter seems natural. Hm, I don't know why it feels that way for you. It would be more verbose and I expect this option will always be a *constant*. (One of my rules-of-thumb for API design is that if you have a Boolean option whose value is expected to be always a constant, you've really defined two methods and API-wise you're better off with two separate methods. Although there are exceptions.) > I'd still support this addition to itertools whether or not the > reversed() builtin was implemented (although with irange, reversed() > loses a lot of its use cases...) Exactly: I am proposing this *because* it takes care of most of the use cases for reversed(), and reversed() doesn't need to be a builtin then. (If we can live with importing i[rev]range() from itertools, we can certainly live with importing the more powerful but less frequently needed reversed() from somewhere.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 10:53:08 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:53:14 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 10:04:38 +0100." <200311041004.38285.aleaxit@yahoo.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> <200311041004.38285.aleaxit@yahoo.com> Message-ID: <200311041553.hA4Fr8226136@12-236-54-216.client.attbi.com> > And what about irange with an optional reverse= argument? I did have > (and write about on c.l.py) a case where I currently code: > > if godown: > iseq = xrange(len(sq)-1, start-1, -1) > else: > iseq = xrange(start, len(sq), 1) > for index in iseq: > ... > > and would be just delighted to be able to code, instead, > > for index in irange(start, len(sq), reverse=godown): > ... > > Even when the need to reverse can more easily be hardwired in > the source (a more common case), would > > for index in irange(start, stop, reverse=True): > > be really so much worse than > > for index in revrange(start, stop): > > ...? Darn. At your recommendation I tried reading my inbox in reverse today (how appropriate... :-) and I missed this use case when I said I'd rather have two functions. Oh well. I do think that the savings in typing from having a reverse= keyword for that one use case are easily outnumbered by the extra typing for the much more common use case that has reverse=True. But really, I could live with either one, so Raymond can decide based upon the evidence, and as I said, either way having this in itertools is an argument against making reversed() a builtin. 
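Alex's proposed signature was never added to itertools, but its semantics are easy to sketch; irange here is a hypothetical function, not a real itertools member:

```python
def irange(start, stop, step=1, reverse=False):
    # Hypothetical sketch of the proposal: an iterator-returning range
    # with an optional reverse= argument.
    r = range(start, stop, step)
    return reversed(r) if reverse else iter(r)

sq = list('abcdefg')
start, godown = 2, True
# Replaces the xrange(len(sq)-1, start-1, -1) / xrange(start, len(sq), 1) pair:
assert list(irange(start, len(sq), reverse=godown)) == [6, 5, 4, 3, 2]
assert list(irange(start, len(sq), reverse=False)) == [2, 3, 4, 5, 6]
```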
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 10:55:32 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:55:52 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: Your message of "Tue, 04 Nov 2003 09:24:13 +0100." <200311040924.13894.aleaxit@yahoo.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> <200311040924.13894.aleaxit@yahoo.com> Message-ID: <200311041555.hA4FtWg26154@12-236-54-216.client.attbi.com> > Hmmm -- why is: > > class Foo: > __metaclass__ = MetaFoo > ... > > "ok", compared to e.g.: > > class Foo is MetaFoo: > ... > > while, again for example, > > def foo(): > ... > foo = staticmethod(foo) > > is presumably deemed "not ok" compared to e.g.: > > def foo() is staticmethod: > ... > > ??? > > Both cases of current syntax do the job (perhaps not elegantly but > they do) and in both cases a new syntax would increase elegance. Perhaps (I haven't really thought this through) because you can place the __metaclass__ thing right at the top of the class definition, while the staticmethod thing must necessarily come after the entire method definition. Also I expect that __metaclass__ usage is rather more rare than static or class methods are. And often one introduces a metaclass by inheriting from a base class whose sole (or main) purpose is to change the metaclass -- just like inheriting from object. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue Nov 4 11:30:11 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 4 11:30:23 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) Message-ID: <5.1.1.6.0.20031104112105.031b80e0@telecommunity.com> At 09:24 AM 11/4/03 +0100, Alex Martelli wrote: >On Tuesday 04 November 2003 02:09 am, Phillip J. Eby wrote: > ... 
> > I'm not all that enthused about the metaclass usage, mainly because there's
> > already an okay syntax (__metaclass__) for it. I'd rather that class
>
>Hmmm -- why is:
>
>class Foo:
>    __metaclass__ = MetaFoo
>    ...
>
>"ok", compared to e.g.:
>
>class Foo is MetaFoo:
>    ...
>
>while, again for example,
>
>    def foo():
>        ...
>    foo = staticmethod(foo)
>
>is presumably deemed "not ok" compared to e.g.:
>
>    def foo() is staticmethod:
>        ...
>
>???

Isn't it obvious from the above? Note the positioning of the '...' in all but the third example you've shown. :)

> > decorators (if added) were decorators in the same way as function
> > decorators. Why? Because I think that correct, combinable class
> > decorators are probably easier for most people to write than correct,
> > combinable metaclasses, and they are more easily combined than metaclasses
> > are.
>
>Combinable metaclasses may not be trivial to write, but with multiple
>inheritance it will often be feasible (except, presumably, when implied
>layout or __new__ have conflicting requirements).

I guess my point is that it's harder to *learn* how to write a co-operative metaclass, than it is to simply *write* a co-operative decorator. A metaclass must explicitly invoke its collaborators, but a decorator is just a simple function; the chaining is external. Now, certainly you and I both know how to write our metaclasses co-operatively, but I believe both of us have also been told (repeatedly) that we're not typical Python programmers. :)

> Of course, not having use
>cases of either custom metaclasses or class decorators in production use, the
>discussion does risk being a bit abstract. Did you have any specific use case
>in mind?

PyProtocols has an API call that "wants" to be a class decorator, to declare interface information about the class, e.g:

class MyClass is protocols.instancesProvide(IFoo):
    ....
But, since there are no such things as class decorators, it actually uses a sys._getframe() hack to replace the metaclass and simulate decoratorness. Steve Alexander originally proposed the idea as an implementation technique for interface declarations in Zope 3, and I worked up the actual implementation, that's now shared by PyProtocols and Zope 3. So the above is actually rendered now as: class MyClass: protocols.advise(instancesProvide=[IFoo]) (Note that any explicit __metaclass__ declaration has to come *before* the advise() call.) The principal limitation of this technique is that writing co-operative decorators of this sort is just as difficult as writing co-operative metaclasses. So, PyProtocols and Zope 3 include a library function, 'addClassAdvisor(decorator_callable)' which adds a decorator function (in a PEP 218-style execution order) to those that will be called on the resulting class. IOW, we created a decorator mechanism for classes that is almost identical to the PEP 218 mechanism for functions, to make it easy to call functions on a created class, using declarations that occur near the class statement. This was specifically to make it easier to do simple decorator-like things, without writing metaclasses, and thus not interfering with user-supplied metaclasses. Note, by the way, that since you can only have one explicit metaclass, and Python does not automatically generate new metaclasses, users must explicitly mix metaclasses in order to use them. That's all well and good for gurus such as ourselves, but if you're creating a framework that wants to play nicely with other frameworks, and is for non-guru users, then metaclasses are right out unless they're the *only* way to achieve the desired effect. For supplying framework metadata, decorators are an adequate mechanism that's simpler to implement, and are therefore preferable. 
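Python later grew exactly this feature as class decorator syntax (PEP 3129), which makes the metadata-annotation pattern Phillip describes straightforward. A sketch with hypothetical names -- instances_provide and IFoo stand in for the PyProtocols API and are not its real spelling:

```python
IFoo = 'IFoo'   # hypothetical interface marker

def instances_provide(*interfaces):
    """Return a class decorator that records interface metadata."""
    def decorate(cls):
        cls.__provides__ = interfaces   # annotate; no metaclass involved
        return cls
    return decorate

@instances_provide(IFoo)
class MyClass:
    pass

assert MyClass.__provides__ == (IFoo,)
assert type(MyClass) is type   # the (default) metaclass is untouched
```

Because the decorator never replaces the metaclass, several such decorators can be stacked without any of the metaclass-mixing problems described above.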
Since the 'addClassAdvisor()' mechanism has been available, I've used it for other framework metadata annotations, such as security restrictions, and to perform miscellaneous other "postprocessing" operations on classes. Now, in the time since Steve Alexander first proposed the idea, I've actually grown to like the in-body declaration style for classes, and it's possible that PEP 218-style declaration for classes would be more unwieldy. So I'm only +0 on having a class decorator syntax at all. But I do think that if there *is* a class decorator syntax, its semantics should exactly match function decorator syntax, and am therefore -1 on it being metaclass syntax. In my experience, non-guru usage of metaclasses is usually by inheriting the metaclass from a framework base class, and this is the "right way to do it" because the user shouldn't need to know about metaclasses unless they are mixing them. (And if Python mixed them for you, there'd be no need for non-gurus to know about metaclasses at all.) From pje at telecommunity.com Tue Nov 4 11:35:49 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 4 11:35:57 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <5.1.1.6.0.20031104112105.031b80e0@telecommunity.com> Message-ID: <5.1.1.6.0.20031104113520.031bb290@telecommunity.com> At 11:30 AM 11/4/03 -0500, Phillip J. Eby wrote: >'addClassAdvisor(decorator_callable)' which adds a decorator function (in >a PEP 218-style execution order) to those that will be called on the >resulting class. > >IOW, we created a decorator mechanism for classes that is almost identical >to the PEP 218 mechanism for functions, to make it easy to call functions >on a created class, using declarations that occur near the class >statement. This was specifically to make it easier to do simple >decorator-like things, without writing metaclasses, and thus not >interfering with user-supplied metaclasses. Oops. 
I meant PEP 318, obviously. From tim.one at comcast.net Tue Nov 4 13:58:51 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 4 13:58:56 2003 Subject: [Python-Dev] XXX undetected error (why=3) In-Reply-To: <1067926212.19568.47.camel@localhost.localdomain> Message-ID: [Jeremy Hylton] > ... > I'd like to change the code for the check to call Py_FatalError() > instead of printing a message to stderr. The check is only enabled > during a debug build. I'd be much happier debugging this from a core > dump than trying to figure out what happened to cause the message to > be printed. > > Any objections? +1. Having catastrophic errors fly by on stderr isn't a good idea even without the (strong) debuggability argument. From python at rcn.com Tue Nov 4 14:50:14 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 14:50:29 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311041553.hA4Fr8226136@12-236-54-216.client.attbi.com> Message-ID: <002801c3a30c$def8fae0$6017c797@oemcomputer> > I do think that the savings in typing from having a reverse= keyword > for that one use case are easily outnumbered by the extra typing for > the much more common use case that has reverse=True. > > But really, I could live with either one, so Raymond can decide based > upon the evidence, and as I said, either way having this in itertools > is an argument against making reversed() a builtin. Candidate itertools are expected to accept general iterables as inputs and to work well with each other. This function accepts only sequences as inputs and cannot handle outputs from other itertools. IOW, it doesn't belong in the toolset. As proposed, the reversed() function is much more general than a backwards xrange. Handling any sequence is a nice plus and should not be tossed away. I would like reversed() to be usable anywhere someone is tempted to write seq[::-1]. reversed() is a fundamental looping construct. 
Tucking it away in another module is not in harmony with having it readily accessible for everyday work. Having dotted access to the function makes its use less attractive. My original proposal was to have methods attached to a few sequence types. I was deluged with mail pushing toward a more universal builtin function and that's what is on the table now. There have been many notes of support, but their voices have been partially drowned by naming discussions and some weird ideas on places to put it. I do not support putting it in another namespace, turning it into a keyword argument, or making it into yet another version of xrange. What's out there now is simple and direct. Everyone, please accept it as is.

Raymond Hettinger

From guido at python.org Tue Nov 4 15:31:02 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 4 15:31:36 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: Your message of "Tue, 04 Nov 2003 14:50:14 EST." <002801c3a30c$def8fae0$6017c797@oemcomputer>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer>
Message-ID: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>

> > I do think that the savings in typing from having a reverse= keyword
> > for that one use case are easily outnumbered by the extra typing for
> > the much more common use case that has reverse=True.
> >
> > But really, I could live with either one, so Raymond can decide based
> > upon the evidence, and as I said, either way having this in itertools
> > is an argument against making reversed() a builtin.
>
> Candidate itertools are expected to accept general iterables as inputs
> and to work well with each other. This function accepts only sequences
> as inputs and cannot handle outputs from other itertools. IOW, it
> doesn't belong in the toolset.

Ah, you misunderstood.
I was only arguing for irange(..., reverse=True) or irevrange(...); since irange() is already in itertools, there can clearly be no objection to adding the reverse option somehow. But since (a) at least 60% of the examples are satisfied with something like irevrange(), and (b) having irevrange() in itertools is acceptable, my (c) conclusion is that reversed() doesn't need to be a builtin either. I didn't say it had to go into itertools!

> As proposed, the reversed() function is much more general than a
> backwards xrange. Handling any sequence is a nice plus and should not
> be tossed away. I would like reversed() to be usable anywhere someone
> is tempted to write seq[::-1].

Sure. But is this needed often enough to deserve adding a builtin? If you can prove it would be used as frequently as sum() you'd have a point.

> reversed() is a fundamental looping construct. Tucking it away in
> another module is not in harmony with having it readily accessible for
> everyday work. Having dotted access to the function makes its use less
> attractive.

The same can be said for several functions in itertools...

> My original proposal was to have methods attached to a few sequence
> types. I was deluged with mail pushing toward a more universal builtin
> function and that's what is on the table now. There have been many
> notes of support but their voices have been partially drowned by naming
> discussions and some weird ideas on places to put it.
>
> I do not support putting it in another namespace, turning it into a
> keyword argument, or making it into yet another version of xrange.
> What's out there now is simple and direct. Everyone, please accept it
> as is.

Sorry, I have to push back on that. We still need to contain the growth of the language, and that includes the set of builtins and (to a lesser extent) the standard library. You have to show that this is truly important enough to add to the builtins.
Maybe you can propose to take away an existing builtin to make room *first*.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack at performancedrivers.com Tue Nov 4 15:33:06 2003
From: jack at performancedrivers.com (Jack Diederich)
Date: Tue Nov 4 15:39:42 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: <1067873793.19568.27.camel@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Nov 03, 2003 at 10:36:33AM -0500
References: <200311031347.10995.aleaxit@yahoo.com> <1067873793.19568.27.camel@localhost.localdomain>
Message-ID: <20031104153306.E22751@localhost.localdomain>

On Mon, Nov 03, 2003 at 10:36:33AM -0500, Jeremy Hylton wrote:
> On Mon, 2003-11-03 at 07:47, Alex Martelli wrote:
> > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and
> > Michael Hudson commented that he thinks that so doing is a bad idea, that bug
> > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way
> > around. Is this indeed the policy (have I missed some guidelines about it)?
>
> It is customary to fix things on the trunk first, then backport to
> branches where it is needed. People who maintain branches often watch
> the trunk to look for things that need to be backported. As far as I
> know, no one watches the branches to look for things to port to the
> trunk. It may get lost if it's only on a branch.
>
> The best thing to do is your option [a]: Fix it in both places at once.
> Then there's nothing to be forgotten when time for a release rolls
> around.

If we aren't using CVS tagging features, it just falls under personal preference. If we are, it is easier to import all the changes from the branch to the trunk, tag it as 'import_to_trunk_N' and then next time something changes just look at the diff between the 'import_to_trunk_N' tag to now, mark as 'import_to_trunk_N+1', rinse and repeat.
Doing it w/ tags has the benefit that you can do a one-liner that says 'try to import any changes from the branch.'

-jackdied

From aleaxit at yahoo.com Tue Nov 4 15:50:32 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 15:50:38 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: <200311041402.hA4E20dR015943@localhost.localdomain>
References: <200311041402.hA4E20dR015943@localhost.localdomain>
Message-ID: <200311042150.32358.aleaxit@yahoo.com>

On Tuesday 04 November 2003 03:01 pm, Anthony Baxter wrote:
> I'm seeing a couple of warnings that I don't remember seeing at
> the time of the 2.3.2 release. Given what they are, it's possible
> that it's just a random thing (whether the id is < 0 or not).
>
> test_minidom
> /home/anthony/src/py/23maint/Lib/xml/dom/minidom.py:797: FutureWarning:
> %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and
> up return "<DOM Element: %s at %#x>" % (self.tagName, id(self))
>
> test_repr
> /home/anthony/src/py/23maint/Lib/test/test_repr.py:91: FutureWarning:
> %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and
> up eq(r(i3), ("<ClassWithFailingRepr instance at %x>"%id(i3)))
>
> Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll
> put it on the to-do list.

Not sure if it's "appropriate", but what other tests appear to be doing is to explicitly mark warnings (& specifically this one) as ignored:

regrtest.py:# I see no other way to suppress these warnings;
regrtest.py:warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,
regrtest.py: warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,
test_builtin.py:warnings.filterwarnings("ignore", "hex../oct.. of negative int",
test_builtin.py: FutureWarning, __name__)
test_compile.py: warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning)
test_compile.py: warnings.filterwarnings("ignore", "hex.* of negative int", FutureWarning)
test_hexoct.py:warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,

Alex

From pf_moore at yahoo.co.uk Tue Nov 4 16:00:40 2003
From: pf_moore at yahoo.co.uk (Paul Moore)
Date: Tue Nov 4 16:02:11 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: 

Guido van Rossum writes:
>> Candidate itertools are expected to accept general iterables as inputs
>> and to work well with each other. This function accepts only sequences
>> as inputs and cannot handle outputs from other itertools. IOW, it
>> doesn't belong in the toolset.
>
> Ah, you misunderstood. I was only arguing for irange(...,
> reverse=True) or irevrange(...); since irange() is already in
> itertools, there can clearly be no objection to adding the reverse
> option somehow.

Actually, irange() is not in itertools at the moment. Raymond could argue that irange() isn't a suitable candidate for itertools, but given the existence of count() and repeat(), I suspect that isn't a particularly convincing argument. Arguing that irange() is too similar to range() and xrange() is closer, but I'd say that irange is the *right* way to do it. [x]range should be relegated to backward-compatibility tools, much like the file xreadlines() method and the xreadlines module.

Raymond - are you dead set against an irange() function in itertools? Assume for now that it's a simple version without a reverse argument.

> But since (a) at least 60% of the examples are satisfied with
> something like irevrange(), and (b) having irevrange() in itertool
> is acceptable, my (c) conclusion is that reversed() doesn't need to
> be a builtin either.
I didn't say it had to go into itertools! Raymond seems very protective of the concept of reversed() as a builtin. I'm not saying that's wrong, but I *personally* haven't seen enough evidence yet to be convinced either way. The i{rev}range() issues seem to be getting caught up in this. My view: 1. I think a "plain" irange() would be useful to add into itertools. In the (very) long term, it could replace [x]range, but that's less of an issue to me. 2. A way of getting a reversed {i,x}range() has some clear use cases. This seems useful to add (although here, I'm going on evidence of others' code - in my code I tend to loop over containers much more often than over ranges of numbers). 3. A general reversed() function seems theoretically useful, but the concrete use cases seem fairly thin on the ground. I'm broadly in favour, because I (possibly like Raymond) have a bias for clean, general solutions. But I can see that "practicality beats purity" may hold here. My proposals: 1. Add a plain irange() to itertools. 2. IF the general reversed() is deemed too theoretical, add EITHER a reverse argument to irange, or an irevrange to itertools. Both feel to me a little iffy, but that's my generality bias again. 3. IF the general reversed() is accepted (builtin or not) leave the irange function in its simple form. > Sorry, I have to push back on that. We still need to contain the > growth of the language, and that includes the set of builtins and (to > a lesser extent) the standard library. You have to show that this is > truly important enough to add to the builtins. Maybe you can propose > to take away an existing builtin to make room *first*. xrange (in favour of itertools.irange())? :-) [Personally, I'm still not 100% sure I see Raymond's strong reluctance to have reversed() in itertools, but as both are his babies, and he clearly has a very definite vision for both, I don't feel that I want to argue this one with him]. Paul. 
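Paul's plain irange() proposal is easy to picture; nothing of the sort ever landed in itertools, so the following is purely a hypothetical sketch of the generator he is describing, mirroring the range()/xrange() signature:

```python
def irange(start, stop=None, step=1):
    # Hypothetical itertools-style irange(): yields the same values
    # range()/xrange() would produce, but lazily, one at a time.
    if stop is None:
        start, stop = 0, start
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        yield i
        i += step

print(list(irange(5)))         # [0, 1, 2, 3, 4]
print(list(irange(5, 0, -1)))  # [5, 4, 3, 2, 1]
```

Being an iterator rather than a sequence, such an irange() is one-shot -- exactly the sequence-vs-iterator distinction that runs through the rest of this thread.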
-- This signature intentionally left blank

From neal at metaslash.com Tue Nov 4 16:11:19 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Tue Nov 4 16:11:27 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <20031104211119.GS7212@epoch.metaslash.com>

On Tue, Nov 04, 2003 at 12:31:02PM -0800, Guido van Rossum wrote:
>
> We still need to contain the growth of the language, and that
> includes the set of builtins and (to a lesser extent) the standard
> library. Maybe you can propose to take away an existing builtin to
> make room *first*.

Oh boy! You opened a can of worms. :-) I won't suggest adding any builtins (including reverse), but I will suggest (re)moving quite a few. This is a suggestion towards the future. I realize nothing should be removed in 2.4. Currently, we have

    >>> len(filter(lambda x: x[0].islower(), dir(__builtins__)))
    66

Below are the builtins I'd like to see removed. I've given a short cryptic comment for many. Several can be removed because they become redundant (e.g., long, open, raw_input, xrange).

    apply, buffer (or replace implementation with something useful),
    coerce, intern, long (int/long unification),
    open (same as file) (maybe pending deprecation in 2.4?),
    raw_input (become input),
    reduce (assuming another mechanism will exist) (deprecate 2.4?),
    reload (some other mechanism related to import?),
    slice (or maybe move to a module),
    xrange (unify with range)

For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. Pending deprecation for: open, reduce, and maybe slice. I don't know how to deal with input/raw_input. While it seems goofy, perhaps something like this:

    2.4 deprecate input
    2.5 make input == raw_input, pending deprecation for raw_input
    2.6 deprecate raw_input
    2.7 remove raw_input

Or just wait for 3.0.
:-)

Math related builtins: abs, complex, divmod, pow, round, sum
Perhaps some of these could be moved to a module

Move these to sys?: hash, id

Formatting utilities (move some/all to some module): chr, hex, oct, ord, repr, unichr
Not sure about these; chr, oct and unichr seem to be the least used in my code.

For any builtin that's moved, make a pending deprecation when used as a builtin for 2.4 and full deprecation for 2.5. For anything that is likely to be (re)moved in 3.0, perhaps we should at least use pending deprecations now. Even if we don't know what will happen, at least people start getting an idea of the direction for the future.

Doing-my-best-to-shrink-the-language, :-)

Neal

From aleaxit at yahoo.com Tue Nov 4 16:47:45 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 16:47:52 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: 
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042247.45386.aleaxit@yahoo.com>

On Tuesday 04 November 2003 10:00 pm, Paul Moore wrote:
...
> Arguing that irange() is too similar to range() and xrange() is
> closer, but I'd say that irange is the *right* way to do it. [x]range

Agreed. The reverse= optional argument would be delightful gravy for me, but I could do without and not suffer too badly IF reversed was available: where I now have

    if godown:
        iseq = xrange(len(sq)-1, start-1, -1)
    else:
        iseq = xrange(start, len(sq), 1)
    for index in iseq:
        ...

and dream of compacting it all the way to:

    for index in irange(start, len(sq), reverse=godown):
        ...

I could do something like:

    iseq = irange(start, len(sq))
    if godown:
        iseq = reversed(iseq)
    for index in iseq:
        ...

And after all I have found only that one use case where I need to loop either forward or backward in my Python code.
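The compacted spelling Alex dreams of can be sketched directly. The reverse= argument is hypothetical (no such option ever existed on any range variant), and range() stands in here for the Python-2-only xrange():

```python
def irange(start, stop, reverse=False):
    # Hypothetical helper collapsing the if/else pair above: iterate
    # the indices [start, stop) forward, or the same indices backward
    # when reverse is true.
    if reverse:
        return iter(range(stop - 1, start - 1, -1))
    return iter(range(start, stop))

sq = ['a', 'b', 'c', 'd']
print([sq[i] for i in irange(1, len(sq))])                # ['b', 'c', 'd']
print([sq[i] for i in irange(1, len(sq), reverse=True)])  # ['d', 'c', 'b']
```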
I expected more because I remember how horribly constraining Pascal's strong separation between iteration forwards and backwards (for i := a to b do ... vs for i := b downto a do ...) felt compared to being able to just code "DO 10 I = IST, ITO, IDELTA" in Fortran -- lo that many years ago. I guess I write different kinds of programs these days.

> should be relegated to backward-compatibility tools, much like the
> file xreadlines() method and the xreadlines module.

Seconded.

> Raymond - are you dead set against an irange() function in itertools?
> Assume for now that it's a simple version without a reverse argument.

...and that we ALSO get your cherished reversed built-in -- there is most emphatically no mutual incompatibility between them...

> Raymond seems very protective of the concept of reversed() as a
> builtin. I'm not saying that's wrong, but I *personally* haven't seen
> enough evidence yet to be convinced either way. The i{rev}range()

I'm slowly coming to accept it -- it's sure way more appropriate as a built-in than many that currently crowd the builtins namespace.

> issues seem to be getting caught up in this.
>
> My view:
>
> 1. I think a "plain" irange() would be useful to add into itertools.

Yes!

> In the (very) long term, it could replace [x]range, but that's less
> of an issue to me.

It's probably more important to me, I guess.

> 2. A way of getting a reversed {i,x}range() has some clear use cases.
> This seems useful to add (although here, I'm going on evidence of
> others' code - in my code I tend to loop over containers much more
> often than over ranges of numbers).

Me too -- by now I've replaced basically all the old

    for i in xrange(len(seq)):
        value = seq[i]
        ...

into shiny new

    for i, value in enumerate(seq):
        ...

Admittedly some reverse iterations are like that -- and being able to code

    for i in revrange(len(seq)):
        value = seq[i]
        ...

while better than (eek) "for i in xrange(len(seq)-1, -1, -1):", still is NOT quite as smooth as

    for value in reversed(seq):
        ...

or reversed(enumerate(seq)) if the index IS needed.

BTW, I do have spots where:

    seq.reverse()
    try:
        # ...region of code where seq is used reversed...
    finally:
        seq.reverse()   # put it back rightside-up again

(including one where I had forgotten the try/finally -- as I found out while looking for these cases...). Of course this only works because seq is a list and it might have all sorts of downsides (e.g. if this was multithreaded code, which it isn't, it might interfere with other uses of seq; if seq was a global this function couldn't be recursive any more; ...). All in all I think those would benefit from reversed(seq) even if it has to be called in more spots within the "region of code".

> 3. A general reversed() function seems theoretically useful, but the
> concrete use cases seem fairly thin on the ground. I'm broadly in
> favour, because I (possibly like Raymond) have a bias for clean,
> general solutions. But I can see that "practicality beats purity"
> may hold here.

Funny, I originally felt queasy (about it being a built-in only) for "purity" about the overcrowded builtins namespace. I'm seeing enough use cases (even if irange DID grow a wonderful reverse= optional arg...) that practicality is gradually winning me over. I.e., practicality beats purity is what is winning me over, while to you it suggests dampening your "broadly in favour"... we both mention iterating over sequences more than over indices, but to me that's a suggestion that reversed has a place, while you don't seem to think that follows...

> My proposals:
>
> 1. Add a plain irange() to itertools.

Yes!!!

> 2. IF the general reversed() is deemed too theoretical, add EITHER a
> reverse argument to irange, or an irevrange to itertools. Both feel
> to me a little iffy, but that's my generality bias again.
> 3.
IF the general reversed() is accepted (builtin or not) leave the > irange function in its simple form. Sigh, OK, I guess. If I had to choose, reversed + irange plain only, or no reversed + irange w/optional argument, I guess I would grudgingly choose the former (having shifted my opinion). But I'd really like BOTH reversed (lets me iterate on sequence rather than on indices, often) AND irange with optional reversed= ... no irevrange please... > > Sorry, I have to push back on that. We still need to contain the > > growth of the language, and that includes the set of builtins and (to > > a lesser extent) the standard library. You have to show that this is > > truly important enough to add to the builtins. Maybe you can propose > > to take away an existing builtin to make room *first*. > > xrange (in favour of itertools.irange())? :-) Seconded. Neal Norwitz' "little list" has plenty more useful suggestions, though I wouldn't accept it as entirely sound. > [Personally, I'm still not 100% sure I see Raymond's strong reluctance > to have reversed() in itertools, but as both are his babies, and he Actually, I do: itertools shouldn't be limited to accepting sequences, they should accept iterator arguments. > clearly has a very definite vision for both, I don't feel that I want > to argue this one with him]. You have a point -- Raymond definitely HAS an overall vision on iterators &c and he's deserved lots of listening-to even though we can't quite see some specific point. Alex From martin at v.loewis.de Tue Nov 4 16:51:49 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Tue Nov 4 16:52:20 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: <200311041402.hA4E20dR015943@localhost.localdomain> References: <200311041402.hA4E20dR015943@localhost.localdomain> Message-ID: Anthony Baxter writes: > I'm seeing a couple of warnings that I don't remember seeing at > the time of the 2.3.2 release. 
Given what they are, it's possible
> that it's just a random thing (whether the id is < 0 or not).

What system is this on? I find it surprising that the id is < 0: on a 32-bit machine, this should only happen if you allocate more than 2GB.

> Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll
> put it on the to-do list.

I'd reformulate them as

    "%x" % (id(o) & 0xffffffffL)

Of course, you have to replace 0xffffffffL with (unsigned)-1 of the system (i.e. 2l*sys.maxint+1). I wonder whether creating a function sys.unsigned(id(o)) would be appropriate, which returns its argument for positive numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise.

Regards, Martin

From martin at v.loewis.de Tue Nov 4 16:57:10 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Tue Nov 4 16:57:24 2003
Subject: [Python-Dev] Autoconf 2.58 released
In-Reply-To: <2msml4188m.fsf@starship.python.net>
References: <2msml4188m.fsf@starship.python.net>
Message-ID: 

Michael Hudson writes:
> We want to be using this asap to get rid of the aclocal hacks, right?

Sounds good to me. If you volunteer, please feel free to update AC_PREREQ when you consider it appropriate. We need to consider whether to bump the autoconf version used on the 2.3 branch, or whether developers would be required to use autoconf 2.5x releases.

Regards, Martin

From tdelaney at avaya.com Tue Nov 4 16:57:43 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Tue Nov 4 16:57:50 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>

> From: Alex Martelli [mailto:aleaxit@yahoo.com]
>
> or reversed(enumerate(seq)) if the index IS needed.

Hmm - wouldn't this give an iterator that returned two values - an iterable for the seq, and an iterable for the indexes of seq?
I would think this would need to be: reversed(*enumerate(seq)) with the presumption being that reversed would reverse each parameter and return them in lockstep. Tim Delaney From jeremy at zope.com Tue Nov 4 16:59:19 2003 From: jeremy at zope.com (Jeremy Hylton) Date: Tue Nov 4 17:03:00 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <20031104153306.E22751@localhost.localdomain> References: <200311031347.10995.aleaxit@yahoo.com> <1067873793.19568.27.camel@localhost.localdomain> <20031104153306.E22751@localhost.localdomain> Message-ID: <1067983159.19568.64.camel@localhost.localdomain> On Tue, 2003-11-04 at 15:33, Jack Diederich wrote: > On Mon, Nov 03, 2003 at 10:36:33AM -0500, Jeremy Hylton wrote: > > On Mon, 2003-11-03 at 07:47, Alex Martelli wrote: > > > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and > > > Michael Hudson commented that he thinks that so doing is a bad idea, that bug > > > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way > > > around. Is this indeed the policy (have I missed some guidelines about it)? > > > > It is customary to fix things on the trunk first, then backport to > > branches where it is needed. People who maintain branches often watch > > the trunk to look for things that need to be backported. As far as I > > know, no one watches the branches to look for things to port to the > > trunk. It may get lost if it's only on a branch. > > > > The best thing to do is your option [a]: Fix it in both places at once. > > Then there's nothing to be forgotten when time for a release rolls > > around. > > > > If we aren't using CVS tagging features, it just falls under personal > preference. I think there's more than personal preference involved. We ought to be consistent in how we apply patches to avoid missing things. 
> If we are, it is easier to import all the changes from
> the branch to the trunk, tag it as 'import_to_trunk_N' and then
> next time something changes just look at the diff between the
> 'import_to_trunk_N' tag to now, mark as 'import_to_trunk_N+1', rinse
> and repeat. Doing it w/ tags has the benefit that you can do
> a one-liner that says 'try to import any changes from the branch.'

The branch has bug fixes and changes that don't necessarily show up on the trunk. For example, a bug that exists in code that was removed or completely rewritten on the trunk. It also doesn't address the stability issue: A maintenance branch gets less testing, and committers should be cautious about changes. Committing on the trunk first gives you a chance to test out the changes there and get feedback.

Jeremy

From aleaxit at yahoo.com Tue Nov 4 17:04:00 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 17:04:05 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042304.00006.aleaxit@yahoo.com>

On Tuesday 04 November 2003 09:31 pm, Guido van Rossum wrote:
...
> option somehow. But since (a) at least 60% of the examples are
> satisfied with something like irevrange(), and (b) having irevrange()

I'm not sure it's as high as that, depending on how strictly one wants to define "satisfied". Say I want to apply some change (say, call f() on each item) to the prefix of a list of numbers, stopping at the first zero I meet. In old Python:

    for i in xrange(len(listofnum)):
        value = listofnum[i]
        if not value: break
        listofnum[i] = f(value)

but today I'd rather code this:

    for i, value in enumerate(listofnum):
        if not value: break
        listofnum[i] = f(value)

more concise and neat.
So, what if I want to do it to the _suffix_, the tail, of the list, stopping at the first zero I meet going backwards? W/o irevrange, eek:

    for i in xrange(-1, -len(listofnum)-1, -1):
        # usual 3-line body

or equivalently

    for i in xrange(len(listofnum)-1, -1, -1):
        # usual 3-line body

but irevrange would only fix the for clause itself:

    for i in irevrange(len(listofnum)):
        # usual 3-line body

the body remains stuck at the old-python 3-liner. reversed does better:

    for i, value in reversed(enumerate(listofnum)):
        if not value: break
        listofnum[i] = f(value)

i.e. it lets me use the "modern" Python idiom. If you consider this case "satisfied" by irevrange then maybe 60% is roughly right. But it seems to me that only reversed "satisfies" it fully.

> > be tossed away. I would like reversed() to be usable anywhere someone
> > is tempted to write seq[::-1].
>
> Sure. But is this needed often enough to deserve adding a builtin?

I used to think it didn't, but the more I looked at code with this in mind, the more I'm convincing myself otherwise.

> If you can prove it would be used as frequently as sum() you'd have a
> point.

No, not as frequently as sum, but then this applies to many other builtins.

> > reversed() is a fundamental looping construct. Tucking it away in
> another module is not in harmony with having it readily accessible for
> everyday work. Having dotted access to the function makes its use less
> attractive.
> The same can be said for several functions in itertools...

True, but adding ONE builtin is not like adding half a dozen.

> > What's out there now is simple and direct. Everyone, please accept it
> as is.
>
> Sorry, I have to push back on that. We still need to contain the
> growth of the language, and that includes the set of builtins and (to
> a lesser extent) the standard library. You have to show that this is
> truly important enough to add to the builtins. Maybe you can propose
> to take away an existing builtin to make room *first*.
I don't know if Raymond has responded to this specific request, but I've seen other responses and I entirely concur that LOTS of existing built-ins -- such as apply, coerce, filter, input, intern, oct, round -- could usefully be deprecated/removed/moved elsewhere (e.g. to a new "legacy.py" module of short one-liners for apply, filter, ... -- to math for round, oct ... -- legacy could also 'from math import' the latter names, so that "from legacy import *" would make old modules keep working...).

Alex

From aleaxit at yahoo.com Tue Nov 4 17:09:24 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 17:09:29 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>
Message-ID: <200311042309.24836.aleaxit@yahoo.com>

On Tuesday 04 November 2003 10:57 pm, Delaney, Timothy C (Timothy) wrote:
> > From: Alex Martelli [mailto:aleaxit@yahoo.com]
> >
> > or reversed(enumerate(seq)) if the index IS needed.
>
> Hmm - wouldn't this give an iterator that returned two values - an iterable
> for the seq, and an iterable for the indexes of seq?

I must be missing something. enumerate(x) is an iterator with len(x) values, each a pair; why would reversing it somehow "transpose" it...?

> I would think this would need to be:
>
> reversed(*enumerate(seq))
>
> with the presumption being that reversed would reverse each parameter and
> return them in lockstep.

I'm not sure if reversed should take several parameters, but if it did this would be like calling:

    reversed( (0, x[0]), (1,x[1]), (2,x[2]) )

If it "reversed each parameter and returned them in lockstep" then I'd have x first and (0,1,2) second, no?
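Alex's reading is the right one: enumerate(seq) is a single iterator of (index, value) pairs, not a pair of iterables. The remaining catch -- easy to check in any Python that has reversed(), i.e. 2.4 and later -- is that reversed() refuses a bare iterator, so the pairs must be buffered into a real sequence first:

```python
seq = ['a', 'b', 'c']

# enumerate() returns a one-shot iterator: it has no __len__ or
# __getitem__, so reversed() cannot walk it backwards.
try:
    reversed(enumerate(seq))
    reversible = True
except TypeError:
    reversible = False
print(reversible)  # False

# Buffering the pairs into a list first works fine:
pairs = list(enumerate(seq))
print(list(reversed(pairs)))  # [(2, 'c'), (1, 'b'), (0, 'a')]
```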
Alex

From guido at python.org Tue Nov 4 18:27:26 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 4 18:27:38 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Tue, 04 Nov 2003 21:00:40 GMT."
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042327.hA4NRQD27067@12-236-54-216.client.attbi.com>

> Arguing that irange() is too similar to range() and xrange() is
> closer, but I'd say that irange is the *right* way to do it. [x]range
> should be relegated to backward-compatibility tools, much like the
> file xreadlines() method and the xreadlines module.

Hm. There's a usage pattern that seems easy with [x]range() but not so easy with irange():

    R = xrange(...)
    for x in R: ...
    for y in R: ...

IMO, being able to say "for x in R" rather than having to remember the arguments to irange() and having to say "for x in irange(a, b, c)" is a big and useful advantage. IOW [x]range() returns a *sequence* which is more powerful than an iterator, because it can be iterated more than once. Now, the same could be accomplished with copyable iterators, but it is still more work:

    I = irange(...)
    R1, R2 = tee(I)
    for x in R1: ...
    for x in R2: ...

> Raymond - are you dead set against an irange() function in itertools?
> Assume for now that it's a simple version without a reverse argument.
>
> > But since (a) at least 60% of the examples are satisfied with
> > something like irevrange(), and (b) having irevrange() in itertool
> > is acceptable, my (c) conclusion is that reversed() doesn't need to
> > be a builtin either. I didn't say it had to go into itertools!
>
> Raymond seems very protective of the concept of reversed() as a
> builtin. I'm not saying that's wrong, but I *personally* haven't seen
> enough evidence yet to be convinced either way. The i{rev}range()
> issues seem to be getting caught up in this.

Right.

> My view:
>
> 1.
I think a "plain" irange() would be useful to add into itertools. > In the (very) long term, it could replace [x]range, but that's less > of an issue to me. > 2. A way of getting a reversed {i,x}range() has some clear use cases. > This seems useful to add (although here, I'm going on evidence of > others' code - in my code I tend to loop over containers much more > often than over ranges of numbers). > 3. A general reversed() function seems theoretically useful, but the > concrete use cases seem fairly thin on the ground. I'm broadly in > favour, because I (possibly like Raymond) have a bias for clean, > general solutions. But I can see that "practicality beats purity" > may hold here. > > My proposals: > > 1. Add a plain irange() to itertools. > 2. IF the general reversed() is deemed too theoretical, add EITHER a > reverse argument to irange, or an irevrange to itertools. Both feel > to me a little iffy, but that's my generality bias again. > 3. IF the general reversed() is accepted (builtin or not) leave the > irange function in its simple form. Hm. reversed(irange(...)) can't work, so you'd have to have both. > > Sorry, I have to push back on that. We still need to contain the > > growth of the language, and that includes the set of builtins and (to > > a lesser extent) the standard library. You have to show that this is > > truly important enough to add to the builtins. Maybe you can propose > > to take away an existing builtin to make room *first*. > > xrange (in favour of itertools.irange())? :-) > > [Personally, I'm still not 100% sure I see Raymond's strong reluctance > to have reversed() in itertools, but as both are his babies, and he > clearly has a very definite vision for both, I don't feel that I want > to argue this one with him]. That part I understand. reversed() is a function of a sequence (something with __len__ and __getitem__ methods), not of an iterator, and as such it doesn't belong in itertools. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 18:29:06 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 18:29:35 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 22:47:45 +0100." <200311042247.45386.aleaxit@yahoo.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com> <200311042247.45386.aleaxit@yahoo.com> Message-ID: <200311042329.hA4NT7727092@12-236-54-216.client.attbi.com> > iseq = irange(start, len(sq)) > if godown: iseq = reversed(iseq) But this wouldn't work, would it? irange() is an iterator, but reversed() only works for sequences (it refuses to secretly buffer the whole thing). --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Tue Nov 4 18:37:22 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 18:37:30 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311042329.hA4NT7727092@12-236-54-216.client.attbi.com> Message-ID: <001701c3a32c$98d9b980$0aba2c81@oemcomputer> > > iseq = irange(start, len(sq)) > > if godown: iseq = reversed(iseq) > > But this wouldn't work, would it? irange() is an iterator, but > reversed() only works for sequences (it refuses to secretly buffer the > whole thing). It works fine with xrange though. Raymond Hettinger From guido at python.org Tue Nov 4 18:44:27 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 18:44:34 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 23:04:00 +0100." <200311042304.00006.aleaxit@yahoo.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com> <200311042304.00006.aleaxit@yahoo.com> Message-ID: <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> > > option somehow. 
But since (a) at least 60% of the examples are > > satisfied with something like irevrange(), and (b) having irevrange() > > I'm not sure it's as high as that, depending on how strictly one wants > to define "satisfied". There are 6 bullets in PEP 322's "real world use cases" section. The first one is not helped by reversed(). Of the remaining 5, three are simple numeric ranges (heapq.heapify(), platform.dist_try_harder() and random.shuffle()). That's exactly 60%. :-) > for i, value in reversed(enumerate(listofnum)): Sorry, this doesn't work. enumerate() returns an iterator, reversed() requires a sequence. > > If you can prove it would be used as frequently as sum() you'd have a > > point. > > No, not as frequently as sum, but then this applies to many other > builtins. Well, they are already there, and we're considering removing some. I'd like to set the bar for *new* builtins fairly high. (You all know the joke how Aspirin would never have been approved by the FDA as an over-the-counter drug if it was invented today.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Tue Nov 4 19:57:33 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Tue Nov 4 18:59:42 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> Message-ID: <200311041957.33530.fincher.8@osu.edu> On Tuesday 04 November 2003 06:44 pm, Guido van Rossum wrote: > > for i, value in reversed(enumerate(listofnum)): > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > requires a sequence. I believe the assumption is that enumerate (as well as the proposed irange) would grow an __reversed__ method to handle just that usage.
Jeremy From guido at python.org Tue Nov 4 19:07:40 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 19:07:51 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 19:57:33 EST." <200311041957.33530.fincher.8@osu.edu> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> Message-ID: <200311050007.hA507eo27246@12-236-54-216.client.attbi.com> > > > for i, value in reversed(enumerate(listofnum)): > > > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > > requires a sequence. > > I believe the assumption is that enumerate (as well as the proposed irange) > would grow an __reversed__ method to handle just that usage. Ah, so it is. Then the PEP's abstract is wrong: """ This proposal is to add a builtin function to support reverse iteration over sequences. """ Also, the PEP should enumerate (:-) which built-in types should be modified in this way, to give an impression of the enormity (or not) of the task. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis at bluewin.ch Tue Nov 4 19:47:33 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Tue Nov 4 19:44:57 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311050007.hA507eo27246@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> Message-ID: <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> At 16:07 04.11.2003 -0800, Guido van Rossum wrote: > > > > for i, value in reversed(enumerate(listofnum)): > > > > > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > > > requires a sequence. 
> > > > I believe the assumption is that enumerate (as well as the proposed > irange) > > would grow an __reversed__ method to handle just that usage. > >Ah, so it is. Then the PEP's abstract is wrong: > >""" >This proposal is to add a builtin function to support reverse >iteration over sequences. >""" > >Also, the PEP should enumerate (:-) which built-in types should be >modified in this way, to give an impression of the enormity (or not) >of the task. what is not clear to me is that the PEP is explicit about reversed() refusing general iterables and in particular infinite iterators, but then the combination reversed enumerate.__reversed__ would accept them or not? Will enumerate implement __reversed__ in terms of keeping the enumerate argument around instead of just an iterator derived from it and reproducing then the reversed behavior: limits checks and implementation strategy on the original argument if/when __reversed__ is called? so for x in reversed(enumerate(itertools.count())): pass would throw an exception instead of not terminating, OTHERWISE with the strategy of consuming the iterator if x is a finite iterator but without __len__ then reversed(x) would not work but reversed(enumerate(x)) would. Further enumerate.__iter__ does not enable re-iteration, simply it does not return a fresh iterator but what about enumerate.__reversed__ ? regards. From python at rcn.com Tue Nov 4 19:49:06 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 19:49:27 2003 Subject: FW: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <000e01c3a336$9dc068e0$d0ac2c81@oemcomputer> > > I believe the assumption is that enumerate (as well as the proposed > irange) > > would grow an __reversed__ method to handle just that usage. Unfortunately, that idea didn't work out. The enumerate object does not hold the original iterable; instead, it only has the result of iter(iterable). Without having the iterable, I don't see a way for it to call iterable.__reversed__.
The essential problem is that at creation time, the enumerate object does not know that it is going to be called by reversed(). No other sequence object has to have a __reversed__ method. Like its cousin, __iter__, some objects may get a performance boost from a custom iterator but none of them have to have it. Raymond Hettinger From tdelaney at avaya.com Tue Nov 4 20:08:59 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Tue Nov 4 20:09:06 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5E75@au3010avexu1.global.avaya.com> > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > I must be missing something. enumerate(x) is an iterator with len(x) > values, each a pair; why would reversing it somehow "transpose" it...? No - you're not. Brain fart on my part :( Tim Delaney From guido at python.org Tue Nov 4 20:12:39 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 20:12:48 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 01:47:33 +0100." <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> Message-ID: <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> [Guido] > >Ah, so it is. Then the PEP's abstract is wrong: > > > >""" > >This proposal is to add a builtin function to support reverse > >iteration over sequences. > >""" > > > >Also, the PEP should enumerate (:-) which built-in types should be > >modified in this way, to give an impression of the enormity (or not) > >of the task.
[Samuele] > what is not clear to me is that the PEP is explicit about reversed() > refusing general iterables and in particular infinite iterators, but then > the combination reversed enumerate.__reversed__ would accept them or not? > Will enumerate implement __reversed__ in terms of keeping the enumerate > argument around instead of just an iterator derived from it and reproducing > then the reversed behavior: limits checks and implementation strategy on > the original argument if/when __reversed__ is called? > > so > > for x in reversed(enumerate(itertools.count())): > pass > > would throw an exception instead of not terminating, OTHERWISE with > the strategy of consuming the iterator if x is a finite iterator but > without __len__ then > > reversed(x) would not work but > > reversed(enumerate(x)) would. > > Further enumerate.__iter__ does not enable re-iteration, simply it > does not return a fresh iterator but what about > enumerate.__reversed__ ? In private mail Raymond withdrew the suggestion that enumerate() implement __reversed__; I think Raymond won't mind if I quote him here: [Raymond] > Unfortunately, that idea didn't work out. The enumerate object does not > hold the original iterable; instead, it only has the result of > iter(iterable). Without having the iterable, I don't see a way for it > to call iterable.__reversed__. The essential problem is that at creation > time, the enumerate object does not know that it is going to be called by > reversed(). > > No other sequence object has to have a __reversed__ method. Like its > cousin, __iter__, some objects may get a performance boost from a custom > iterator but none of them have to have it. So we're back to square one: reversed(enumerate(X)) won't work, even if reversed(X) works. I'm not sure I even like the idea of reversed() looking for a __reversed__ method at all. I like the original intention best: reversed() is for reverse iteration over *sequences*.
(See the first paragraph of the section "Rejected Alternatives" in the PEP.) Anyway, as Raymond predicted, the discussion is being distracted by side issues. I personally like the idea of having a variant of xrange() that generates a numerical sequence backwards better. Or perhaps we should just get used to recognizing that [x]range(n-1, -1, -1) iterates over range(n) backwards... --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at electricrain.com Tue Nov 4 20:28:51 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Tue Nov 4 20:28:56 2003 Subject: [Python-Dev] simple bsddb interface thread support 2.3.x vs 2.4 In-Reply-To: <1067949852.26825.3.camel@anthem> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> <200311040912.23213.aleaxit@yahoo.com> <1067949852.26825.3.camel@anthem> Message-ID: <20031105012851.GE17328@zot.electricrain.com> On Tue, Nov 04, 2003 at 07:44:12AM -0500, Barry Warsaw wrote: > On Tue, 2003-11-04 at 03:12, Alex Martelli wrote: > > > Generally, extending functionality (as opposed to: fixing bugs or clarifying > > docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb > > isn't thread-safe in 2.3 counts as "a bug", or rather as functionality > > deliberately kept limited, to avoid e.g such bugs as the one you've just > > removed, and other possibilities you mention: > > > > > - multithreaded bsddb use could deadlock depending on how it is used. > > > > I think that just having the 2.3.* docs explicitly mention the lack of > > thread-safety might then perhaps be better than backporting the changes. > > It's just the DB-API that's not thread-safe. The full blown BerkeleyDB > API (a.k.a. bsddb3) should be fine. > > It sure is tempting to claim that the lack of DB-API thread-safety for > BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg > should make the final determination.
If it isn't, then yes, the docs need to clearly state that's the case. This was brought up before 2.3.2 was released. The docs already state this in a nice and obvious warning: http://www.python.org/doc/2.3.2/lib/module-bsddb.html My vote is to leave bsddb in 2.3.2 as it is and not try to port the thread support over from 2.4cvs. It is not ready. The bsddb module has never supported multithreaded use in any past version of python. If the simple bsddb/__init__.py interface can support it for 2.4 that's great. It should always be recommended that people use the full bsddb.db when threads are involved. If simple bsddb still has non-trivial-to-describe multithreaded deadlock issues by the time a 2.4 release draws near I'll suggest pulling it out. (before then i need to write a test case to prove that it does actually have these problems) -g From greg at electricrain.com Wed Nov 5 00:51:05 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Wed Nov 5 00:51:09 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/bsddb __init__.py, 1.11, 1.12 In-Reply-To: <16294.30567.537151.106168@montanaro.dyndns.org> References: <16294.30567.537151.106168@montanaro.dyndns.org> Message-ID: <20031105055105.GG17328@zot.electricrain.com> On Mon, Nov 03, 2003 at 09:42:31AM -0600, Skip Montanaro wrote: > > greg> import UserDict > greg> class _iter_mixin(UserDict.DictMixin): > greg> def __iter__(self): > greg> try: > ... > > Should _iter_mixin inherit from dict, or is there a backward compatibility > issue? Simply changing UserDict.DictMixin to dict doesn't work. In order to act like a dictionary it depends on DictMixin's multi-level implementation of all of the dict methods using the lower-level primitives.
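Greg's point, that the mixin derives the entire dict API once a handful of primitives exist, can be illustrated with collections.abc.MutableMapping, the modern descendant of 2.3's UserDict.DictMixin (the class below is a sketch for illustration, not code from the thread):

```python
from collections.abc import MutableMapping

class TinyStore(MutableMapping):
    """Only five primitives are written out; the mixin derives
    get(), items(), update(), __contains__, pop() and friends."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Subclassing plain dict instead would bypass exactly this derivation, which is why the substitution Skip asks about doesn't work.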
From aleaxit at yahoo.com Wed Nov 5 02:41:46 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 02:41:55 2003 Subject: [Python-Dev] simple bsddb interface thread support 2.3.x vs 2.4 In-Reply-To: <20031105012851.GE17328@zot.electricrain.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <1067949852.26825.3.camel@anthem> <20031105012851.GE17328@zot.electricrain.com> Message-ID: <200311050841.46399.aleaxit@yahoo.com> On Wednesday 05 November 2003 02:28 am, Gregory P. Smith wrote: ... > This was brought up before 2.3.2 was released. The docs already state > this in a nice and obvious warning: > > http://www.python.org/doc/2.3.2/lib/module-bsddb.html You are entirely right: indeed, it's documented with *exemplary* clarity. > My vote is to leave bsddb in 2.3.2 as it is and not try to port the > thread support over from 2.4cvs. It is not ready. Absolutely. The fully-documented limitation of 2.3.*'s bsddb interface wrt multi-threading should be left alone even if we felt somewhat certain about a new implementation: enhancing functionality at the risk of introducing bugs is _not_ what the maintenance branch is about. Knowing that the new implementation isn't fully mature just reinforces this. > The bsddb module has never supported multithreaded use in any past version > of python. If the simple bsddb/__init__.py interface can support it > for 2.4 that's great. It should always be recommended that people use > the full bsddb.db when threads are involved. OK. This sounds very wise to me. > If simple bsddb still has non-trivial-to-describe multithreaded deadlock > issues by the time a 2.4 release draws near I'll suggest pulling it out. > (before then i need to write a test case to prove that it does actually > have these problems) Again, very advisable!
Alex From Paul.Moore at atosorigin.com Wed Nov 5 06:24:50 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Nov 5 06:25:37 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> From: Guido van Rossum [mailto:guido@python.org] >> 1. Add a plain irange() to itertools. >> 2. IF the general reversed() is deemed too theoretical, add EITHER a >> reverse argument to irange, or an irevrange to itertools. Both feel >> to me a little iffy, but that's my generality bias again. >> 3. IF the general reversed() is accepted (builtin or not) leave the >> irange function in its simple form. > Hm. reversed(irange(...)) can't work, so you'd have to have both. Raymond is proposing (in the PEP) a custom reverse via a __reversed__ special method. I'm assuming that irange() [and enumerate(), and possibly others] would need such a method, in order to cover just this case. From my POV, having reversed(enumerate()) work is essential. Also for irange() if that is accepted. I've not looked through the other itertools, but a trawl through those to ensure any that need it have custom reverse methods would also be sensible. Paul. From anthony at interlink.com.au Wed Nov 5 06:23:16 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Nov 5 06:26:21 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: Message-ID: <200311051123.hA5BNGGc009525@localhost.localdomain> >>> Martin v. Löwis wrote > What system is this on? I find it surprising that the id is < 0: on a > 32-bit machine, this should only happen if you allocate more than 2GB. Redhat 10 beta3 (Fedora). I'm not entirely sure why it's generating these. Using current CVS python (although it also complains when building a 2.3.2 on this platform, but a 2.3.2 compiled on RH9 is fine).
Python 2.3.2+ (#1, Nov 5 2003, 00:54:02)
[GCC 3.3.1 20030930 (Red Hat Linux 3.3.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> id('sdkjhfdkfhsdkjfhsdkjfhdskf')
-1083363920

It seems most things have a very large id now:

>>> class a: pass
...
>>>
>>> print a()
<__main__.a instance at 0xbf6dad4c>
>>> print a()
<__main__.a instance at 0xbf6dac2c>
>>> print a()
<__main__.a instance at 0xbf6dad0c>
>>> print a()
<__main__.a instance at 0xbf6dabcc>

I wonder if it's some sort of "randomly jumble around the address space to prevent stack-smashing" thing? I seem to recall something about Position Independent Execution in the release notes. This version of RH will be the one released in the next week or so. > I'd reformulate them as > > "%x" % (id(o) & 0xffffffffL) > Of course, you have to replace 0xffffffffL with (unsigned)-1 of the > system (i.e. 2l*sys.maxint+1). Hm. "%x" % (id(o) & 2L*sys.maxint+1) is considerably less obvious than "%x"%id(o) > I wonder whether creating a function > sys.unsigned(id(o)) > would be appropriate, which returns its argument for positive > numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise. Possibly. I'm going to have to make the above patch to the 23 branch in any case - warnings from the standard test suite are bad. Would a different % format code be another option? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Nov 5 06:28:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Nov 5 06:31:13 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch Message-ID: <200311051128.hA5BSHaG009610@localhost.localdomain> >>> Anthony Baxter wrote > Hm. "%x" % (id(o) & 2L*sys.maxint+1) > is considerably less obvious than "%x"%id(o) The best I can come up with at this moment using the 'struct' module is ''.join(['%02x'%ord(x) for x in struct.pack('>i', id(o))]), which is also pretty grotesque.
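Both spellings being traded here (0xffffffffL and 2L*sys.maxint+1) build the same all-ones mask; written out as a small helper, the idea looks like this (a sketch of the masking approach under discussion, not the patch that went in):

```python
def unsigned_hex(n, bits=32):
    # Reinterpret a possibly-negative id() as its unsigned
    # two's-complement value by masking to the platform word size.
    mask = (1 << bits) - 1   # 0xffffffff when bits == 32
    return "%x" % (n & mask)
```

For the id above, unsigned_hex(-1083363920) yields the address bits ("bf6d2db0") rather than a negative "%x" operand, which is what triggers the warnings the thread is about.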
Thinking about it further, the better fix might be to replace the test code that looks for an exact match with a regex-based match instead... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From python at rcn.com Wed Nov 5 06:52:11 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 06:52:19 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311042247.45386.aleaxit@yahoo.com> Message-ID: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> [Alex] > I'm slowly coming to accept it -- it's sure way more appropriate as a > built-in than many that currently crowd the builtins namespace. [Paul Moore] > > 3. A general reversed() function seems theoretically useful, but the > > concrete use cases seem fairly thin on the ground. I'm broadly in > > favour, because I (possibly like Raymond) have a bias for clean, > > general solutions. But I can see that "practicality beats purity" > > may hold here. [Alex] > Funny, I originally felt queasy (about it being a built-in only) for > "purity" > about the overcrowded builtins namespace. I'm seeing enough use > cases (even if irange DID grow a wonderful reverse= optional arg...) > that practicality is gradually winning me over. I.e., practicality beats > purity is what is winning me over, while to you it suggests dampening > your "broadly in favour"... we both mention iterating over sequences > more than over indices, but to me that's a suggestion that reversed > has a place . . . > You have a point -- Raymond definitely HAS an overall vision on > iterators &c and he's deserved lots of listening-to even though we > can't quite see some specific point. It appears that Alex has been won over to supporting reversed() as a builtin. Among the comp.lang.python crowd, nearly everyone supported some form of the PEP (with varying preferences on the name or where to put it). 
The community participation rate was high with about 120 posts across four threads contributing to hammering out the current version of the pep. Is there anything else that needs to be done in the way of research, voting, or cheerleading for the pep to be accepted? Raymond From python at rcn.com Wed Nov 5 07:29:17 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 07:29:25 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> Message-ID: <001e01c3a398$6e57a520$e841fea9@oemcomputer> > Raymond is proposing (in the PEP) a custom reverse via a __reversed__ > special method. That was requested by a number of contributors on comp.lang.python. Its purpose is to allow users to add reverse iteration support to objects that otherwise only offer forward iteration but not sequence access. The custom reversed method is not an essential part of the proposal. It's just a hook for someone who might need it. > I'm assuming that irange() [and enumerate(), and possibly > others] would need such a method, in order to cover just this case. Not really. When you go to write the code, it becomes clear that it doesn't apply to enumerate or the other itertools. The issue is that the iterator object holds only the result of iter(iterable) and is in no position to re-probe the underlying iterable to see if it supports reverse iteration. The iterator object has no way of knowing in advance that it is going to be called by reversed(). So, I'm not proposing to add __reversed__ to any existing python objects. It may make sense for xrange, but that is an efficiency issue not an API issue (xrange already works with reversed() without adding a custom method).
Raymond From fincher.8 at osu.edu Wed Nov 5 08:33:37 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 5 07:35:46 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> Message-ID: <200311050833.37529.fincher.8@osu.edu> On Tuesday 04 November 2003 08:12 pm, Guido van Rossum wrote: > In private mail Raymond withdrew the suggestion that enumerate() > implement __reversed__; I think Raymond won't mind if I quote him here: > > [Raymond] > > > Unfortunately, that idea didn't work out. The enumerate object does not > > hold the original iterable; instead, it only has the result of > > iter(iterable). Without having the iterable, I don't see a way for it > > to call iterable.__reversed__. The essential problem is that at creation > > time, the enumerate object does not know that it is going to be called by > > reversed(). I had always assumed that enumerate.__reversed__ would attempt to call a reversed iterator on the sequence. Since enumerate is only sensibly used on sequences (which are guaranteed to provide a reverse iterator) it could never fail in sensible cases (unless there's some usage of enumerate on non-sequences that I'm missing). > I'm not sure I even like the idea of reversed() looking for a > __reversed__ method at all. I like the original intention best: > reversed() is for reverse iteration over *sequences*. (See the first > paragraph of the section "Rejected Alternatives" in the PEP.) I think the search for the __reversed__ method is the meat of the proposal; I can define for myself a simple two-line generator that iterates in reverse over sequences.
What I need the language to define for me is a protocol for iterating over objects in reverse and for providing users of my own classes with the ability to iterate over them in reverse in a standard way. If this proposal could be satisfied by the simple definition:

def reversed(seq):
    for i in xrange(len(seq)-1, -1, -1):
        yield seq[i]

I wouldn't be for it. The reason I'm +1 is because I want a standard protocol for iterating in reverse over objects. Jeremy From python at rcn.com Wed Nov 5 08:03:31 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 08:03:40 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <20031104211119.GS7212@epoch.metaslash.com> Message-ID: <002301c3a39d$36d00020$e841fea9@oemcomputer> [Neal Norwitz] > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. +1 Raymond From mwh at python.net Wed Nov 5 08:48:53 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 5 08:48:58 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <002301c3a39d$36d00020$e841fea9@oemcomputer> (Raymond Hettinger's message of "Wed, 5 Nov 2003 08:03:31 -0500") References: <002301c3a39d$36d00020$e841fea9@oemcomputer> Message-ID: <2m7k2f0vui.fsf@starship.python.net> "Raymond Hettinger" writes: > [Neal Norwitz] >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > +1 I think apply is probably widely enough used that this is too strong. It could be a right royal pain in the arse if you wanted to have code that still ran in 1.5.2. I realize that this poses other problems, but I don't feel we should be going out of our way to make it harder. not-a-fan-of-churn-ly y'rs mwh -- (Unfortunately, while you get Tom Baker saying "then we were attacked by monsters", he doesn't flash and make "neeeeooww-sploot" noises.)
-- Gareth Marlow, ucam.chat, from Owen Dunn's review of the year From Paul.Moore at atosorigin.com Wed Nov 5 08:50:47 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Nov 5 08:51:33 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060D24@UKDCX001.uk.int.atosorigin.com> From: Jeremy Fincher [mailto:fincher.8@osu.edu] > If this proposal could be satisfied by the simple definition: > > def reversed(seq): > for i in xrange(len(seq)-1, -1, -1): > yield seq[i] > > I wouldn't be for it. The reason I'm +1 is because I want > a standard protocol for iterating in reverse over objects. The more I think about it, the less I see the need for reversed(). But I'm having a really difficult time articulating why. I don't see enough use cases for something which just reverses sequences, as above. I tend to loop over concrete sequences less and less these days, using iterators, generators, enumerate, itertools etc, far more. The simple reversed() above doesn't help at all there. OK, reversed([x]range) is useful, but as soon as an iterator-based irange existed, I'd use that for "forward" loops, and be most upset that reversed(irange) didn't work... Whenever I try to play with writing a reversed() which is more general than the code above, I get stuck because *something* needs reversing, but it's virtually never a sequence! So far, I've needed to reverse: itertools.count() itertools.izip() enumerate() But this is all fairly incestuous - all I'm proving is that *if* you need reversed() on something other than a sequence, you can't do it without help from something (the object itself, or something else). But the cases *I* care about have been pre-existing Python objects, which Raymond is not proposing to extend in that way! (I can see that having the __reversed__ protocol may help with user-defined objects, I just don't have such a need myself).
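For a user-defined class, the protocol Paul alludes to would look roughly like this (an illustrative sketch of the __reversed__ hook PEP 322 proposes; later Pythons do consult the method, so the sketch runs today):

```python
class Countdown:
    """Forward iteration yields 0..n-1; the __reversed__ hook hands
    reversed() a reverse iterator without __len__/__getitem__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __reversed__(self):
        return iter(range(self.n - 1, -1, -1))
```

With the hook in place, list(reversed(Countdown(3))) gives [2, 1, 0] even though the object is not a sequence.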
I'm tending to come down in favour of just having a simple "generate numbers in reverse" function (whether that is irange(..., reverse=True) or irevrange, or something else). Like Guido, I think that covers most real cases. Especially in combination with itertools - reversed(seq) <===> imap(seq.__getitem__, irevrange(len(seq))) Hmm, that reads better with irevrange. Looks like Guido's judgement is right again... Actually, itertools.count() looks very much like it's relevant here. It has a start argument (defaulting to 0) but no stop or step. Maybe we should be extending this, rather than inventing a new itertool. count(start=0, end=, step=1, reverse=False) This adds a *lot* of generality to count. Or how about itertools.count() as above, and itertools.countdown() as a reversed version? OK. I think I've changed to -0 on PEP 322, and +1 on having irange and irevrange (or an extended count and countdown) in itertools. Paul. From aleaxit at yahoo.com Wed Nov 5 09:45:11 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 09:45:20 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <2m7k2f0vui.fsf@starship.python.net> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> Message-ID: <200311051545.11246.aleaxit@yahoo.com> On Wednesday 05 November 2003 02:48 pm, Michael Hudson wrote: > "Raymond Hettinger" writes: > > [Neal Norwitz] > > > >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > > > +1 > > I think apply is probably widely enough used that this is too strong. > > It could be a right royal pain in the arse if you wanted to have code > that still ran in 1.5.2. I realize that this poses other problems, > but I don't feel we should be going out of our way to make it harder. Removing _any_ built-in that was around in 1.5.2 will pose similar problems. 
How hard can it be, in Python source that needs to run on both 1.5.2 and 2.5, to, e.g.:

    try: import legacy_25x_152
    except ImportError: pass

where the "legacy module" would inject apply (etc) in builtins? (In 2.4, you'd "just" need to turn off deprecation warnings, which in such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

Guido has specifically asked for built-ins that could be deprecated.

It doesn't seem to me that asking for deprecation warnings to be turned off, or a "legacy module" to be conditionally imported, is "going out of our way to make it harder" to have code running all the way from 1.5 to 2.5 -- if such a feat currently requires 99 units of effort it MAY move all the way to 100 this way, but I doubt the relative augmentation of effort is even as high as that.

Alex

From pje at telecommunity.com Wed Nov 5 09:54:43 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Nov 5 09:53:40 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <001e01c3a398$6e57a520$e841fea9@oemcomputer>
References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com>
Message-ID: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>

At 07:29 AM 11/5/03 -0500, Raymond Hettinger wrote:
>Not really. When you go to write the code, it becomes clear that it
>doesn't apply to enumerate or the other itertools. The issue is that
>the iterator object holds only the result of iter(iterable) and is in no
>position to re-probe the underlying iterable to see if it supports
>reverse iteration. The iterator object has no way of knowing in advance
>that it is going to be called by reversed().

Why not change enumerate() to return an iterable, rather than an iterator? Then its __reversed__ method could attempt to delegate to the underlying iterable. Is it likely that anyone relies on enumerate() being an iterator, rather than an iterable?
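Alex's conditional-import idea can be sketched concretely. This is purely illustrative: the module name legacy_25x_152 is Alex's hypothetical, and the sketch uses the modern builtins spelling for convenience (code really targeting 1.5.2 would of course use the __builtin__ module instead):

```python
# Sketch of a hypothetical "legacy" compatibility module, along the lines
# Alex suggests: re-inject deprecated builtins so old call sites keep working.
# NOTE: names and the builtins spelling are illustrative, not from the thread.
import builtins


def _apply(func, args=(), kwargs=None):
    # apply(f, args, kwargs) was defined as f(*args, **kwargs)
    return func(*args, **(kwargs or {}))


# Only install the shim if the builtin is actually missing/removed.
if not hasattr(builtins, "apply"):
    builtins.apply = _apply

# Old-style call sites now work unchanged:
print(apply(pow, (2, 10)))  # -> 1024
```

A program supporting both old and new interpreters would then wrap the import in the try/except shown above, so the shim is simply skipped where the module is absent.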
From guido at python.org Wed Nov 5 09:58:32 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 09:58:48 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: Your message of "Wed, 05 Nov 2003 22:23:16 +1100." <200311051123.hA5BNGGc009525@localhost.localdomain>
References: <200311051123.hA5BNGGc009525@localhost.localdomain>
Message-ID: <200311051458.hA5EwWc29153@12-236-54-216.client.attbi.com>

> > I'd reformulate them as
> >
> >     "%x" % (id(o) & 0xffffffffL)
> > Of course, you have to replace 0xffffffffL with (unsigned)-1 of the
> > system (i.e. 2l*sys.maxint+1).
>
> Hm.  "%x" % (id(o) & 2L*sys.maxint+1)
>
> is considerably less obvious than "%x"%id(o)
>
> > I wonder whether creating a function
> >     sys.unsigned(id(o))
> > would be appropriate, which returns its arguments for positive
> > numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise.
>
> Possibly. I'm going to have to make the above patch to the 23 branch
> in any case - warnings from the standard test suite are bad. Would a
> different % format code be another option?

This warning will go away in 2.4 again, where %x with a negative int will return a hex number with a minus sign. So I'd be against introducing a new format code. I've forgotten in what code you found this, but the sys.maxint solution sounds like your best bet.

In 2.4 we can also make id() return a long when the int value would be negative; I don't want to do that in 2.3 since changing the return type and value of a builtin in a minor release seems a compatibility liability -- but in 2.4 the difference between int and long will be wiped out even more than it already is, so it should be fine there.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mwh at python.net Wed Nov 5 10:02:07 2003
From: mwh at python.net (Michael Hudson)
Date: Wed Nov 5 10:02:12 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <200311051545.11246.aleaxit@yahoo.com> (Alex Martelli's message of "Wed, 5 Nov 2003 15:45:11 +0100")
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com>
Message-ID: <2mvfpyzwnk.fsf@starship.python.net>

Alex Martelli writes:

> On Wednesday 05 November 2003 02:48 pm, Michael Hudson wrote:
>> "Raymond Hettinger" writes:
>> > [Neal Norwitz]
>> >
>> >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.
>> >
>> > +1
>>
>> I think apply is probably widely enough used that this is too strong.
>>
>> It could be a right royal pain in the arse if you wanted to have code
>> that still ran in 1.5.2. I realize that this poses other problems,
>> but I don't feel we should be going out of our way to make it harder.
>
> Removing _any_ built-in that was around in 1.5.2 will pose similar
> problems.

Well, yeah, but I contend doing it to, say, coerce would cause less grief than apply.

> How hard can it be, in Python source that needs to run on both 1.5.2
> and 2.5, to, e.g.:
>
>     try: import legacy_25x_152
>     except ImportError: pass
>
> where the "legacy module" would inject apply (etc) in builtins? (In
> 2.4, you'd "just" need to turn off deprecation warnings, which in
> such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

Yeah, I guess for apply that is no great stretch.

> Guido has specifically asked for built-ins that could be deprecated.

I know, but maybe I think he shouldn't have :-)

-----

There's always going to be a tension between wanting to keep backwards compatibility and making the Python of tomorrow as perfect as possible. To me, leaving the builtins a little bit cluttered just isn't that painful.
And perhaps talking about people trying to keep code running on 1.5.2 and 2.4 wasn't a good example; I have more sympathy for people who are trying to upgrade the Python they use. Each little obstacle means that they are that little bit more likely to just throw their hands up in the air and keep on using 1.5.2 or 2.1 -- and that would be a Bad Thing. Cheers, mwh -- That one is easily explained away as massively intricate conspiracy, though. -- Chris Klein, alt.sysadmin.recovery From guido at python.org Wed Nov 5 10:02:55 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 10:03:12 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 06:52:11 EST." <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> Message-ID: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> > Among the comp.lang.python crowd, nearly everyone supported some form of > the PEP (with varying preferences on the name or where to put it). The > community participation rate was high with about 120 posts across four > threads contributing to hammering out the current version of the pep. How many participants in those 120 posts? (I recall a thread where one individual posted 100 messages. :-) > Is there anything else that needs to be done in the way of research, > voting, or cheerleading for pep to be accepted? Yes. I'm getting cold feet about __reversed__. Some folks seem to think that reversed() can be made to work on many iterators by having the iterator supply __reversed__; I think this is asking for trouble (e.g. you already pointed out why it couldn't be done for enumerate()). I also still think that a reversed [x]range() would give us a bigger bang for the buck -- less bang, but also a lot less bucks. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis at bluewin.ch Wed Nov 5 10:06:18 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 10:03:41 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060D24@UKDCX001.uk.int.a tosorigin.com> Message-ID: <5.2.1.1.0.20031105153551.028d4300@pop.bluewin.ch> At 13:50 05.11.2003 +0000, Moore, Paul wrote: >From: Jeremy Fincher [mailto:fincher.8@osu.edu] > > If this proposal could be satisfied by the simple definition: > > > > def reversed(seq): > > for i in xrange(len(seq)-1, -1, -1): > > yield seq[i] > > > > I wouldn't be for it. The reason I'm +1 is because I want > > a standard protocol for iterating in reverse over objects. > >The more I think about it, the less I see the need for reversed(). But I'm >having a really difficult time articulating why. > >I don't see enough use cases for something which just reverses sequences, >as above. I tend to loop over concrete sequences less and less these days, >using iterators, generators, enumerate, itertools etc, far more. The simple >reversed() above doesn't help at all there. OK, reversed([x]range) is useful, >but as soon as an iterator-based irange existed, I'd use that for "forward" >loops, and be most upset that reversed(irange) didn't work... > >Whenever I try to play with writing a reversed() which is more general than >the code above, I get stuck because *something* needs reversing, but it's >virtually never a sequence! > >So far, I've needed to reverse: > > itertools.count() > itertools.zip() > enumerate() > >But this is all fairly incestuous - all I'm proving is that *if* you need >reversed() on something other than a sequence, you can't do it without >help from something (the object itself, or something else). But the cases >*I* care about have been pre-existing Python objects, which Raymond is not >proposing to extend in that way! 
(I can see that having the __reversed__
>protocol may help with user-defined objects, I just don't have such a need
>myself).

1) the problem is that reversed wants to be simple and sweet, but general reverse iteration is not that simple.

2) itertools.count / izip and enumerate produce iterators forgetting the original iterable, so while nice

    reversed(count(9))
    reversed(enumerate([1,2,3]))

would require rather not straightforward mechanisms under the hood.

Either one writes and introduces revenumerate, revcount, revizip, OR one could make reversed also a functional, allowing not only for

    reversed(it)  # it implements __reversed__ or it's a sequence

but also

    reversed(count, 9)
    reversed(enumerate, [1,2,3])
    reversed(izip, [1,2], [1,3])

[ the implementation would use some table to register the impl of all those behaviors], with possible behaviors:

    def rev_count(n):
        while True:
            yield n
            n -= 1

    def rev_izip(*iterables):
        iterables = map(reversed, iterables)
        while True:
            result = [i.next() for i in iterables]
            yield tuple(result)

    def rev_enumerate(it):
        if hasattr(it, '__reversed__'):
            index = -1  # arbitrary but not totally meaningless :)
            for x in it.__reversed__():
                yield (index, x)
                index -= 1
        elif hasattr(it, 'keys'):
            raise ValueError("mappings do not support reverse iteration")
        else:
            i = len(it)
            while i > 0:
                i -= 1
                yield (i, it[i])

    rev_behavior = { enumerate: rev_enumerate, ... }

    def reversed(*args):
        if len(args) > 1:
            func = args[0]
            args = args[1:]
            rev_func = rev_behavior.get(func, None)
            if rev_func:
                for x in rev_func(*args):
                    yield x
            else:
                ... error
        else:
            ...

Whether this is for general consumption is another matter.

regards.

From guido at python.org Wed Nov 5 10:06:09 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:06:44 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: Your message of "Wed, 05 Nov 2003 08:03:31 EST."
<002301c3a39d$36d00020$e841fea9@oemcomputer> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> Message-ID: <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> > [Neal Norwitz] > > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > +1 Isn't apply() already deprecated? Otherwise +1. --Guido van Rossum (home page: http://www.python.org/~guido/) From FBatista at uniFON.com.ar Wed Nov 5 10:09:23 2003 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Wed Nov 5 10:10:22 2003 Subject: [Python-Dev] Deprecating obsolete builtins Message-ID: #- > [Neal Norwitz] #- > > For 2.4 I'd suggest we officially deprecate: apply, #- coerce, intern. #- > +1 . Facundo From pedronis at bluewin.ch Wed Nov 5 10:13:49 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 10:11:09 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> Message-ID: <5.2.1.1.0.20031105161029.028dd278@pop.bluewin.ch> At 07:02 05.11.2003 -0800, Guido van Rossum wrote: > > Among the comp.lang.python crowd, nearly everyone supported some form of > > the PEP (with varying preferences on the name or where to put it). The > > community participation rate was high with about 120 posts across four > > threads contributing to hammering out the current version of the pep. > >How many participants in those 120 posts? (I recall a thread where >one individual posted 100 messages. :-) > > > Is there anything else that needs to be done in the way of research, > > voting, or cheerleading for pep to be accepted? > >Yes. I'm getting cold feet about __reversed__. Some folks seem to >think that reversed() can be made to work on many iterators by having >the iterator supply __reversed__; I think this is asking for trouble >(e.g. you already pointed out why it couldn't be done for >enumerate()). 
yes, but __reversed__ is meaningful for iterables, not iterators.

I had the impression that reversed(.) is related to iter(.) for reverse iteration, and __reversed__ would correspond to __iter__ also for that, but this is meaningful for iterables that are not already iterators. For iterators __iter__ is typically the identity, while __reversed__ is not really applicable, which probably means that reverse iteration is more complicated than forward iteration.

regards.

From guido at python.org Wed Nov 5 10:23:09 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:23:18 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: Your message of "Wed, 05 Nov 2003 15:45:11 +0100." <200311051545.11246.aleaxit@yahoo.com>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com>
Message-ID: <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com>

> Removing _any_ built-in that was around in 1.5.2 will pose similar
> problems.

Only proportional to the likelihood that it was used in 1.5.2, which is proportional to how useful it is. intern(): extremely unlikely (nobody knows what it's for); coerce(): rather unlikely (too advanced); apply(): very likely.

> How hard can it be, in Python source that needs to run
> on both 1.5.2 and 2.5, to, e.g.:
>
>     try: import legacy_25x_152
>     except ImportError: pass
>
> where the "legacy module" would inject apply (etc) in builtins? (In
> 2.4, you'd "just" need to turn off deprecation warnings, which in
> such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

The problem (and real cost, for some!) is that people who write code that should work for 1.5.2 and later end up having to do more maintenance on it for each new Python version they support. Maybe we should just be resigned to having a bunch of unwanted builtins until 3.0 comes along (where I'm okay with all bets being off).
> Guido has specifically asked for built-ins that could be deprecated. > > It doesn't seem to me that asking for deprecation warnings to be > turned off, or a "legacy module" to be conditionally imported, is > "going out of our way to make it harder" to have code running all > the way from 1.5 to 2.5 -- if such a feat currently requires 99 units > of effort it MAY move all the way to 100 this way, but I doubt the > relative augmentation of effort is even as high as that. (a) It's always better to be able to use a common subset than to have to resort to version checking or version-specific hacks. (We've all learned this in the context of platform independence; I think the same applies to version independence.) (b) Since 2.4 and 2.5 don't yet exist (2.4 is at best a moving target), someone wanting to use a cross-version subset *now* has to settle for targeting and testing with 1.5.2 through 2.3. Forcing these folks to do a new release for 2.4 or 2.5 is not increasing their work from 99 to 100 units, it's increasing the work they have to do in the future from 0 to 1 (on an arbitrary scale :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 10:28:01 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 10:28:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 09:54:43 EST." <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <200311051528.hA5FS1l29291@12-236-54-216.client.attbi.com> > Why not change enumerate() to return an iterable, rather than an > iterator? Then its __reversed__ method could attempt to delegate to > the underlying iterable. Is it likely that anyone relies on > enumerate() being an iterator, rather than an iterable? 
I find it rather elegant to use enumerate() on a file to generate line numbers and lines together (adding 1 to the index to produce a more conventional line number). What's more elegant than

    for i, line in enumerate(f):
        print i+1, line,

to print a file with line numbers??? I've used this in throwaway code at least, and would hate to lose it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed Nov 5 10:33:24 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:33:32 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 08:33:37 EST." <200311050833.37529.fincher.8@osu.edu>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> <200311050833.37529.fincher.8@osu.edu>
Message-ID: <200311051533.hA5FXPX29334@12-236-54-216.client.attbi.com>

> I think the search for the __reversed__ method is the meat of the
> proposal; I can define for myself a simple two-line generator that
> iterates in reverse over sequences. What I need the language to
> define for me is a protocol for iterating over objects in reverse
> and for providing users of my own classes with the ability to
> iterate over them in reverse in a standard way.
>
> If this proposal could be satisfied by the simple definition:
>
>     def reversed(seq):
>         for i in xrange(len(seq)-1, -1, -1):
>             yield seq[i]
>
> I wouldn't be for it. The reason I'm +1 is because I want a
> standard protocol for iterating in reverse over objects.

I would be *against* such a protocol. It would end up complicating almost everything that defines __iter__, for a very questionable pay-off (reverse iteration isn't that common except for some special cases).

The PEP got as far as it is by focusing on simplicity and sequences. It is rapidly losing its innocence.
:-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com Wed Nov 5 10:55:10 2003
From: python at rcn.com (Raymond Hettinger)
Date: Wed Nov 5 10:55:26 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
Message-ID: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer>

> > Among the comp.lang.python crowd, nearly everyone supported some form of
> > the PEP (with varying preferences on the name or where to put it). The
> > community participation rate was high with about 120 posts across four
> > threads contributing to hammering out the current version of the pep.
>
> How many participants in those 120 posts? (I recall a thread where
> one individual posted 100 messages. :-)

There were 31 participants.

{'Mel Wilson': 1, 'Tom Anderson': 1, 'Dave Benjamin': 2, 'Stephen Horne': 14,
 'David Abrahams': 12, 'David Mertz': 1, 'Ron Adam': 1, 'Terry Reedy': 4,
 'Sean Ross': 6, 'Bengt Richter': 1, 'Andrew Dalke': 3, 'Michele Simionato': 2,
 'Bernhard Herzog': 1, 'Raymond Hettinger': 15, 'David C': 1, 'Paul Moore': 3,
 'Dang Griffith': 1, 'Roy Smith': 1, 'Alex Martelli': 14, 'Patrick Maupin': 1,
 'Jeremy Fincher': 3, 'Steve Holden': 1, 'Robert Brewer': 1, 'Chad Netzer': 1,
 'Werner Schiendl': 6, 'Peter Otten': 3, 'David Eppstein': 3,
 'Fredrik Lundh': 1, 'Lulu of': 2, 'Michael Hudson': 1, 'John Roth': 4}

> > Is there anything else that needs to be done in the way of research,
> > voting, or cheerleading for pep to be accepted?
>
> Yes. I'm getting cold feet about __reversed__.

What can I do to warm those feet? I spent a month making this proposal as perfect as possible, gathering support for it, trying each proposed modification, and enduring what feels like hazing. Still, there is a little bit of energy left if that's what it takes to put the ball over the goal line.

Getting this far hasn't been easy.
Python people are quick to express negativity on just about anything and they take great pleasure in exploring every weird variant they can think of.

> I also still think that a reversed [x]range() would give us a bigger
> bang for the buck

I'm not willing to go that route:

* Several posters gave negative feedback on that option.

* It doesn't address the ugly and inefficient s[::-1] approach which I really do not want to become *the* idiom.

* Providing yet another variant of xrange() is a step backwards IMO.

* It is not an extensible protocol like the reversed() / __reversed__ pair.

* Except for the simple case of revrange(n), the multiple argument forms are not a simplification (IMO) and are still difficult to visually verify (try the example from random.shuffle).

* A unique benefit to python is the ability to loop over containers without using indices. The current proposal supports that idea. The revrange() approach doesn't.

Raymond Hettinger

From aleaxit at yahoo.com Wed Nov 5 10:56:21 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 5 10:56:30 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
Message-ID: <200311051656.21014.aleaxit@yahoo.com>

On Wednesday 05 November 2003 04:02 pm, Guido van Rossum wrote:
> > Among the comp.lang.python crowd, nearly everyone supported some form of
> > the PEP (with varying preferences on the name or where to put it). The
> > community participation rate was high with about 120 posts across four
> > threads contributing to hammering out the current version of the pep.
>
> How many participants in those 120 posts? (I recall a thread where
> one individual posted 100 messages.
:-)

I count 25 separate contributors to threads about PEP 322 (but I only see 75 posts there, and three threads, so I must be missing some of those that Raymond is counting -- or perhaps, not unlikely, they've expired off my newsserver).

> > Is there anything else that needs to be done in the way of research,
> > voting, or cheerleading for pep to be accepted?
>
> Yes. I'm getting cold feet about __reversed__. Some folks seem to
> think that reversed() can be made to work on many iterators by having
> the iterator supply __reversed__; I think this is asking for trouble
> (e.g. you already pointed out why it couldn't be done for
> enumerate()).

I still think it could be, if enumerate kept a reference to its argument, but that's a detail -- I trust your instinct about such design issues (or I wouldn't be using Python...:-). So: let's keep it simple and have reversed be _exactly_ equivalent to (net of performance, hypothetical anomalous "pseudosequences" doing weird things, & exact error kinds/msgs):

    def reversed(sequence):
        for x in xrange(len(sequence)-1, -1, -1):
            yield sequence[x]

no __reversed__, no complications, "no nuttin'". Putting that in the current 2.4 pre-alpha will let us start getting some experience with it and see if in the future we want to add refinements (always easier to add than to remove...:-) -- either to reversed or to other iterator-returning calls (e.g. reverse= optional arguments just like in the sort method of lists).

Alex

From aleaxit at yahoo.com Wed Nov 5 11:09:14 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 5 11:09:26 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
Message-ID: <200311051709.14373.aleaxit@yahoo.com>

On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote:
   ...
> >reverse iteration. The iterator object has no way of knowing in advance
> >that it is going to be called by reversed().
>
> Why not change enumerate() to return an iterable, rather than an
> iterator? Then its __reversed__ method could attempt to delegate to the
> underlying iterable. Is it likely that anyone relies on enumerate() being
> an iterator, rather than an iterable?

I do rely on the _argument_ of enumerate being allowed to be just any iterator, yes -- e.g. in such idioms as:

    for i, x in enumerate(xs):
        if isgoodenough(x): return x
        elif istoohigh(i): raise GettingBoredError, i

Yes, I _could_ recode that as:

    i = 0
    for x in xs:
        if isgoodenough(x): return x
        i += 1
        if istoohigh(i): raise GettingBoredError, i

but, I don't _wanna_...:-). enumerate is just too slick! Of course, it would be fine for reverse(enumerate(x)) to fail for unsuitable values of x -- that's a separate issue.

But actually it would not be a tragedy if I couldn't reverse(enumerate -- e.g. where I'd LIKE to code:

    for i, x in reverse(enumerate(xs)):
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

I _might_ reasonably code:

    for i, x in enumerate(reverse(xs)):
        if isbad(x): raise BadXError, x
        xs[-1-i] = transform(x)

that -1-i may not be the prettiest sight in the world, but I think this STILL beats the alternative of:

    for i in reversed_range(len(xs)):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

not to mention today's

    for i in xrange(-1, -len(xs)-1, -1):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

or:

    for i in xrange(len(xs)-1, -1, -1):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

Alex

From pedronis at bluewin.ch Wed Nov 5 11:34:29 2003
From: pedronis at bluewin.ch (Samuele Pedroni)
Date: Wed Nov 5 11:33:42 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051709.14373.aleaxit@yahoo.com>
References: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
	<16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com>
	<5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
Message-ID: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch>

At 17:09 05.11.2003 +0100, Alex Martelli wrote:
>On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote:
> ...
> > >reverse iteration. The iterator object has no way of knowing in advance
> > >that it is going to be called by reversed().
> >
> > Why not change enumerate() to return an iterable, rather than an
> > iterator? Then its __reversed__ method could attempt to delegate to the
> > underlying iterable. Is it likely that anyone relies on enumerate() being
> > an iterator, rather than an iterable?

I think he was wondering whether people rely on

    enumerate([1,2]).next
    i = enumerate([1,2])
    i is iter(i)

working, vs. needing iter(enumerate([1,2])).next

I think he was proposing to implement enumerate as

    class enumerate(object):
        def __init__(self, iterable):
            self.iterable = iterable

        def __iter__(self):
            i = 0
            for x in self.iterable:
                yield i, x
                i += 1

        def __reversed__(self):
            rev = reversed(self.iterable)
            try:
                i = len(self.iterable)-1
            except (TypeError, AttributeError):
                i = -1
            for x in rev:
                yield i, x
                i -= 1

From marktrussell at btopenworld.com Wed Nov 5 11:48:31 2003
From: marktrussell at btopenworld.com (Mark Russell)
Date: Wed Nov 5 11:48:23 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051656.21014.aleaxit@yahoo.com>
References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com>
Message-ID: <1068050911.954.8.camel@localhost>

On Wed, 2003-11-05 at 15:56, Alex Martelli wrote:
> def reversed(sequence):
>     for x in xrange(len(sequence)-1, -1, -1):
>         yield sequence[x]
>
> no __reversed__, no complications, "no nuttin'".
If I was adding this as a library routine, I'd do: def reversed(sequence): try: seqlen = len(sequence) except TypeError: sequence = list(sequence) seqlen = len(sequence) for x in xrange(seqlen-1, -1, -1): yield sequence[x] OK, inefficient for iterators on long sequences, but it works with enumerate() etc and needs no changes to existing types. Mark From pje at telecommunity.com Wed Nov 5 12:02:09 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Nov 5 12:02:35 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051528.hA5FS1l29291@12-236-54-216.client.attbi.com> References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20031105115935.03299bc0@telecommunity.com> At 07:28 AM 11/5/03 -0800, Guido van Rossum wrote: > > Why not change enumerate() to return an iterable, rather than an > > iterator? Then its __reversed__ method could attempt to delegate to > > the underlying iterable. Is it likely that anyone relies on > > enumerate() being an iterator, rather than an iterable? > >I find it rather elegant to use enumerate() on a file to generate line >numbers and lines together (adding 1 to the index to produce a more >conventional line number). What's more elegant than > > for i, line in enumerate(f): > print i+1, line, > >to print a file with line numbers??? I've used this in throwaway >code at least, and would hate to lose it. I thought 'for x in y' always called 'iter(y)', in which case the above still works. It's only this: ef = enumerate(f) while 1: try: i,line = ef.next() print i+1, line, except StopIteration: break That would break. From pje at telecommunity.com Wed Nov 5 12:06:13 2003 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Nov 5 12:06:34 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> References: <200311051709.14373.aleaxit@yahoo.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20031105120524.03298560@telecommunity.com> At 05:34 PM 11/5/03 +0100, Samuele Pedroni wrote: >At 17:09 05.11.2003 +0100, Alex Martelli wrote: >>On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote: >> ... >> > >reverse iteration. The iterator object has no way of knowing in advance >> > >that it is going to be called by reversed(). >> > >> > Why not change enumerate() to return an iterable, rather than an >> > iterator? Then its __reversed__ method could attempt to delegate to the >> > underlying iterable. Is it likely that anyone relies on enumerate() being >> > an iterator, rather than an iterable? > >I think he was wondering whether people rely on > > >enumerate([1,2]).next >i = enumerate([1,2]) >i is iter(i) > >working , vs. needing iter(enumerate([1,2]).next Yes, precisely. >I think he was proposing to implement enumerate as > >class enumerate(object): > def __init__(self,iterable): > self.iterable = iterable > > def __iter__(self): > i = 0 > for x in self.iterable: > yield i,x > i += 1 > > def __reversed__(self): > rev = reversed(self.iterable) > try: > i = len(self.iterable)-1 > except (TypeError,AttributeError): > i = -1 > for x in rev: > yield i,x > i -= 1 Yes, except I hadn't thought it out in quite that much detail. Thanks for the clarification. 
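For readers following along today: Samuele's sketch runs essentially unchanged in modern Python, where reversed() and the __reversed__ hook did eventually land. A minimal, self-contained version of the idea — the class name `ReversibleEnumerate` is invented here to avoid shadowing the builtin; this is a sketch of the proposal under discussion, not the stdlib enumerate:

```python
class ReversibleEnumerate:
    """Sketch of the enumerate-as-iterable idea from the thread.

    Hypothetical illustration, not the stdlib enumerate: it delegates
    reverse iteration to the underlying iterable via __reversed__.
    """
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        i = 0
        for x in self.iterable:
            yield i, x
            i += 1

    def __reversed__(self):
        # Original indices are only recoverable when the underlying
        # iterable has a length; fall back to -1 markers otherwise,
        # exactly as in the sketch above.
        try:
            i = len(self.iterable) - 1
        except TypeError:
            i = -1
        for x in reversed(self.iterable):
            yield i, x
            i -= 1

pairs = list(reversed(ReversibleEnumerate(['a', 'b', 'c'])))
# pairs == [(2, 'c'), (1, 'b'), (0, 'a')]
```

Note that the builtin reversed() finds the __reversed__ method automatically, which is the whole point of the protocol being debated.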
From aleaxit at yahoo.com Wed Nov 5 13:14:52 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 13:15:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> References: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> Message-ID: <200311051914.52326.aleaxit@yahoo.com> On Wednesday 05 November 2003 05:34 pm, Samuele Pedroni wrote: ... > I think he was wondering whether people rely on > > enumerate([1,2]).next > i = enumerate([1,2]) > i is iter(i) > > working , vs. needing iter(enumerate([1,2]).next > > I think he was proposing to implement enumerate as > > class enumerate(object): > def __init__(self,iterable): > self.iterable = iterable > > def __iter__(self): > i = 0 > for x in self.iterable: > yield i,x > i += 1 > > def __reversed__(self): > rev = reversed(self.iterable) > try: > i = len(self.iterable)-1 > except (TypeError,AttributeError): > i = -1 > for x in rev: > yield i,x > i -= 1 Ah, I see -- thanks! Well, in theory you COULD add a 'next' method too: def next(self): self.iterable = iter(self.iterable) try: self.index += 1 except AttributeError: self.index = 0 return self.index, self.iterable.next() (or some reasonable optimization thereof:-) -- now __reversed__ would stop working after any .next call, but that would still be OK for all use cases I can think of. Alex From guido at python.org Wed Nov 5 13:33:56 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 13:34:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 16:56:21 +0100." 
<200311051656.21014.aleaxit@yahoo.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com> Message-ID: <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> > So: let's keep it simple and have reversed > be _exactly_ equivalent to (net of performance, hypothetical anomalous > "pseudosequences" doing weird things, & exact error kinds/msgs): > > def reversed(sequence): > for x in xrange(len(sequence)-1, -1, -1): yield sequence[x] > > no __reversed__, no complications, "no nuttin'". > > Putting that in the current 2.4 pre-alpha will let us start getting some > experience with it and see if in the future we want to add refinements > (always easier to add than to remove...:-) -- either to reverse or to > other iterator-returning calls (e.g. reverse= optional arguments just > like in the sort method of lists). I'd be for that, *if* we also allow as a possible outcome that reversed() simply doesn't find any use and we take it out before releasing 2.4b1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 13:43:16 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 13:44:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 10:55:10 EST." <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> Message-ID: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> > > Yes. I'm getting cold feet about __reversed__. > > What can I do to warm those feet? > > I spent a month making this proposal as perfect as possible, gathering > support for it, trying each proposed modification, and enduring what > feels like hazing. Still, there is a little bit of energy left if that > what it takes to put the ball over the goal line. > > Getting this far hasn't been easy. 
Python people are quick to express negativity on just about anything and
they take great pleasure in exploring every weird variant they can think
of.

I'm okay with adding reversed() as a builtin that works for sequences
only but I'm not okay with adding the __reversed__ protocol.

For me, the main advantage of reversed() is that it expresses better
what I mean when I'm going over a list (or other concrete sequence)
backwards.  The __reversed__ protocol muddles the issue by inviting to
try to make reversed() work for some iterators; I don't see the use
case (or if I do see it, I see it as much less important than the
previous one).

> > I also still think that a reversed [x]range() would give us a bigger
> > bang for the buck
>
> I'm not willing to go that route:
> * Several posters gave negative feedback on that option.
> * It doesn't address the ugly and inefficient s[::-1] approach which I
> really do not want to become *the* idiom.
> * Providing yet another variant of xrange() is a step backwards IMO.
> * It is not an extensible protocol like the reversed() / __reversed__
> pair.
> * Except for the simple case of revrange(n), the multiple argument forms
> are not a simplification (IMO) and are still difficult to visually
> verify (try the example from random.shuffle).
> * A unique benefit to python is the ability to loop over containers
> without using indices.  The current proposal supports that idea.  The
> revrange() approach doesn't.

Points well taken.

About your last bullet, I wonder if one of the issues is that when doing
a forward loop over a container, we don't really care that much about
the order as long as we get all items (witness the popularity of looping
over dicts).  But when doing a reverse loop, we clearly *do* care about
the order.  So forward and reverse iteration are not symmetric.  This
may explain why 3 out of 5 examples you found *need* the index.
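A concrete note on the s[::-1] bullet: the slice spelling materializes a full reversed copy up front, while the proposed reversed() walks the sequence lazily through __len__/__getitem__. A sketch (`reversed_seq` is a stand-in name for Alex's generator, written with modern range so it runs today):

```python
def reversed_seq(sequence):
    # The sequence-protocol-only semantics discussed in this thread:
    # index from the last item down to the first, yielding lazily.
    for x in range(len(sequence) - 1, -1, -1):
        yield sequence[x]

s = [0, 1, 2, 3, 4]
copy_rev = s[::-1]                  # eager: builds the whole copy now
lazy_rev = list(reversed_seq(s))    # lazy: one item per next() call
assert copy_rev == lazy_rev == [4, 3, 2, 1, 0]
```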
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 5 14:43:43 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 5 14:44:20 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: <200311051128.hA5BSHaG009610@localhost.localdomain> References: <200311051128.hA5BSHaG009610@localhost.localdomain> Message-ID: Anthony Baxter writes: > >>> Anthony Baxter wrote > > Hm. "%x" % (id(o) & 2L*sys.maxint+1) > > is considerably less obvious that "%x"%id(o) > > The best I can come up with at this moment using the 'struct' module is > ''.join(['%02x'%ord(x) for x in struct.pack('>i', id(o))]), which is also > pretty grotesque. In what sense is this better - in particular if you would write mine as MAX_UINT = 2L*sys.maxint+1 ... "%x" % (id(o) & MAX_UINT) > Thinking about it further, the better fix might be to replace the test > code that looks for an exact match with a regex-based match instead... It's not just in test code, AFAIR - also in minidom __repr__ (or some such). Regards, Martin From martin at v.loewis.de Wed Nov 5 14:47:04 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 5 14:47:26 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > > [Neal Norwitz] > > > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > > > +1 > > Isn't apply() already deprecated? Otherwise +1. Not with a deprecation warning. Regards, Martin From fdrake at acm.org Wed Nov 5 14:50:38 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed Nov 5 14:50:48 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: <16297.21646.645041.827176@grendel.zope.com> Martin v. L?wis writes: > Not with a deprecation warning. But it does generate a PendingDeprecationWarning. Given the long history of apply(), that's about as strong a change as can be made just now, and much stronger than some would like. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Wed Nov 5 14:57:02 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 14:57:09 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: Your message of "05 Nov 2003 20:47:04 +0100." References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: <200311051957.hA5Jv2B29760@12-236-54-216.client.attbi.com> > > Isn't apply() already deprecated? Otherwise +1. > > Not with a deprecation warning. Ah, it's coming back. It's a silent deprecation, because there are too many uses still. Probably the same will hold for another release or two. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 15:34:55 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 15:35:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051656.21014.aleaxit@yahoo.com> <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> Message-ID: <200311052134.56018.aleaxit@yahoo.com> On Wednesday 05 November 2003 19:33, Guido van Rossum wrote: > > So: let's keep it simple and have reversed > > be _exactly_ equivalent to (net of performance, hypothetical anomalous ... 
> I'd be for that, *if* we also allow as a possible outcome that > reversed() simply doesn't find any use and we take it out before > releasing 2.4b1. Sure, why not? Determining the exact set of features is what pre-beta releases are for, in a sense. Alex From python at rcn.com Wed Nov 5 15:54:22 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 15:54:39 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> Message-ID: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> [GvR] > I'm okay with adding reversed() as a builtin that works for sequences > only but I'm not okay with adding the __reversed__ protocol. > > For me, the main advantage of reversed() is that it expresses better > what I mean when I'm going over a list (or other concrete sequence) > backwards. The __reversed__ protocol muddles the issue by inviting to > try to make reversed() work for some iterators; I don't see the use > case (or if I do see it, I see it as much less important than the > previous one). I'm not married to the idea of __reversed__ but think it should probably be kept (if my intuition is off on this one, we can pull it out before the beta release). On the plus side: * Many of the original posters either specifically requested this or included some variation of it in their proposals. * There is a small group (including Jeremy Fincher) that consider a reversal protocol to be essential. * It is particularly useful for xrange() because it reduces the overhead to zero without touching the API. The implementation patch on SF shows that this can be done cleanly. Essentially, __reverse__ forwards the call to __iter__ with the arguments rearranged for reverse order. * It leaves open the possibility that someone could add __reverse__ to file objects, enabling them loop in reverse (helpful in reviewing log files for example). 
* There is a small group that passionately wants reverse() to work with enumerate() and Alex appears to be close to figuring out how to overcome the implementation challenges. * The iter/__iter__ pair neatly parallels reversed/__reversed__. * It is pythonic to put hooks in for just about everything. Sooner or later, someone needs the hook. For everyone else, it's invisible. On the minus side: * I think you got cold feet when some poster presented a wacky or misguided use for it. There's no avoiding that; even Alex's dirt simple __copy__ protocol can be turned into an atrocity by someone so inclined. > About your last bullet, I wonder if one of the > issues is that when doing a forward loop over a container, we don't > really care that much about the order as long as we get all items > (witness the popularity of looping over dicts). But when doing a > reverse loop, we clearly *do* care about the order. So forward and > reverse iteration are not symmetric. This may explains why 3 out of 5 > examples you found *need* the index. Incisive analysis. are-your-feet-feeling-warmer-now-ly yours, Raymond Hettinger From fincher.8 at osu.edu Wed Nov 5 17:00:31 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 5 16:02:09 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051656.21014.aleaxit@yahoo.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com> Message-ID: <200311051700.31096.fincher.8@osu.edu> On Wednesday 05 November 2003 10:56 am, Alex Martelli wrote: > def reversed(sequence): > for x in xrange(len(sequence)-1, -1, -1): yield sequence[x] > > no __reversed__, no complications, "no nuttin'". 
> > Putting that in the current 2.4 pre-alpha will let us start getting some > experience with it and see if in the future we want to add refinements > (always easier to add than to remove...:-) -- either to reverse or to > other iterator-returning calls (e.g. reverse= optional arguments just > like in the sort method of lists). It seems like a perfect candidate for that "tools" hierarchy you proposed before. As a builtin, I'd be surprised if it saw significant use. Jeremy From jeremy at alum.mit.edu Wed Nov 5 16:06:40 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Nov 5 16:09:50 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com> <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com> Message-ID: <1068066399.26328.23.camel@localhost.localdomain> On Wed, 2003-11-05 at 10:23, Guido van Rossum wrote: > > Removing _any_ built-in that was around in 1.5.2 will pose similar > > problems. > > Only proportional to the likelihood that it was used in 1.5.2, which > is proportional to how useful it is. intern(): extremely unlikely > (nobody knows what it's for); coerce(): rather unlikely (too > advanced); apply(): very likely. The solution is to get people to stop using 1.5.2. I don't entirely understand why so many people write new code that needs to work with it. 
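For readers who haven't met the builtins being deprecated: apply(f, args, kwargs) is exactly what the extended-call syntax f(*args, **kwargs) replaced. A sketch of the equivalence — `apply_equiv` and `greet` are illustrative names, and apply() itself is absent from modern Python:

```python
def apply_equiv(func, args=(), kwargs=None):
    # What the deprecated apply() builtin did, expressed with the
    # f(*args, **kwargs) call syntax that superseded it.
    if kwargs is None:
        kwargs = {}
    return func(*args, **kwargs)

def greet(name, punct='!'):
    return 'hello ' + name + punct

assert apply_equiv(greet, ('world',)) == 'hello world!'
assert apply_equiv(greet, ('world',), {'punct': '?'}) == 'hello world?'
```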
Jeremy From python at rcn.com Wed Nov 5 16:22:35 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 16:22:51 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> Message-ID: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> > >The __reversed__ protocol muddles the issue by inviting to > > try to make reversed() work for some iterators The invitation is to add efficient reverse iteration support to regular objects and user defined classes, not for iterators. Though I won't be suprised if someone tries, the only iterator that has a chance with this is enumerate, but that is not what the hook is for. Raymond From pedronis at bluewin.ch Wed Nov 5 16:44:59 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 16:43:22 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051914.52326.aleaxit@yahoo.com> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> Message-ID: <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> At 19:14 05.11.2003 +0100, Alex Martelli wrote: >On Wednesday 05 November 2003 05:34 pm, Samuele Pedroni wrote: > ... > > I think he was wondering whether people rely on > > > > enumerate([1,2]).next > > i = enumerate([1,2]) > > i is iter(i) > > > > working , vs. needing iter(enumerate([1,2]).next > > > > I think he was proposing to implement enumerate as > > > > class enumerate(object): > > def __init__(self,iterable): > > self.iterable = iterable > > > > def __iter__(self): > > i = 0 > > for x in self.iterable: > > yield i,x > > i += 1 > > > > def __reversed__(self): > > rev = reversed(self.iterable) > > try: > > i = len(self.iterable)-1 > > except (TypeError,AttributeError): > > i = -1 > > for x in rev: > > yield i,x > > i -= 1 > >Ah, I see -- thanks! 
Well, in theory you COULD add a 'next' method too:
>
> def next(self):
>     self.iterable = iter(self.iterable)
>     try: self.index += 1
>     except AttributeError: self.index = 0
>     return self.index, self.iterable.next()
>
>(or some reasonable optimization thereof:-) -- now __reversed__ would stop
>working after any .next call, but that would still be OK for all use cases I
>can think of.

well, you would also get an iterator hybrid that violates:

"""
Iterator objects also need to implement this method [__iter__]; they are
required to return themselves.
"""
http://www.python.org/doc/2.3.2/ref/sequence-types.html#l2h-234

what one could do is:

class enumerate(object):
    def __init__(self,iterable):
        self.iterable = iterable
        self.forward = None
        self.index = 0

    def __iter__(self):
        return self

    def next(self):
        if not self.forward:
            self.forward = iter(self.iterable)
        i = self.index
        self.index += 1
        return i, self.forward.next()

    def __reversed__(self):
        if self.forward:
            raise Exception,...
        rev = reversed(self.iterable)
        try:
            i = len(self.iterable)-1
        except (TypeError,AttributeError):
            i = -1
        for x in rev:
            yield i,x
            i -= 1

but it is still a hybrid, setting a bad precedent of trying too hard to
attach __reversed__ to an iterator; making enumerate just an iterable is
not backward compatible but is a bit saner, although it does not feel
that natural either.

regards.

From guido at python.org Wed Nov 5 17:21:12 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 17:22:18 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 16:22:35 EST."
<004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> Message-ID: <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> > > >The __reversed__ protocol muddles the issue by inviting to > > > try to make reversed() work for some iterators > > The invitation is to add efficient reverse iteration support to regular > objects and user defined classes, not for iterators. Though I won't be > suprised if someone tries, the only iterator that has a chance with this > is enumerate, but that is not what the hook is for. Yeah, but there was widespread misunderstanding here (for a while even you and Alex were convinced that it was possible for enumerate). Several functions in itertools could easily be made to support __reversed__ *if* their argument supports it (or even if not in one case): chain(*iterables) -- you can define reversed(chain(*iterables)) as follows: for it in reversed(iterables): for element in reversed(it): yield element cycle(iterable) -- this one is infinite but reversed(cycle(x)) could be defined as cycle(reversed(x)). ifilter(pred, it) -- again, it's easy to define reversed(ifilter(P, X)) as ifilter(P, reversed(X)). Ditto for ifilterfalse. imap() -- this would not be so easy because the iterables might not be of equal length, so you can't map reversed(imap(F, X, Y)) to imap(F, reversed(X), reversed(Y)). But for a single sequence, again it could be done. islice() -- seems easy enough. starmap() -- simple, this is like imap() with a single argument. repeat() -- trivial! reversed(repeat(X[, N])) == repeat(X[, N]). dropwhile(), takewhile(), count() aren't amenable. So, unless you want to open this can of worms, I'd be for a version of reversed() that does *not* support __reversed__, making it perfectly clear it only applies to real sequences. 
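Guido's chain() delegation is easy to check concretely. A sketch in modern Python — itertools.chain never grew a __reversed__, so `reversed_chain` here is a standalone generator implementing the rule he gives, and it assumes every argument is itself reversible:

```python
def reversed_chain(*iterables):
    # reversed(chain(*iterables)) as sketched above: walk the tuple of
    # iterables backwards, and reverse each one in turn.
    for it in reversed(iterables):
        for element in reversed(it):
            yield element

result = list(reversed_chain([1, 2], [3, 4], [5]))
# result == [5, 4, 3, 2, 1]
```

The same pattern covers his ifilter and repeat cases; imap with multiple sequences fails for exactly the length-mismatch reason he notes.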
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Nov 5 17:16:24 2003 From: DavidA at ActiveState.com (David Ascher) Date: Wed Nov 5 17:22:43 2003 Subject: [Python-Dev] closure semantics In-Reply-To: <200310220158.21389.aleaxit@yahoo.com> References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> Message-ID: <3FA976B8.9070806@ActiveState.com> Alex Martelli wrote: >So it can't be global, as it must stay a keyword for backwards compatibility >at least until 3.0. > Why? Removing keywords should be much simpler than adding them. I have no idea how hard it is to hack the parser to adjust, but I can't imagine how having 'global' no longer be a keyword as far as its concerned break b/w compatibility. What am I missing? From guido at python.org Wed Nov 5 17:27:43 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:27:51 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 22:44:59 +0100." <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> Message-ID: <200311052227.hA5MRhn29980@12-236-54-216.client.attbi.com> > but is still an hybrid, setting a bad precedent of trying too hard to > attach __reversed__ to an iterator, making enumerate just an iterable is > not backward compatible but is a bit saner although it does not feel that > natural either. Exactly. All I've heard is that some folks asked for __reversed__. I haven't heard any convincing use cases; the PEP doesn't have any. The only motivation in the PEP is this: """ Custom Reverse Objects may optionally provide a __reversed__ method that returns a custom reverse iterator. 
This allows reverse() to be applied to objects that do not have __getitem__() and __len__() but still have some useful way of providing reverse iteration. """ To me, this just *begs* for attempts to add __reversed__ to all sorts of things (including iterators) that aren't sequences. If the real use case is to speed up performance, I'd like to see a discussion of the attainable speed gain, and I'd like to see the absence of __getitem__ / __len__ removed from the motivation. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 17:29:32 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:29:39 2003 Subject: [Python-Dev] closure semantics In-Reply-To: Your message of "Wed, 05 Nov 2003 14:16:24 PST." <3FA976B8.9070806@ActiveState.com> References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> Message-ID: <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com> > Alex Martelli wrote: > > >So it can't be global, as it must stay a keyword for backwards > >compatibility at least until 3.0. [David] > Why? Removing keywords should be much simpler than adding them. I > have no idea how hard it is to hack the parser to adjust, but I > can't imagine how having 'global' no longer be a keyword as far as > its concerned break b/w compatibility. > > What am I missing? I don't recall the context, but I think the real issue with removing 'global' is that there's too much code out there that uses the global syntax to remove the global statement before 3.0. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 17:34:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 17:34:32 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> Message-ID: <200311052334.23547.aleaxit@yahoo.com> On Wednesday 05 November 2003 22:44, Samuele Pedroni wrote: ... > > > I think he was wondering whether people rely on > > > > > > enumerate([1,2]).next ...which is one thing... > > > i = enumerate([1,2]) > > > i is iter(i) ...which is another. > >Ah, I see -- thanks! Well, in theory you COULD add a 'next' method too: Note I specifically didn't say "make enumerate return an iterator" -- I said "add a 'next' method". It's a non-special name (be that right or wrong) and thus there is no prohibition against non-iterators having such a method. > well, you would also get an iterator hybrid that violates: No you wouldn't -- you would get a non-iterator type which exposes a method named 'next', and that violates no Python rule. > attach __reversed__ to an iterator, making enumerate just an iterable is > not backward compatible but is a bit saner although it does not feel that > natural either. If anybody relies on that "i is iter(i)" then, yes. I have never seen that relied upon. I _have_ seen quite a few cases of reliance on calls to a 'next' method to "throw the first item away" (no doubt a call to iter(...) first would be preferable, but I'm just mentioning what I've seen). I'm not sure supporting dubious "happens to work" existing usage is _desirable_ -- I'm just saying it's _possible_ (in some cases, such as this one) without necessarily violating anything. 
Personally, since I found out that enumerate(reversed(x)) works almost as well as reversed(enumerate(x)) [[or other hypotheticals -- such as enumerate(x, reverse=True) OR reversed(x, enumerate=True)]], and better than revrange(len(x)), for my use cases, I'm not particularly pro NOR con wrt __reversed__ -- its pluses (which Raymond summarizes quite well) and its minuses (Guido's worry about it promoting unwarranted complications, my vague unease at "yet another special-case protocol via a special-method when adaptation would handle it more uniformly") are finely balanced. I just hope that, either with or without __reversed__, reversed _does_ get in, at least, as Guido pointed out, tentatively (since features, if need be, may be withdrawn before the beta phase). Alex From guido at python.org Wed Nov 5 17:35:01 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:36:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 15:54:22 EST." <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> References: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> Message-ID: <200311052235.hA5MZ1F30026@12-236-54-216.client.attbi.com> > I'm not married to the idea of __reversed__ but think it should > probably be kept (if my intuition is off on this one, we can pull it > out before the beta release). Let's do it the other way around -- let's not add a complication until we have further proof it is needed. Remember YAGNI. :-) > On the plus side: > > * Many of the original posters either specifically requested this or > included some variation of it in their proposals. If any of them gave a good motivation or use case, those didn't make it into the PEP. > * There is a small group (including Jeremy Fincher) that consider a > reversal protocol to be essential. And I think that as a protocol it needs a separate PEP, because new protocols are much more involved than new builtins. 
> * It is particularly useful for xrange() because it reduces the overhead
> to zero without touching the API.  The implementation patch on SF shows
> that this can be done cleanly.  Essentially, __reverse__ forwards the
> call to __iter__ with the arguments rearranged for reverse order.

The implementation could special-case xrange() and lists and "optimize
the snot out of them" without the need for a general protocol.

> * It leaves open the possibility that someone could add __reverse__ to
> file objects, enabling them loop in reverse (helpful in reviewing log
> files for example).

That's exactly the danger.  Such a thing is much better coded as a
separate object rather than adding it to the base file object.

> * There is a small group that passionately wants reverse() to work with
> enumerate() and Alex appears to be close to figuring out how to overcome
> the implementation challenges.

Doubtful.

> * The iter/__iter__ pair neatly parallels reversed/__reversed__.

The parallel is a fallacy (see one of my previous posts about the
asymmetry).

> * It is pythonic to put hooks in for just about everything.  Sooner or
> later, someone needs the hook.  For everyone else, it's invisible.

But hook design is harder than builtin design.

> On the minus side:
>
> * I think you got cold feet when some poster presented a wacky or
> misguided use for it.  There's no avoiding that; even Alex's dirt simple
> __copy__ protocol can be turned into an atrocity by someone so inclined.

The __copy__ protocol is limited in practice to the expectations and
promises of the copy module.  The problem with __reversed__ is that
everyone thinks it means what *they* would like to see.

> are-your-feet-feeling-warmer-now-ly yours,

No, this is one of the coldest weeks since my move to Calif.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 18:02:29 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 18:02:36 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> Message-ID: <200311060002.29814.aleaxit@yahoo.com> On Wednesday 05 November 2003 23:21, Guido van Rossum wrote: > > > >The __reversed__ protocol muddles the issue by inviting to > > > > try to make reversed() work for some iterators > > > > The invitation is to add efficient reverse iteration support to regular > > objects and user defined classes, not for iterators. Though I won't be > > suprised if someone tries, the only iterator that has a chance with > > this is enumerate, but that is not what the hook is for. > > Yeah, but there was widespread misunderstanding here (for a while even > you and Alex were convinced that it was possible for enumerate). It _is_ *possible*; it is not necessarily _opportune_ -- a different issue. Similarly, you point out below possibilities that may not be opportune. > So, unless you want to open this can of worms, I'd be for a version of > reversed() that does *not* support __reversed__, making it perfectly > clear it only applies to real sequences. Unless some _opportune_ (i.e., truly good:-) use case of "naturally reversible nonsequence" (doubly linked list...?-) arises (and the __reversed__ idea can inserted then -- just as it could be removed if reversed started out with it -- as long as we do it before the beta) reversed with or without __reversed__ seem anyway fine to me -- arguments being so finely balanced on both sides. 
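Alex's "works almost as well" is worth making precise: enumerate(reversed(x)) numbers items from zero in the reversed order, while the reversed(enumerate(x)) being debated would preserve the original indices. A sketch in modern Python, where both builtins exist:

```python
x = ['a', 'b', 'c']

# enumerate(reversed(x)): counting restarts at 0, so the indices are
# positions within the *reversed* order, not the original one.
renumbered = list(enumerate(reversed(x)))
# renumbered == [(0, 'c'), (1, 'b'), (2, 'a')]

# What reversed(enumerate(x)) was meant to give -- original indices,
# visited backwards -- requires knowing the length up front:
original_indices = [(i, x[i]) for i in range(len(x) - 1, -1, -1)]
# original_indices == [(2, 'c'), (1, 'b'), (0, 'a')]
```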
Alex

From guido at python.org  Wed Nov  5 18:08:42 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov  5 18:08:49 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Thu, 06 Nov 2003 00:02:29 +0100." <200311060002.29814.aleaxit@yahoo.com>
References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> <200311060002.29814.aleaxit@yahoo.com>
Message-ID: <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>

> Unless some _opportune_ (i.e., truly good:-) use case of "naturally
> reversible nonsequence" (doubly linked list...?-) arises (and the
> __reversed__ idea can be inserted then -- just as it could be removed
> if reversed started out with it -- as long as we do it before the beta)
> reversed with or without __reversed__ seems anyway fine to me --
> arguments being so finely balanced on both sides.

It's more effort to add something later than to remove it (since
there's always *someone* who's already dependent on it), so I see the
argument about adding __reversed__ as far from balanced.  I see at most
a 5% chance that reversed() would be removed before 2.4b1.  If we add
__reversed__ now I doubt that we'll remove it (assuming reversed()
stays), but I still am unconvinced of the need (and I *am* convinced
of the danger).

So:

- I am +1 on adding reversed() provisionally
- I am -1 on adding __reversed__ at the same time

--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at cosc.canterbury.ac.nz  Wed Nov  5 18:26:12 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed Nov  5 18:26:23 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <002301c3a39d$36d00020$e841fea9@oemcomputer>
Message-ID: <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>

[Neal Norwitz]
> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.

In the case of intern, do you mean to move it into
a module, or remove it altogether?

If the latter, why?
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From bsder at allcaps.org  Wed Nov  5 19:26:12 2003
From: bsder at allcaps.org (Andrew P. Lentvorski, Jr.)
Date: Wed Nov  5 19:25:49 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com>
References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com>
Message-ID: <20031105161052.W14642@mail.allcaps.org>

On Wed, 5 Nov 2003, Guido van Rossum wrote:
> I'm okay with adding reversed() as a builtin that works for sequences
> only but I'm not okay with adding the __reversed__ protocol.

But, doesn't this effectively take the PEP back to the original proposal
of a sequence method that it drifted away from?

With the restriction to sequences, reversed() is then likely to be
implemented as a thin wrapper around seq.somerevmethod() which could then
return either a new reversed sequence, an iterable, or an iterator
depending upon efficiency, implementation, thread-safety, etc.

Since reversed() is turning out not to be generally applicable anyway,
perhaps going back to the original idea of a sequence method would be a
good thing?

-a

From guido at python.org  Wed Nov  5 19:30:56 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov  5 19:31:02 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 16:26:12 PST."
	<20031105161052.W14642@mail.allcaps.org>
References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> <20031105161052.W14642@mail.allcaps.org>
Message-ID: <200311060030.hA60UuP30210@12-236-54-216.client.attbi.com>

> > I'm okay with adding reversed() as a builtin that works for sequences
> > only but I'm not okay with adding the __reversed__ protocol.
>
> But, doesn't this effectively take the PEP back to the original proposal
> of a sequence method that it drifted away from?

No, because making it a sequence method would require every sequence
implementation to support it.  Making it a builtin makes it work for
all sequences (everything that supports __len__ and __getitem__ with
random access, really).

> With the restriction to sequences, reversed() is then likely to be
> implemented as a thin wrapper around seq.somerevmethod() which could then
> return either a new reversed sequence, an iterable, or an iterator
> depending upon efficiency, implementation, thread-safety, etc.

No.  reversed() should *never* return a new sequence; it should return
an iterator.

> Since reversed() is turning out not to be generally applicable anyway,
> perhaps going back to the original idea of a sequence method would be a
> good thing?

No.  The feedback on that was pretty uniformly negative.  The PEP is
95% about reversed() on sequences and only a tiny bit about
__reversed__, so little is lost.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From anthony at interlink.com.au  Wed Nov  5 22:15:46 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Nov  5 22:18:59 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: <200311051458.hA5EwWc29153@12-236-54-216.client.attbi.com>
Message-ID: <200311060315.hA63Fndh000543@localhost.localdomain>

>>> Guido van Rossum wrote
> This warning will go away in 2.4 again, where %x with a negative int
> will return a hex number with a minus sign.
So I'd be against
> introducing a new format code.  I've forgotten in what code you found
> this, but the sys.maxint solution sounds like your best bet.  In 2.4
> we can also make id() return a long when the int value would be
> negative; I don't want to do that in 2.3 since changing the return
> type and value of a builtin in a minor release seems a compatibility
> liability -- but in 2.4 the difference between int and long will be
> wiped out even more than it already is, so it should be fine there.

The code is basically something like this:

Python 2.3.2+ (#1, Nov  5 2003, 00:54:02)
[GCC 3.3.1 20030930 (Red Hat Linux 3.3.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class a: pass
...
>>> b=a()
>>> repr(b) == '<__main__.a instance at 0x%x>'%id(b)
__main__:1: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up
True
>>>

For now, I'll patch the 2.3 code in the test suite to make it not
complain.

If %x will return a negative hex number, then the internals of id()
must make sure that they return a positive number, or whatever does
the standard repr will need to change as well.  I'll log a bug on SF
for it.

Anthony

From python at rcn.com  Wed Nov  5 22:25:19 2003
From: python at rcn.com (Raymond Hettinger)
Date: Wed Nov  5 22:25:29 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311052235.hA5MZ1F30026@12-236-54-216.client.attbi.com>
Message-ID: <002501c3a415$9b06f9e0$e841fea9@oemcomputer>

> > * There is a small group (including Jeremy Fincher) that consider a
> > reversal protocol to be essential.
>
> And I think that as a protocol it needs a separate PEP, because new
> protocols are much more involved than new builtins.

Great idea.  The champions for __reverse__ can plead their case there.

> > * It is particularly useful for xrange() because it reduces the overhead
> > to zero without touching the API.
The implementation patch on SF shows
> > that this can be done cleanly.  Essentially, __reverse__ forwards the
> > call to __iter__ with the arguments rearranged for reverse order.
>
> The implementation could special-case xrange() and lists and "optimize
> the snot out of them" without the need for a general protocol.

Agreed!  I'll take __reversed__ out of the pep.

May I mark this one as accepted and move on?

Raymond Hettinger

From DavidA at ActiveState.com  Wed Nov  5 22:44:43 2003
From: DavidA at ActiveState.com (David Ascher)
Date: Wed Nov  5 22:36:02 2003
Subject: [Python-Dev] closure semantics
In-Reply-To: <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com>
References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com>
Message-ID: <3FA9C3AB.808@ActiveState.com>

Guido van Rossum wrote:

[Alex]
>>>So it can't be global, as it must stay a keyword for backwards
>>>compatibility at least until 3.0.

[David]
>>Why?  Removing keywords should be much simpler than adding them.  I
>>have no idea how hard it is to hack the parser to adjust, but I
>>can't imagine how having 'global' no longer be a keyword as far as
>>it's concerned breaks b/w compatibility.
>>
>>What am I missing?

[GvR]
> I don't recall the context, but I think the real issue with removing
> 'global' is that there's too much code out there that uses the global
> syntax to remove the global statement before 3.0.

I would never have suggested that.  Just that we can evolve the parser
to retain the old usage

    global a,b,c

while allowing a new usage

    global.a = value

by removing 'global' from the list of reserved words and doing "fancy
stuff" in the parser.  Note that I very much don't know the details
of the "fancy stuff".
--david

From neal at metaslash.com  Wed Nov  5 22:58:37 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Wed Nov  5 22:58:47 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>
Message-ID: <20031106035837.GB7212@epoch.metaslash.com>

On Thu, Nov 06, 2003 at 12:26:12PM +1300, Greg Ewing wrote:
> [Neal Norwitz]
> > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.
>
> In the case of intern, do you mean to move it into
> a module, or remove it altogether?
>
> If the latter, why?

For the most part, I meant to remove them (including intern)
altogether in the long run.  In 2.4, I only meant to officially
deprecate them with a warning.  intern() doesn't seem particularly
useful or commonly used.  At least moving it to sys or some other
module is an improvement IMO.

My primary goal in pushing to deprecate these older features is to
make the language smaller.  A secondary goal is to reduce the code
base, thus easing maintenance and testing.  If a feature is not
useful, in the long run, I think it should be removed.  I agree
there's pain involved.  But there's also pain in keeping it.  Part of
that pain is that its use gets propagated.  Perhaps people that teach
Python and write books can speak to this better than I.

This idea leads to Jeremy's statement:

	The solution is to get people to stop using 1.5.2.  I don't
	entirely understand why so many people write new code that
	needs to work with it.

If we never deprecate/threaten to remove a feature, people will
continue to use it.  But that becomes a circular argument for why we
can't deprecate/remove it.  How long should we wait from the time a
feature is not needed until it is removed?
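[What "officially deprecate them with a warning" amounts to in practice
is a shim of roughly this shape -- a sketch, not the actual 2.4 patch:]

```python
import warnings

def apply(func, args=(), kwds={}):
    # deprecated spelling of func(*args, **kwds); warn but keep working
    warnings.warn("apply() is deprecated; use func(*args, **kwds) instead",
                  DeprecationWarning, stacklevel=2)
    return func(*args, **kwds)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = apply(pow, (2, 10))

print(result)                       # 1024
print(caught[0].category.__name__)  # DeprecationWarning
```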
Here are the documentation release dates from the doc web page
(http://python.org/doc/versions.html):

	2.3	29 Jul 2003
	2.2	21 Dec 2001
	2.1	15 Apr 2001
	2.0	16 Oct 2000
	1.5.2	30 Apr 1999

By the time 2.4 is released (likely mid-2004 at the earliest), apply()
will have been made redundant for about 4 years (since 2.0 was
released).  All we are talking about is adding a warning for 2.4.  I'm
not sure whether it is appropriate to remove apply() in 2.5 (delivered
in 2005-2006?).  But if we don't work towards cleaning up, it will
never get done.

I also have no problem adding a module for backwards compatibility
that adds apply(), etc to builtins.  In fact, I think this is a better
approach: if someone wants to "port" their code from 1.5.2 to 2.4,
they can achieve much of it by adding:

	import python1_5_2_compatibility

which does some magic.

I also think the reverse is true.  For new builtins, it would be nice
to provide a compatibility module that can be downloaded for older
versions.  That way I can use sum(), enumerate(), etc in 2.2 and
before.

Neal

From bac at OCF.Berkeley.EDU  Wed Nov  5 23:22:45 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Wed Nov  5 23:22:54 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>
References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> <200311060002.29814.aleaxit@yahoo.com> <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>
Message-ID: <3FA9CC95.6010809@ocf.berkeley.edu>

Guido van Rossum wrote:
>>Unless some _opportune_ (i.e., truly good:-) use case of "naturally
>>reversible nonsequence" (doubly linked list...?-) arises (and the
>>__reversed__ idea can be inserted then -- just as it could be removed
>>if reversed started out with it -- as long as we do it before the beta)
>>reversed with or without __reversed__ seems anyway fine to me --
>>arguments being so finely balanced on both sides.
>
>
> It's more effort to add something later than to remove it (since
> there's always *someone* who's already dependent on it), so I see the
> argument about adding __reversed__ as far from balanced.  I see at most
> a 5% chance that reversed() would be removed before 2.4b1.  If we add
> __reversed__ now I doubt that we'll remove it (assuming reversed()
> stays), but I still am unconvinced of the need (and I *am* convinced
> of the danger).
>
> So:
>
> - I am +1 on adding reversed() provisionally
> - I am -1 on adding __reversed__ at the same time
>

Been following this from afar (crazy week with homework; fun).  In case
anyone cares about my opinion:

+0 on reversed(): wouldn't hurt having it but I still don't see it as
critical enough to be a built-in

-1 on __reversed__: I like my iterator protocol **simple**.

OK, back to studying for my midterm.

-Brett

From tdelaney at avaya.com  Wed Nov  5 23:31:03 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Wed Nov  5 23:31:09 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>

> From: Neal Norwitz [mailto:neal@metaslash.com]
>
> For the most part, I meant to remove them (including intern)
> altogether in the long run.  In 2.4, I only meant to officially
> deprecate them with a warning.  intern() doesn't seem particularly
> useful or commonly used.  At least moving it to sys or some other
> module is an improvement IMO.

One reason why intern() hasn't been commonly used is that it made
things immortal.  This is no longer the case - I'd like to see if the
use of intern() changes.

What I would prefer would be for intern() to be able to take any
hashable object - in particular, tuples.  It's not uncommon for me to
create lots of small tuples which end up having the same data in them
- interning could save quite a bit of memory.
Yes, I can fake it with my own interning function, but that then means
I have to deal with the immortality problems again.

So I'd actually advocate enhancing intern(), rather than removing it,
now that interned things are mortal.

Tim Delaney

From aahz at pythoncraft.com  Wed Nov  5 23:55:45 2003
From: aahz at pythoncraft.com (Aahz)
Date: Wed Nov  5 23:55:47 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>
Message-ID: <20031106045545.GA20099@panix.com>

On Thu, Nov 06, 2003, Delaney, Timothy C (Timothy) wrote:
> From: Neal Norwitz [mailto:neal@metaslash.com]
>>
>> For the most part, I meant to remove them (including intern)
>> altogether in the long run.  In 2.4, I only meant to officially
>> deprecate them with a warning.  intern() doesn't seem particularly
>> useful or commonly used.  At least moving it to sys or some other
>> module is an improvement IMO.
>
> One reason why intern() hasn't been commonly used is that it made
> things immortal.  This is no longer the case - I'd like to see if the
> use of intern() changes.
>
> What I would prefer would be for intern() to be able to take any
> hashable object - in particular, tuples.  It's not uncommon for me to
> create lots of small tuples which end up having the same data in them
> - interning could save quite a bit of memory.
>
> Yes, I can fake it with my own interning function, but that then means
> I have to deal with the immortality problems again.
>
> So I'd actually advocate enhancing intern(), rather than removing it,
> now that interned things are mortal.

Agreed.  But intern() should *not* be a builtin function.  It belongs
in sys.
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From tdelaney at avaya.com  Thu Nov  6 00:13:39 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Thu Nov  6 00:13:45 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF619C@au3010avexu1.global.avaya.com>

> From: Aahz [mailto:aahz@pythoncraft.com]
>
> > So I'd actually advocate enhancing intern(), rather than
> > removing it, now that interned things are mortal.
>
> Agreed.  But intern() should *not* be a builtin function.  It
> belongs in sys.

Hmm - not so sure about sys, but I agree it could quite well be moved
out of builtins.

I don't feel it belongs in sys because it has nothing to do with the
environment that python is running in.  Instead it has to do with
object management.

Tim Delaney

From guido at python.org  Thu Nov  6 00:30:51 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:31:01 2003
Subject: [Python-Dev] closure semantics
In-Reply-To: Your message of "Wed, 05 Nov 2003 19:44:43 PST." <3FA9C3AB.808@ActiveState.com>
References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com> <3FA9C3AB.808@ActiveState.com>
Message-ID: <200311060530.hA65Ups30577@12-236-54-216.client.attbi.com>

> [Alex]
> >>>So it can't be global, as it must stay a keyword for backwards
> >>>compatibility at least until 3.0.
>
> [David]
> >>Why?  Removing keywords should be much simpler than adding them.  I
> >>have no idea how hard it is to hack the parser to adjust, but I
> >>can't imagine how having 'global' no longer be a keyword as far as
> >>it's concerned breaks b/w compatibility.
> >>
> >>What am I missing?
>
> [GvR]
> > I don't recall the context, but I think the real issue with removing
> > 'global' is that there's too much code out there that uses the global
> > syntax to remove the global statement before 3.0.
>
> [David]
> I would never have suggested that.  Just that we can evolve the parser
> to retain the old usage
>
>     global a,b,c
>
> while allowing a new usage
>
>     global.a = value
>
> by removing 'global' from the list of reserved words and doing "fancy
> stuff" in the parser.  Note that I very much don't know the details
> of the "fancy stuff".

Ah.  *If* we want to parse both it would be easier to keep global as a
keyword and do fancy stuff to recognize the second form...  But I
think somewhere in the mega-thread about this topic is hidden the
conclusion that there are better ways to do this.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Nov  6 00:33:48 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:33:59 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 22:25:19 EST." <002501c3a415$9b06f9e0$e841fea9@oemcomputer>
References: <002501c3a415$9b06f9e0$e841fea9@oemcomputer>
Message-ID: <200311060533.hA65XmF30630@12-236-54-216.client.attbi.com>

> Agreed!  I'll take __reversed__ out of the pep.
>
> May I mark this one as accepted and move on?

Yes.  Just mark it as "conditionally accepted" (meaning that if we find
it useless after all we can remove it before 2.4b1 -- you can make that
condition explicit).
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Nov  6 00:39:58 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Nov  6 00:40:21 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <20031106035837.GB7212@epoch.metaslash.com>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz> <20031106035837.GB7212@epoch.metaslash.com>
Message-ID: <1068097197.13655.0.camel@anthem>

On Wed, 2003-11-05 at 22:58, Neal Norwitz wrote:
> I also have no problem adding a module for backwards compatibility
> that adds apply(), etc to builtins.  In fact, I think this is
> a better approach: if someone wants to "port" their code
> from 1.5.2 to 2.4, they can achieve much of it by adding:
>
>     import python1_5_2_compatibility

from __past__ import cruft

<1.6.1 wink>

-Barry

From guido at python.org  Thu Nov  6 00:41:17 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:41:34 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: Your message of "Thu, 06 Nov 2003 14:15:46 +1100." <200311060315.hA63Fndh000543@localhost.localdomain>
References: <200311060315.hA63Fndh000543@localhost.localdomain>
Message-ID: <200311060541.hA65fHQ30649@12-236-54-216.client.attbi.com>

> If %x will return a negative hex number, then the internals of id()
> must make sure that they return a positive number, or whatever does
> the standard repr will need to change as well.  I'll log a bug on SF
> for it.

The standard repr is written in C and uses %p, which does a platform
specific thing, but typically produces an unsigned hex number of
appropriate length; apparently we've not been ported to platforms
where it does something else, otherwise the test would have failed
there too.  One can argue that the test is too constrained anyway --
why should we care about the specific hex number in the repr() of a
class?
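[The sys.maxint workaround mentioned earlier in this thread amounts to
masking the signed id() into one unsigned machine word; in modern terms
the helper could look like this -- a hypothetical sketch, since 2.3-era
code would build the mask from sys.maxint instead:]

```python
import struct

def unsigned_id(obj):
    # fold a possibly-negative id() into the unsigned range of one
    # machine word, mimicking what C's %p typically prints
    word_bits = struct.calcsize('P') * 8
    return id(obj) & ((1 << word_bits) - 1)

x = object()
print(unsigned_id(x) >= 0)       # True
print('0x%x' % unsigned_id(x))   # e.g. 0x7f3a2c1b90d0
```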
I'm not for adding %p to Python's string formats; it's too
implementation specific and I don't see a use for it other than
matching the built-in repr().

id() has always returned negative numbers on all platforms where
pointers happen to have the high bit set; apart from making this test
pass in the future (which is a pretty weak argument) I don't see a
problem with that, so I'm not in favor of changing it, even though it
would be easy enough to change PyLong_FromVoidPtr() to call
PyLong_FromLong[Long]().

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at python.net  Thu Nov  6 05:31:11 2003
From: theller at python.net (Thomas Heller)
Date: Thu Nov  6 05:31:35 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libtraceback.tex, 1.17, 1.18
In-Reply-To: (nascheme@users.sourceforge.net's message of "Wed, 05 Nov 2003 15:03:31 -0800")
References:
Message-ID:

nascheme@users.sourceforge.net writes:

> Update of /cvsroot/python/python/dist/src/Doc/lib
> In directory sc8-pr-cvs1:/tmp/cvs-serv27582/Doc/lib
>
> Modified Files:
> 	libtraceback.tex
> Log Message:
> Add traceback.format_exc().
>
>
> Index: libtraceback.tex
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Doc/lib/libtraceback.tex,v
> retrieving revision 1.17
> retrieving revision 1.18
> diff -C2 -d -r1.17 -r1.18
> *** libtraceback.tex	30 Jan 2003 22:22:59 -0000	1.17
> --- libtraceback.tex	5 Nov 2003 23:02:58 -0000	1.18
> ***************
> *** 49,52 ****
> --- 49,57 ----
>   \end{funcdesc}
>
> + \begin{funcdesc}{format_exc}{\optional{limit\optional{, file}}}
> + This is like \code{print_exc(\var{limit})} but returns a string
> + instead of printing to a file.
> + \end{funcdesc}
> +

Shouldn't there be a 'new in Python 2.4' note here?  I don't remember
how this is spelled in LaTeX.
Thomas

From mwh at python.net  Thu Nov  6 07:09:40 2003
From: mwh at python.net (Michael Hudson)
Date: Thu Nov  6 07:09:47 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libtraceback.tex, 1.17, 1.18
In-Reply-To: (Thomas Heller's message of "Thu, 06 Nov 2003 11:31:11 +0100")
References:
Message-ID: <2moevpzojf.fsf@starship.python.net>

Thomas Heller writes:

> nascheme@users.sourceforge.net writes:
>> + \begin{funcdesc}{format_exc}{\optional{limit\optional{, file}}}
>> + This is like \code{print_exc(\var{limit})} but returns a string
>> + instead of printing to a file.
>> + \end{funcdesc}
>> +
>
> Shouldn't there be a 'new in Python 2.4' note here?  I don't remember
> how this is spelled in LaTeX.

\versionadded{2.4}

--
  I also fondly recall Paris because that's where I learned to
  debug Zetalisp while drunk.                      -- Olin Shivers

From tdelaney at avaya.com  Thu Nov  6 16:46:48 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Thu Nov  6 16:46:56 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>

> From: Guido van Rossum [mailto:guido@python.org]
>
> > > > So I'd actually advocate enhancing intern(), rather than removing
> > > > it, now that interned things are mortal.
> > >
> > > Have you thought about how to implement that?  And have you calculated
> > > how much memory you would save?
> >
> > Not yet - musings at the end of the day.  As to how much memory - I
> > really don't think it can be calculated - it's so
> > application-dependent.
>
> Well obviously I meant for *your* app, because you're the one bringing
> this up (I'm highly skeptical of the idea if you hadn't guessed yet :-).

Moved back to python-dev because I've got some actual pseudocode in
here ... ;)

I'll follow it up further when I've got a solid use case.  I'm also
skeptical of the idea, but think it's worth some additional thought.
At the moment it's just gut feeling that if we're going to have it at
all, it seems that it would be useful for things other than strings.

As for implementation ... something like:

    _INTERN_DICT = WeakKeyValueDictionary()

    def unrestrained_intern (obj):

        # Singletons don't need to be interned
        if obj is None or obj is True or obj is False:
            return obj

        try:
            return intern(obj)
        except TypeError:
            return _INTERN_DICT.setdefault(obj, obj)

    a = (1, 2, 3)
    b = (1, 2) + (3,)

    assert unrestrained_intern(a) is unrestrained_intern(b)

Of course, this would require that we could create a weak reference to
hashable builtin types like tuple and int.  The dictionary holding the
objects would need to be weak on both key and value to ensure
mortality.

Anyway, there are a lot of flow-on effects there :( and it's very much
in a fledgling concept phase at the moment.

Tim Delaney

From raymond.hettinger at verizon.net  Fri Nov  7 02:33:54 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Fri Nov  7 02:34:47 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
Message-ID: <000901c3a501$8fb10800$1535c797@oemcomputer>

Idea for the day:  Let the str.encode/decode methods accept keyword
arguments to be forwarded to the underlying codec.
For example, zlib_codec.py can then express its encoding function as:

    def zlib_encode(input, errors='strict', **kwds):
        assert errors == 'strict'
        if 'level' in kwds:
            output = zlib.compress(input, kwds['level'])
        else:
            output = zlib.compress(input)
        return (output, len(input))

The user can then have access to zlib's optional compression level
argument:

>>> 'which witch has which witches wristwatch'.encode('zlib', level=9)
'x\x9c+\xcf\xc8L\xceP(\xcf,\x01\x92\x19\x89\xc5\n\xe5\x08~*\x90W\x94Y\\R \x9e\x08\xe4\x00\x005\xe5\x0fi'

This small extension to the protocol makes it possible to use codecs
for a wider variety of applications:

>>> msg = 'beware the ides of march'.encode('des', key=0x10ab03b78495d2)
>>> print msg.decode('des', key=0x10ab03b78495d2)
beware the ides of march

>>> template = '${name} was born in ${country}'
>>> print template.encode('pep292_codec', name='Guido', country='the Netherlands')
Guido was born in the Netherlands

A key advantage of extending the codec protocol is that new or
experimental services can easily be added or tried out without
expanding the API elsewhere.  For example, Barry's simpler string
substitutions can be implemented without adding a new string method to
cook the text.

Already, the existing protocol has provided consistent, uniform access
to a variety of services:

    text.encode('quotedprintable')
    text.encode('rot13')
    text.encode('palmos')

The proposed extension allows this benefit to apply to an even broader
range of services.

Raymond Hettinger

From Boris.Boutillier at arteris.net  Fri Nov  7 07:24:35 2003
From: Boris.Boutillier at arteris.net (Boris Boutillier)
Date: Fri Nov  7 07:24:45 2003
Subject: [Python-Dev] Code to prevent modification on builtins classes also abusively (IMHO) prevents modifications on extensions modules, some ideas on this.
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>
Message-ID: <3FAB8F03.50601@arteris.net>

I looked into the archives and didn't see any debate on this question;
I hope I didn't miss something.

My point concerns limitations on extension modules due to checks aimed
at the builtins.  The main point is settable extension classes.

In the Python code there are some checks against TPFLAGS_HEAPTYPE;
extension modules shouldn't have this flag, so the normal
type->tp_setattro doesn't allow the user to set new attributes on your
extension classes.  There is a way around it: write a special
metaclass which redefines setattr.

In the extension module I'm writing (I'm porting some Python code to
Python-C for speed issues) the user can set attributes and slots on my
classes.  What I need is the complete type->tp_setattro behaviour,
without the check.  I didn't see a way to get this behaviour using
only the Python API (is re-readying the type a workaround?), so I
copied and pasted all the code needed to make update_slots work (ouch,
2500 lines).  This is now almost working: every kind of attribute can
be set except __setattr__ itself; the hackcheck prevents the user from
calling another __setattr__ from his new setattr.

Example of my extension class hierarchy:

    class A(object)
    class B(A)

In the extension there is a tp->setattro on B; if the user wants to
redefine it, he can't call the A __setattr__:

    def myBSetattr(self, k, v):
        super(B, self).__setattr__(k, v)
        ## Do here my special stuff

This won't work; the hackcheck will see some kind of hack here: "you
can't call the A.__setattr__ function from a B object" :).

First question: is there a known way around this?

Possible improvements: in the Python code there are checks in various
functions to see that you are not modifying builtin classes;
unfortunately this code also affects extension modules.
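[The restriction being described can be seen from pure Python: types
created by a class statement carry Py_TPFLAGS_HEAPTYPE and accept new
attributes, while builtin (and most extension) types do not:]

```python
class Heap(object):
    pass

Heap.extra = 1          # fine: Heap is a heap type
print(Heap.extra)       # 1

try:
    int.extra = 1       # rejected: int lacks Py_TPFLAGS_HEAPTYPE
    print("no check?")
except TypeError:
    print("TypeError, as described above")
```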
I think the HeapType flag is used abusively in several cases: mostly
in type_setattro, object_set_bases and object_set_classes, the checks
have nothing to do with the true HeapType definition as stated in the
comments in Include/object.h; it is used, I think, only because it is
the only flag that makes a difference between builtin and user
classes.  Unfortunately, with this flag extension classes fall into
the 'builtin' part.

A way to solve the problem without backward compatibility problems
would be to have a new TPFLAGS_SETABLE flag, defaulting to 0 for
builtin/extension classes and 1 for user Python classes.  This flag
would be checked in place of the HeapType one where relevant.

I'm ready to write the code for this if there are some positive votes;
I won't bother if everybody is against it.

Boris

From barry at python.org  Fri Nov  7 09:22:32 2003
From: barry at python.org (Barry Warsaw)
Date: Fri Nov  7 09:22:45 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer>
References: <000901c3a501$8fb10800$1535c797@oemcomputer>
Message-ID: <1068214951.15995.100.camel@anthem>

On Fri, 2003-11-07 at 02:33, Raymond Hettinger wrote:
> Idea for the day:  Let the str.encode/decode methods accept keyword
> arguments to be forwarded to the underlying codec.

Nice.

> Already, the existing protocol has provided consistent, uniform access
> to a variety of services:
>
>     text.encode('quotedprintable')
>     text.encode('rot13')
>     text.encode('palmos')
>
> The proposed extension allows this benefit to apply to an even broader
> range of services.

Which is all really cool.  The only thing that begins to bother me
about this is the use of strings as name lookup keys for finding
functions.  This seems generally unpythonic and error prone -- aside
from the documentation problem that the list of standard lookup keys
is buried in a non-obvious place.
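[The string-keyed registry in question is the codecs machinery; a typo
in the key fails only at runtime with a LookupError, which is the
error-proneness being pointed at:]

```python
import codecs

# a valid key resolves through the registry and works
print(codecs.encode('which witch', 'rot13'))   # juvpu jvgpu

# a misspelled key is caught by nothing until the call is made
try:
    codecs.lookup('quotedprintible')           # typo for 'quotedprintable'
except LookupError as exc:
    print(exc)                                 # unknown encoding: quotedprintible
```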
-Barry From Jack.Jansen at cwi.nl Fri Nov 7 09:37:41 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Nov 7 09:37:36 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <20031103140123.GA14146@panix.com> References: <200311031347.10995.aleaxit@yahoo.com> <20031103140123.GA14146@panix.com> Message-ID: On 3 Nov 2003, at 15:01, Aahz wrote: > On Mon, Nov 03, 2003, Alex Martelli wrote: >> >> I made a few bugfix check-ins to the 2.3 maintenance branch this >> weekend and Michael Hudson commented that he thinks that so doing is a >> bad idea, that bug fixes should filter from the 2.4 trunk to the 2.3 >> branch and not the other way around. Is this indeed the policy (have >> I missed some guidelines about it)? > > PEP 6: > > As individual patches get contributed to the feature release fork, > each patch contributor is requested to consider whether the patch > is > a bug fix suitable for inclusion in a patch release. If the patch > is > considered suitable, the patch contributor will mail the > SourceForge > patch (bug fix?) number to the maintainers' mailing list. Is it okay to apply fixes to the branch only when I know the relevant portions of the trunk will disappear before 2.4? I've done some fixes to the MacPython IDE that I did only on the release23-maint branch, because the plan is that the IDE will be replaced by something completely different soon... 
--
Jack Jansen        http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman

From aahz at pythoncraft.com Fri Nov 7 09:59:34 2003
From: aahz at pythoncraft.com (Aahz)
Date: Fri Nov 7 09:59:37 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: References: <200311031347.10995.aleaxit@yahoo.com> <20031103140123.GA14146@panix.com>
Message-ID: <20031107145934.GA10075@panix.com>

On Fri, Nov 07, 2003, Jack Jansen wrote:
>
> Is it okay to apply fixes to the branch only when I know the relevant
> portions of the trunk will disappear before 2.4?

I'd say not. That's the same reasoning Alex used, and I think that any exceptions made will only lead to trouble later. What happens if you get hit by a beer truck and the 2.4 changes don't get made?
--
Aahz (aahz@pythoncraft.com)  <*>  http://www.pythoncraft.com/
"It is easier to optimize correct code than to correct optimized code." --Bill Harlan

From aahz at pythoncraft.com Fri Nov 7 10:06:19 2003
From: aahz at pythoncraft.com (Aahz)
Date: Fri Nov 7 10:06:24 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer>
References: <000901c3a501$8fb10800$1535c797@oemcomputer>
Message-ID: <20031107150619.GB10075@panix.com>

On Fri, Nov 07, 2003, Raymond Hettinger wrote:
>
> For example, zlib_codec.py can then express its encoding function as:
>
>     def zlib_encode(input, errors='strict', **kwds):
>         assert errors == 'strict'
>         if 'level' in kwds:
>             output = zlib.compress(input, kwds['level'])
>         else:
>             output = zlib.compress(input)
>         return (output, len(input))
>
> The user can then have access to zlib's optional compression level
> argument:
>
>     >>> 'which witch has which witches wristwatch'.encode('zlib', level=9)

Change this to

    def zlib_encode(input, errors='strict', opts=None):
        if opts:
            if 'level' in opts:
                ...

    >>> 'which witch has which witches wristwatch'.encode('zlib', {'level': 9})

and I'm +1.
Otherwise I'm somewhere around -0; I agree with Barry about possible pollution. This change is a small inconvenience for greater decoupling. opts could be an instance instead, but I think a straight dict probably makes the most sense. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From barry at python.org Fri Nov 7 10:14:54 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 10:15:06 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <20031107150619.GB10075@panix.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> Message-ID: <1068218094.15995.125.camel@anthem> On Fri, 2003-11-07 at 10:06, Aahz wrote: > Change this to > > def zlib_encode(input,errors='strict', opts=None): > if opts: > if 'level' in opts: > ... > > >>> 'which witch has which witches wristwatch'.encode('zlib', {'level':9}) Actually, I like that less. It looks gross to me. Keyword arguments are a bit nicer, but do open the possibility for interference with future arguments to .encode() and .decode(). I'm probably +0 with the original and -0 with this style. > and I'm +1. Otherwise I'm somewhere around -0; I agree with Barry about > possible pollution. This change is a small inconvenience for greater > decoupling. opts could be an instance instead, but I think a straight > dict probably makes the most sense. Actually what I was complaining about probably is too late to "fix". It was the use of a string for the first argument to .encode() and .decode(). I dislike that for the same reason we don't do obj.__dict__['attribute'] on a regular basis. 
;) -Barry From aleaxit at yahoo.com Fri Nov 7 10:24:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 10:24:34 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <1068218094.15995.125.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> Message-ID: <200311071624.23409.aleaxit@yahoo.com> On Friday 07 November 2003 04:14 pm, Barry Warsaw wrote: ... > Actually what I was complaining about probably is too late to "fix". It We must keep supporting that approach, yes (alas), but maybe it's not too late to encourage another alternative style instead? E.g., have some object exposing attributes corresponding to those strings that do name codecs, so that while e.g. s.encode('zlib', level=9) would have to keep working, the officially encouraged style would be: s.encode(codec.zlib, level=9) or something of that ilk...? > was the use of a string for the first argument to .encode() and > .decode(). I dislike that for the same reason we don't do > obj.__dict__['attribute'] on a regular basis. ;) So my suggestion would take us back to obj.attribute style (as a preferred alternative to using 'attribute' overtly as a dict key)... Alex From barry at python.org Fri Nov 7 10:31:29 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 10:31:35 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <200311071624.23409.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> Message-ID: <1068219089.15995.128.camel@anthem> On Fri, 2003-11-07 at 10:24, Alex Martelli wrote: > We must keep supporting that approach, yes (alas), but maybe it's > not too late to encourage another alternative style instead? 
E.g., have > some object exposing attributes corresponding to those strings that > do name codecs, so that while e.g. > > s.encode('zlib', level=9) > > would have to keep working, the officially encouraged style would be: > > s.encode(codec.zlib, level=9) > > or something of that ilk...? If s.encode(codec.notacodec, level=9) throws an AttributeError, then +1. Add that to the original idea and +1 all around. -Barry From python at rcn.com Fri Nov 7 10:36:19 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 10:36:34 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <200311071624.23409.aleaxit@yahoo.com> Message-ID: <000d01c3a544$e4081540$bfb42c81@oemcomputer> [Barry] > > Actually what I was complaining about probably is too late to "fix". It [Alex] > We must keep supporting that approach, yes (alas), but maybe it's > not too late to encourage another alternative style instead? E.g., have > some object exposing attributes corresponding to those strings that > do name codecs, so that while e.g. > > s.encode('zlib', level=9) > > would have to keep working, the officially encouraged style would be: > > s.encode(codec.zlib, level=9) > > or something of that ilk...? +1, that is a great idea. Raymond From aleaxit at yahoo.com Fri Nov 7 10:49:27 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 10:49:36 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <1068219089.15995.128.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071624.23409.aleaxit@yahoo.com> <1068219089.15995.128.camel@anthem> Message-ID: <200311071649.27884.aleaxit@yahoo.com> On Friday 07 November 2003 04:31 pm, Barry Warsaw wrote: > On Fri, 2003-11-07 at 10:24, Alex Martelli wrote: > > We must keep supporting that approach, yes (alas), but maybe it's > > not too late to encourage another alternative style instead? 
E.g., have
> > some object exposing attributes corresponding to those strings that
> > do name codecs, so that while e.g.
> >
> >     s.encode('zlib', level=9)
> >
> > would have to keep working, the officially encouraged style would be:
> >
> >     s.encode(codec.zlib, level=9)
> >
> > or something of that ilk...?
>
> If s.encode(codec.notacodec, level=9) throws an AttributeError, then
> +1. Add that to the original idea and +1 all around.

We should surely be able to arrange an object (codecs.codec ...? not sure where it should best live) that exposes as attributes those codecs that are registered, and raises AttributeError for attempts to access other names on it, it seems to me. Q&D worst case:

    class _Codec_Lookupper(object):
        def __getattr__(self, name):
            try:
                codecs.lookup(name)
            except LookupError:
                raise AttributeError
            else:
                return name

    codecs.codec = _Codec_Lookupper()

[which is something we could try out right now...] (but I suspect that we can do better, performance-wise, by returning the lookup's result as a non-string in case of success, saving .encode and .decode some duplicated work).

Alex

From anthony at interlink.com.au Fri Nov 7 11:01:51 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Nov 7 11:02:29 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: Message-ID: <200311071601.hA7G1qD2030938@localhost.localdomain>

>>> Jack Jansen wrote
> Is it okay to apply fixes to the branch only when I know the relevant
> portions of the trunk will disappear before 2.4?
>
> I've done some fixes to the MacPython IDE that I did only on the
> release23-maint branch, because the plan is that the IDE will be
> replaced by something completely different soon...

I'd prefer to see them applied to the trunk as well, unless it's a significant amount of work to do so. Plans (and workloads) change, and big replacement/rewrites sometimes don't happen.
Going through changelogs (much) after the fact to try and find missed trunk->branch or branch->trunk patches is a nightmare. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From aleaxit at yahoo.com Fri Nov 7 11:08:02 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 11:08:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071649.27884.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> Message-ID: <200311071708.02744.aleaxit@yahoo.com> From Barry's discussion of the problem of "magic strings" as arguments to .encode / .decode , I was reminded of a blog entry, http://www.brunningonline.net/simon/blog/archives/000803.html which mentions another case of "magic strings" that might perhaps be (optionally but suggestedly) changed into more-readable attributes (in this case, clearly attributes of the 'file' type): mode arguments to 'file' calls. Simon Brunning, the author of that blog entry, argues that myFile = file(filename, 'rb') (while of course we're going to keep accepting it forever) is not quite as readable and maintainable as, e.g.: myFile = file(filename, file.READ + file.BINARY) Just curious -- what are everybody's feelings about that idea? I'm about +0 on it, myself -- I doubt I'd remember to use it (too much C in my past...:-) but I see why others would prefer it. Another separate "attributes of types" issue raised by that same blog entry -- and that one does find me +1 -- is: isn't it time to make available as attributes of the str type object those few things that we still need to 'import string' for? E.g., the maketrans function (and maybe we could even give it a better name as long as we're making it a str.something?)... 
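For concreteness, maketrans builds a translation table that translate() then consumes; a minimal sketch of the pairing under discussion, written with the hypothetical str.maketrans spelling the proposal suggests (today the function lives in the string module as string.maketrans):

```python
# Hypothetical str.maketrans spelling of the existing
# string.maketrans/translate pairing (names per the proposal,
# not an actual API at the time of writing).
table = str.maketrans('abc', 'xyz')   # map a->x, b->y, c->z
result = 'cab'.translate(table)       # 'zxy'
```

The point being that the table-building helper is the last thing forcing an 'import string' into otherwise str-method-only code.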
Alex

From tim.hochberg at ieee.org Fri Nov 7 11:11:13 2003
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Fri Nov 7 11:11:20 2003
Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode
In-Reply-To: <200311071624.23409.aleaxit@yahoo.com>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com>
Message-ID: <3FABC421.7020505@ieee.org>

Alex Martelli wrote:
> On Friday 07 November 2003 04:14 pm, Barry Warsaw wrote:
> ...
>
>>Actually what I was complaining about probably is too late to "fix". It
>
> We must keep supporting that approach, yes (alas), but maybe it's
> not too late to encourage another alternative style instead? E.g., have
> some object exposing attributes corresponding to those strings that
> do name codecs, so that while e.g.
>
>     s.encode('zlib', level=9)
>
> would have to keep working, the officially encouraged style would be:
>
>     s.encode(codec.zlib, level=9)
>
> or something of that ilk...?

FWIW, if keyword arg collisions are still a concern, it seems it should be possible to make the following work without too much trouble::

    s.encode(codec.zlib(level=9))

These codec objects could be simple classes that stash away their args and kwargs to pass on to the underlying encode::

    class CodecObj:
        def __init__(self, *args, **kwargs):
            self.name = self.__class__.__name__
            self.args = args
            self.kwargs = kwargs

    class zlib(CodecObj):
        pass

    # ....

In the encode method, the codec name, args and kwargs would be grabbed from the corresponding attributes of the CodecObj (unless the object was a string, in which case the old behaviour would be used). This would have the added advantage of pushing people to the new syntax. The downside is that::

    s.encode(codec.zlib)

wouldn't work. One would probably have to use the more verbose syntax::

    s.encode(codec.zlib())

-tim

>>was the use of a string for the first argument to .encode() and
>>.decode().
I dislike that for the same reason we don't do >>obj.__dict__['attribute'] on a regular basis. ;) > > > So my suggestion would take us back to obj.attribute style (as a > preferred alternative to using 'attribute' overtly as a dict key)... > > > Alex > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org > From guido at python.org Fri Nov 7 12:05:12 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:05:21 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 17:08:02 +0100." <200311071708.02744.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> Message-ID: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> > http://www.brunningonline.net/simon/blog/archives/000803.html > > which mentions another case of "magic strings" that might perhaps be > (optionally but suggestedly) changed into more-readable attributes (in > this case, clearly attributes of the 'file' type): mode arguments to 'file' > calls. Simon Brunning, the author of that blog entry, argues that > > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) > > Just curious -- what are everybody's feelings about that idea? I'm > about +0 on it, myself -- I doubt I'd remember to use it (too much C > in my past...:-) but I see why others would prefer it. Doesn't seem the right solution to me. If I were to design an API for this without reference to the C convention, I'd probably use keyword arguments. I outright disagree with Brunning's idea for the struct module. 
More verbose isn't always more readable or easier to remember. > Another separate "attributes of types" issue raised by that same > blog entry -- and that one does find me +1 -- is: isn't it time to > make available as attributes of the str type object those few things > that we still need to 'import string' for? E.g., the maketrans > function (and maybe we could even give it a better name as long as > we're making it a str.something?)... Yes, that would be good. Is there anything besides maketrans() in the string module worth saving? (IMO letters and digits etc. are not -- you can use s.isletter() etc. for that.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 12:16:57 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:17:16 2003 Subject: [Python-Dev] Code to prevent modification on builtins classes also abusively (IMHO) prevents modifications on extensions modules, some ideas on this. In-Reply-To: Your message of "Fri, 07 Nov 2003 13:24:35 +0100." <3FAB8F03.50601@arteris.net> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com> <3FAB8F03.50601@arteris.net> Message-ID: <200311071716.hA7HGv502563@12-236-54-216.client.attbi.com> > I look into the archives and didn't see any debate on the question, hope > I didn't miss something. > > My point concerns limitations on extensions module due to checks aiming > the builtins. > The main point is settable extension classes. > In Python code there is some checks against TPFLAGS_HEAPTYPE, extension > modules should'nt have this flag, so the normal type->tp_setattro doesnt > allow the user to > set new attributes on your extension classes. There is a way around, > write a special MetaClass which redefine setattr. Or you can create a Python subclass that doesn't add any features but inherits from your extension class -- the user can set attributes on the Python class to their heart's content and everything will work as needed. 
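A minimal sketch of that first suggestion, with a plain Python class standing in for the real extension type (the actual module isn't shown in this thread, so 'ExtType' and the attribute names here are purely illustrative):

```python
class ExtType(object):
    """Stand-in for the C extension type."""

class SettableExtType(ExtType):
    """Trivial Python subclass: it is a heap type, so users can set
    attributes on the class and on its instances freely."""

# Users customize the Python subclass, not the extension type itself.
SettableExtType.greeting = 'hello'
obj = SettableExtType()
obj.color = 'red'
```

Instances are still fully usable wherever the extension type is expected, since isinstance(obj, ExtType) holds.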
> In the extension module I'm writing (I'm porting some Python code to
> Python-C for speed reasons) the user can set attributes and slots on my
> classes. What I need is the complete type->tp_setattro behaviour,
> without the check. I didn't see a way to get this behaviour using only
> the Python API (is re-readying the type a workaround?), so I
> copy-pasted all the code needed to make update_slots work (ouch, 2500
> lines).

A much simpler approach would be to have a metaclass whose tp_setattro clears the HEAPTYPE flag, calls type->tp_setattro, and then restores the HEAPTYPE flag. Yes, that might be considered cheating, but so is copying 2500 lines of code. :-)

> This is now almost working: every kind of attribute can be set except
> __setattr__, because the hackcheck prevents the user from calling
> another __setattr__ from his new setattr. Example of my extension
> class hierarchy:
>
>     class A(object)
>     class B(A)
>
> In the extension there is a tp_setattro on B; if the user wants to
> redefine it, he can't call the A __setattr__:
>
>     def myBSetattr(self, k, v):
>         super(B, self).__setattr__(k, v)
>         ## Do my special stuff here
>
> This won't work: the hackcheck will see some kind of hack here, "you
> can't call the A.__setattr__ function from a B object" :).

I don't understand this -- do any of my suggestions above handle it?

> First question: is there a known way around this?
>
> Possible improvements:
>
> In the Python code there are checks inside functions to verify that you
> are not modifying builtin classes; unfortunately this code also affects
> extension modules. I think the HEAPTYPE flag is abusively used in
> different cases -- mostly in type_setattro, object_set_bases and
> object_set_class the checks have nothing to do with the true definition
> of a heap type as stated in the comments in Include/object.h. It is
> used, I think, only because it is the only flag that distinguishes
> builtin classes from user classes.
> Unfortunately, with this flag, extension classes fall into the
> 'builtin' camp.
>
> A way to solve the problem without backward compatibility problems
> would be a new TPFLAGS_SETABLE flag, defaulting to 0 for
> builtin/extension classes and 1 for user (Python) classes. This flag
> would be checked in place of the HEAPTYPE one where relevant.
>
> I'm ready to write the code for this if there are some positive votes;
> I won't bother if everybody is against it.

This seems to be a reasonable suggestion; however, I want you to consider what happens if you are using multiple interpreters. When you set a function attribute on a builtin or extension type, the function references the environment of the interpreter where it was defined, but it is visible from all interpreters. This is likely not what you want, and that's why the HEAPTYPE flag exists.

I would strongly advise using my first suggestion above (derive a class in Python) rather than mess with HEAPTYPE.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org Fri Nov 7 12:17:05 2003
From: barry at python.org (Barry Warsaw)
Date: Fri Nov 7 12:17:20 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <200311071708.02744.aleaxit@yahoo.com>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com>
Message-ID: <1068225424.15995.146.camel@anthem>

On Fri, 2003-11-07 at 11:08, Alex Martelli wrote:
> From Barry's discussion of the problem of "magic strings" as arguments to
> .encode / .decode, I was reminded of a blog entry,
>
> http://www.brunningonline.net/simon/blog/archives/000803.html
>
> which mentions another case of "magic strings" that might perhaps be
> (optionally but suggestedly) changed into more-readable attributes (in
> this case, clearly attributes of the 'file' type): mode arguments to 'file'
> calls.
> Simon Brunning, the author of that blog entry, argues that
>
>     myFile = file(filename, 'rb')
>
> (while of course we're going to keep accepting it forever) is not quite as
> readable and maintainable as, e.g.:
>
>     myFile = file(filename, file.READ + file.BINARY)
>
> Just curious -- what are everybody's feelings about that idea? I'm
> about +0 on it, myself -- I doubt I'd remember to use it (too much C
> in my past...:-) but I see why others would prefer it.

I'm with you: too much muscle memory to probably use it. But I still think it's a good idea, with one caveat. A problem with constants like this, especially if they're mapped to integers, is that printing them is unhelpful:

    >>> from socket import *
    >>> print AF_UNIX
    1
    >>> from errno import *
    >>> print EEXIST
    17

If your memory is as bad as mine, how many times have /you/ typed errno.errorcode[17]? :) I would love it if what happened really was something like:

    >>> from socket import *
    >>> print AF_UNIX
    socket.AF_UNIX
    >>> from errno import *
    >>> print EEXIST
    errno.EEXIST

Now, I have an enum metaclass, originally ripped from Jeremy, but with a few nice additions and modifications of my own, which would get us closer to this. It allows you to define an enum like:

    >>> class Family(enum.Enum):
    ...     AF_UNIX = 1
    ...     AF_INET = 2
    ...     # ...
    ...
    >>> Family.AF_UNIX
    EnumInstance(Family, AF_UNIX, 1)
    >>> Family.AF_UNIX == 1
    True
    >>> Family.AF_UNIX == 3
    False
    >>> [x for x in Family]
    [EnumInstance(Family, AF_UNIX, 1), EnumInstance(Family, AF_INET, 2)]
    >>> Family[1]
    EnumInstance(Family, AF_INET, 2)

The last might be a tad surprising, but makes sense if you think about it. :) Class Enum has a metaclass of EnumMetaclass, where all the fun magic is. EnumInstances are subclasses of int and it would be easy to make their __str__() be the nicer output format.

Anyway, if these type attribute constants like file.READ were something like EnumInstances, then I think it would make writing and debugging stuff like this much nicer.
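The metaclass itself isn't shown in the message; a rough sketch of the kind of thing described (in modern class syntax, with hypothetical and simplified internals -- the actual implementation differs, e.g. the repr here is the short 'Family.AF_UNIX' form directly) might look like:

```python
class EnumInstance(int):
    """An int that remembers its enum class and attribute name."""
    def __repr__(self):
        return '%s.%s' % (self._cls_name, self._attr_name)
    __str__ = __repr__

class EnumMetaclass(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._members = []
        for attr, value in namespace.items():
            if isinstance(value, int) and not attr.startswith('_'):
                member = EnumInstance(value)
                member._cls_name = name
                member._attr_name = attr
                setattr(cls, attr, member)  # replace the plain int
                cls._members.append(member)
        return cls

    def __iter__(cls):
        return iter(cls._members)

    def __getitem__(cls, i):
        # positional indexing -- hence the "tad surprising" Family[1]
        return cls._members[i]

class Enum(metaclass=EnumMetaclass):
    pass

class Family(Enum):
    AF_UNIX = 1
    AF_INET = 2
```

Because EnumInstance subclasses int, the members still compare equal to the raw values, while printing them gives the readable dotted name.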
> Another separate "attributes of types" issue raised by that same blog > entry -- and that one does find me +1 -- is: isn't it time to make available > as attributes of the str type object those few things that we still need > to 'import string' for? E.g., the maketrans function (and maybe we could > even give it a better name as long as we're making it a str.something?)... +1-ly y'rs, -Barry From barry at python.org Fri Nov 7 12:19:12 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 12:19:24 2003 Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode In-Reply-To: <3FABC421.7020505@ieee.org> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> <3FABC421.7020505@ieee.org> Message-ID: <1068225551.15995.149.camel@anthem> On Fri, 2003-11-07 at 11:11, Tim Hochberg wrote: > The downside is that:: > > s.encode(codec.zlib) > > wouldn't work. One would probably have to use the more verbose syntax:: > > s.encode(codec.zlib()) Maybe not. s.encode() can magically zero-arg instantiate the class. We're starting to put a lot of smarts into .encode() and .decode() but I think it's worth it. Nice idea. -Barry From barry at python.org Fri Nov 7 12:26:43 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 12:26:53 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> Message-ID: <1068226002.15995.153.camel@anthem> On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote: > Yes, that would be good. Is there anything besides maketrans() in the > string module worth saving? (IMO letters and digits etc. are not -- > you can use s.isletter() etc. 
for that.) I'm not following, are you saying we don't need string.ascii_letters and friends any more? -Barry From aleaxit at yahoo.com Fri Nov 7 12:30:55 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 12:31:03 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068226002.15995.153.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> Message-ID: <200311071830.55764.aleaxit@yahoo.com> On Friday 07 November 2003 06:26 pm, Barry Warsaw wrote: > On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote: > > Yes, that would be good. Is there anything besides maketrans() in the > > string module worth saving? (IMO letters and digits etc. are not -- > > you can use s.isletter() etc. for that.) > > I'm not following, are you saying we don't need string.ascii_letters and > friends any more? I think we do, but I'd rather access them as str.ascii_letters myself. Or maybe we could use just letters, lowercase and uppercase as attribute names, implying the ascii_ -- people needing nonasciis might then still need to "import string", which in itself might be OK, but... that might be a bit too confusing overall. Anyway, I do have code that e.g. does "for c in string.ascii_lowercase: ...", and that is not as handily done with just the .islowercase method... Alex From guido at python.org Fri Nov 7 12:35:26 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:35:38 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 12:26:43 EST." 
<1068226002.15995.153.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> Message-ID: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> > > Yes, that would be good. Is there anything besides maketrans() in the > > string module worth saving? (IMO letters and digits etc. are not -- > > you can use s.isletter() etc. for that.) > > I'm not following, are you saying we don't need string.ascii_letters and > friends any more? Hm, I'd forgotten about ascii_letters. It would make a beautiful class attribute of str. I *do* think that we don't need string.letters -- the only use for it I've seen is checking if a character is in that string, and c.isletter() is faster. But if someone has a use case for it that isn't argued away, I'd be okay with seeing it reincarnated as a class attribute of str too. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Fri Nov 7 12:37:27 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 12:37:32 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> Message-ID: <200311071837.27292.aleaxit@yahoo.com> On Friday 07 November 2003 06:05 pm, Guido van Rossum wrote: ... > Doesn't seem the right solution to me. If I were to design an API > for this without reference to the C convention, I'd probably use > keyword arguments. Interesting! Something like f = file('foo', writable=True) ... ? > I outright disagree with Brunning's idea for the struct module. More > verbose isn't always more readable or easier to remember. 
Heh, yes, I didn't even quote that one, being -1 on it myself :-)

> > Another separate "attributes of types" issue raised by that same
> > blog entry -- and that one does find me +1 -- is: isn't it time to
> > make available as attributes of the str type object those few things
> > that we still need to 'import string' for? E.g., the maketrans
> > function (and maybe we could even give it a better name as long as
> > we're making it a str.something?)...
>
> Yes, that would be good. Is there anything besides maketrans() in the
> string module worth saving? (IMO letters and digits etc. are not --
> you can use s.isletter() etc. for that.)

Hmmm, I do have loops such as 'for c in string.ascii_lowercase: ...'; e.g. in a letter-counting example:

    for c in string.ascii_lowercase:
        print '%s: %8d' % (c, counts.get(c, 0))

Using counts.keys(), sorted, wouldn't be the same, as the 0's would not stand out. Admittedly coding 'abc...xyz' explicitly ain't gonna kill me, but...

Alex

From fdrake at acm.org Fri Nov 7 12:40:31 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Nov 7 12:40:42 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <1068226002.15995.153.camel@anthem>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem>
Message-ID: <16299.55567.884685.216681@grendel.zope.com>

On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote:
> Yes, that would be good. Is there anything besides maketrans() in the
> string module worth saving? (IMO letters and digits etc. are not --
> you can use s.isletter() etc. for that.)

Yikes! Are you assuming those are only used for "in" tests???

Barry Warsaw writes:
> I'm not following, are you saying we don't need string.ascii_letters and
> friends any more?

We definitely need these still.
I don't see any reason to remove them, and they're definitely still used. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From skip at pobox.com Fri Nov 7 13:15:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 13:15:41 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068225424.15995.146.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> Message-ID: <16299.57663.781598.114168@montanaro.dyndns.org> Barry> I would love it if what happened really was something like:

    >>> from socket import *
    >>> print AF_UNIX
    socket.AF_UNIX
    >>> from errno import *
    >>> print EEXIST
    errno.EEXIST

http://manatee.mojam.com/~skip/python/ConstantMap.py No metaclass wizardry needed. i-didn't-even-know-i-owned-a-time-machine-ly y'rs, Skip From python at rcn.com Fri Nov 7 13:25:58 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 13:26:11 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <002701c3a55c$97088c80$bfb42c81@oemcomputer> > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. The problem with ascii_letters is that it is not constant. Depending on the startup, it can optionally replace the usual definition with that provided by strop.lowercase. > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. I had C coded a patch for a whole group of str.isSomething tests. The only thing that held it up was my not finding time to figure out how to do exactly the same thing for Unicode objects.
Maybe someone can pick up the patch: www.python.org/sf/562501 Raymond From barry at python.org Fri Nov 7 13:44:23 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 13:44:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <1068230662.15995.159.camel@anthem> On Fri, 2003-11-07 at 12:35, Guido van Rossum wrote: > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. > > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. Ah gotcha. I'd definitely want to retain ascii_letters, probably ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, punctuation, printable, and whitespace. I'm not sure about the locale specific constants, but maybe we do something like:

    str.ascii.letters
    str.ascii.lowercase
    str.locale.letters
    str.locale.lowercase

I'd definitely want to make these all read-only, e.g. removing the undefined warnings for string.lowercase.
-Barry From barry at python.org Fri Nov 7 13:50:34 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 13:50:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16299.57663.781598.114168@montanaro.dyndns.org> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> <16299.57663.781598.114168@montanaro.dyndns.org> Message-ID: <1068231034.15995.162.camel@anthem> On Fri, 2003-11-07 at 13:15, Skip Montanaro wrote: > Barry> I would love it if what happened really was something like: > > >>> from socket import * > >>> print AF_UNIX > socket.AF_UNIX > >>> from errno import * > >>> print EEXIST > errno.EEXIST > > http://manatee.mojam.com/~skip/python/ConstantMap.py > > No metaclass wizardry needed. > > i-didn't-even-know-i-owned-a-time-machine-ly y'rs, Oh boo. Metaclasses are so much fun though! :) But the enum stuff does have some other advantages. I'll try to clean the code up (read: document it :) and post it somewhere. -Barry From guido at python.org Fri Nov 7 13:59:31 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 14:00:39 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 13:25:58 EST." <002701c3a55c$97088c80$bfb42c81@oemcomputer> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> Message-ID: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > > class attribute of str. > > The problem with ascii_letters is that it is not constant. Depending on > the startup, it can optionally replace the usual definition with that > provided by strop.lowercase. Haven't you got that backwards? I thought ascii_letters was really a constant, but letters was modified by setlocale(). 
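For reference, the constant half of this is easy to check; a minimal sketch in modern (Python 3) syntax, where only the ascii_* constants survive:

```python
import string

# string.ascii_letters is a fixed, locale-independent constant: exactly
# the 26 ASCII lowercase letters followed by the 26 ASCII uppercase ones.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
assert len(string.ascii_letters) == 52
assert string.ascii_lowercase == "abcdefghijklmnopqrstuvwxyz"

# string.letters (Python 2 only) was the locale-dependent one: it was
# rebuilt when locale.setlocale() ran, so its contents could change at
# runtime, while ascii_letters never does.
print(string.ascii_letters[:6])  # -> abcdef
```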
> > I *do* think that we don't need string.letters -- the only use for it > > I've seen is checking if a character is in that string, and > > c.isletter() is faster. But if someone has a use case for it that > > isn't argued away, I'd be okay with seeing it reincarnated as a class > > attribute of str too. > > I had C coded a patch for a whole group of str.isSomething tests. The > only thing that held it up was my not finding time to figure out how to > exactly the same thing for Unicode objects. Maybe someone can pick-up > the patch: > > www.python.org/sf/562501 I don't have time to investigate the patch; is the existing set of isXXX() methods not enough? This seems a separate issue though. Anyway, I've been nearly convinced that the various constants should be part of the str class. But should corresponding constants be added to the Unicode class??? Some would be very large. If not, I'm less convinced that they belong on the str class. Also, perhaps the locale-dependent variables should perhaps be moved into the locale module? That would avoid the Unicode question above, because the locale module doesn't apply to Unicode. --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Fri Nov 7 14:04:51 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 14:05:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068230662.15995.159.camel@anthem> Message-ID: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> > Ah gotcha. I'd definitely want to retain ascii_letters, probably > ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, > punctuation, printable, and whitespace Other than possibly upper and lower, the rest should be skipped and left for tests like isdigit(). The tests are faster than the usual linear search style of: if char in str.letters. Raymond From fdrake at acm.org Fri Nov 7 14:05:14 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri Nov 7 14:05:35 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <16299.60650.800354.930018@grendel.zope.com> Guido van Rossum writes: > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. I'm happy for them to stay where they are. > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode. +1 -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From walter at livinglogic.de Fri Nov 7 14:10:18 2003 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Fri Nov 7 14:10:24 2003 Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode In-Reply-To: <1068225551.15995.149.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> <3FABC421.7020505@ieee.org> <1068225551.15995.149.camel@anthem> Message-ID: <3FABEE1A.1050000@livinglogic.de> Barry Warsaw wrote: > On Fri, 2003-11-07 at 11:11, Tim Hochberg wrote: > > >>The downside is that:: >> >> s.encode(codec.zlib) >> >>wouldn't work. One would probably have to use the more verbose syntax:: >> >> s.encode(codec.zlib()) > > > Maybe not. s.encode() can magically zero-arg instantiate the class. > We're starting to put a lot of smarts into .encode() and .decode() but I > think it's worth it. Nice idea. Would this mean any changes to the C API? 
And if we're going to enhance the C API, so that

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        const char *encoding,
        const char *errors
    );

becomes

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        PyObject *encoding,
        const char *errors
    );

would it make sense to enhance the PEP 293 error callback machinery to allow

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        PyObject *encoding,
        PyObject *errors
    );

so that the callback function can be passed directly to the codec without any need for registering/lookup? Bye, Walter Dörwald From martin at v.loewis.de Fri Nov 7 14:12:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 14:13:52 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer> References: <000901c3a501$8fb10800$1535c797@oemcomputer> Message-ID: "Raymond Hettinger" writes: > Idea for the day: Let the str.encode/decode methods accept keyword > arguments to be forwarded to the underlying codec. -1. The non-Unicode usage of .encode should not have been there in the first place, IMO, so I dislike any extensions to it. Regards, Martin From martin at v.loewis.de Fri Nov 7 14:15:30 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 14:16:34 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002701c3a55c$97088c80$bfb42c81@oemcomputer> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> Message-ID: "Raymond Hettinger" writes: > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > > class attribute of str. > > The problem with ascii_letters is that it is not constant. Depending on > the startup, it can optionally replace the usual definition with that > provided by strop.lowercase. Can you give an example?
Regards, Martin From skip at pobox.com Fri Nov 7 14:47:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 14:47:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> Message-ID: <16299.63183.923295.432422@montanaro.dyndns.org> Raymond> Other than possibly upper and lower, the rest should be skipped Raymond> and left for tests like isdigit(). The tests are faster than Raymond> the usual linear search style of: if char in str.letters. A couple people have claimed that the .is*() string methods are faster than testing a character against a string. I'm sure that's true in some cases, but it seems not to be true for string.ascii_letters. Here are several timeit.py runs, ordered from slowest to fastest. Both situations have a pair of runs, one with a positive test and one with a negative test. Using char in someset: % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'.' in pset" 100000 loops, best of 3: 4.68 usec per loop % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'z' in pset" 100000 loops, best of 3: 4.58 usec per loop Using char.isalpha() or char.islower(): % timeit.py -s 'import string' "'z'.islower()" 1000000 loops, best of 3: 0.93 usec per loop % timeit.py -s 'import string' "'.'.islower()" 1000000 loops, best of 3: 0.928 usec per loop % timeit.py -s 'import string' "'z'.isalpha()" 1000000 loops, best of 3: 0.893 usec per loop % timeit.py -s 'import string' "'.'.isalpha()" 1000000 loops, best of 3: 0.96 usec per loop Using char in somestring: % timeit.py -s 'import string; pset = string.ascii_letters' "'z' in pset" 1000000 loops, best of 3: 0.617 usec per loop % timeit.py -s 'import string; pset = string.ascii_letters' "'.' 
in pset" 1000000 loops, best of 3: 0.747 usec per loop Using char in somedict: % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'.' in pset" 1000000 loops, best of 3: 0.502 usec per loop % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'z' in pset" 1000000 loops, best of 3: 0.509 usec per loop The only clear loser is the 'char in set' case, no doubt due to its current Python implementation, however testing a character for membership in a short string seems to be faster than using the .is*() methods to me. Skip From theller at python.net Fri Nov 7 15:16:42 2003 From: theller at python.net (Thomas Heller) Date: Fri Nov 7 15:17:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Fri, 07 Nov 2003 09:35:26 -0800") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. But there are probably more useful combinations like string.letters + string.digits + "_" than there should be isxxx() tests. Thomas From bac at OCF.Berkeley.EDU Fri Nov 7 15:49:06 2003 From: bac at OCF.Berkeley.EDU (Brett C.) 
Date: Fri Nov 7 15:49:17 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <3FAC0542.30803@ocf.berkeley.edu> Guido van Rossum wrote: > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. > > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode. > How about a strtools module? I was thinking that constants like ascii_letters could go there along with an implementation of join() that took arguments in an obvious way (or at least the way everyone seems to request it). Barry's string replacement function could also go there (the one using $; wasn't it agreed that interpolation was the wrong term to use or something?). This would prevent polluting the str type too much plus remove any hindrance that there necessarily be a mirror value for Unicode since the docs can explicitly state it only works for str in those cases. -Brett From aleaxit at yahoo.com Fri Nov 7 15:58:36 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 15:58:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16299.63183.923295.432422@montanaro.dyndns.org> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> Message-ID: <200311072158.36054.aleaxit@yahoo.com> On Friday 07 November 2003 20:47, Skip Montanaro wrote: ... 
> The only clear loser is the 'char in set' case, no doubt due to its > current Python implementation, however testing a character for membership > in a short string seems to be faster than using the .is*() methods to me. Very interesting! To me, this suggests fixing this performance bug -- there is no reason that I can see why the .is* methods should be _slower_. Would a performance bugfix (no implementation change, just a speedup) be OK for 2.3.3, I hope? That would motivate me to work on it soonest... Alex From fdrake at acm.org Fri Nov 7 16:03:20 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:03:33 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072158.36054.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> <200311072158.36054.aleaxit@yahoo.com> Message-ID: <16300.2200.332569.861030@grendel.zope.com> Alex Martelli writes: > Very interesting! To me, this suggests fixing this performance bug -- there > is no reason that I can see why the .is* methods should be _slower_. Would > a performance bugfix (no implementation change, just a speedup) be OK for > 2.3.3, I hope? That would motivate me to work on it soonest... People keep hinting that these methods should be faster, but I see no reason to think they would be. Think about it: using the method requires the creation of a bound method object. No matter how fast PyMalloc is, that's still a fair bit of work. -Fred -- Fred L. Drake, Jr.
PythonLabs at Zope Corporation From aleaxit at yahoo.com Fri Nov 7 16:04:15 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 16:04:24 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> Message-ID: <200311072204.15841.aleaxit@yahoo.com> On Friday 07 November 2003 20:04, Raymond Hettinger wrote: > > Ah gotcha. I'd definitely want to retain ascii_letters, probably > > ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, > > punctuation, printable, and whitespace > > Other than possibly upper and lower, the rest should be skipped and left > for tests like isdigit(). The tests are faster than the usual linear > search style of: if char in str.letters. I guess the tests should be faster, yes, but I would still want _iterables_ for ascii_* and digits. One issue with allowing "if char in string.letters:" is that these days this will not raise if the alleged 'char' is more than one character -- it will give True for (e.g.) 'ab', False for (e.g.) 'foobar', since it tests _substrings_. So, maybe, str.letters and friends should be iterables which also implement a __contains__ method that raises some error with helpful information about using .iswhatever() instead -- that's assuming we want people NOT to test with "if char in str.letters:". If we DO want people to test that way, then I think str.letters should _still_ have __contains__, but specifically one to optimize speed in this case (if supported it should be just as fast as the .is... method -- which as Skip reminds us may in turn need optimization...). Alex From martin at v.loewis.de Fri Nov 7 16:07:30 2003 From: martin at v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 16:07:35 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: Thomas Heller writes: > But there are probably more useful combinations like > > string.letters + string.digits + "_" I think the typical application of this should use regular expressions instead. Regards, Martin From fdrake at acm.org Fri Nov 7 16:08:41 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:09:05 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <16300.2521.199201.364251@grendel.zope.com> Brett C. writes: > How about a strtools module? I was thinking that constants like > ascii_letters could go there along with an implementation of join() that > took arguments in an obvious way (or at least the way everyone seems to > request it). Not sure I like the increasing array of module name suffixes. There's the classic "foolib", then we added "footools" and "fooutils" (think "mimetools" and "distutils"). Not trying to create an issue here, just generally dismayed. > Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). We're calling it substitution. People know what that means, and don't get it confused with interpolation. 
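For concreteness, the $-substitution being named here can be sketched in a few lines; the helper name `substitute` and the exact rules ($name, ${name}, $$ for a literal dollar) are illustrative assumptions loosely following Barry's PEP 292 proposal, not any shipped API:

```python
import re

def substitute(template, mapping):
    """Minimal sketch of $-style substitution: $name or ${name} is
    replaced from mapping, and $$ produces a literal dollar sign."""
    pattern = re.compile(r"\$(?:\$|\{(\w+)\}|(\w+))")

    def repl(match):
        name = match.group(1) or match.group(2)
        if name is None:            # the pattern matched '$$'
            return "$"
        return str(mapping[name])   # raises KeyError on a missing name

    return pattern.sub(repl, template)

print(substitute("Hello, $name! You owe $$${amount}.",
                 {"name": "Barry", "amount": 5}))
# -> Hello, Barry! You owe $5.
```

Unlike %-interpolation, a stray $ fails loudly or predictably instead of silently misformatting, which is much of the argument for the syntax.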
> This would prevent polluting the str type too much plus remove any > hindrance that there necessarily be a mirror value for Unicode since the > docs can explicitly state it only works for str in those cases. Or it could just work polymorphically. ;-) I don't see any need for everything to be defined by the classes. Types. Oh, whatever those things are! -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From aleaxit at yahoo.com Fri Nov 7 16:11:35 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 16:11:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <200311072211.35895.aleaxit@yahoo.com> On Friday 07 November 2003 19:59, Guido van Rossum wrote: ... > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. I think the str.XXX constants should be iterables with a __contains__ method (the latter either to forbid the 'if char in str.XXX:' test if we dislike it, or to optimize it if we like it). The corresponding unicode.XXX constants could also be iterables -- not necessarily large ones if we don't want them to be: each of them could just step a counter through all unicode characters and just return the ones that satisfy some appropriate .iswhatever test. > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode.
+1 -- I think the more "localized" the effects of module locale are, the happier we shall all be; the "global side effect" of locale.setlocale having effects on other modules (string, time, os, and gettext) has always left me a little bit doubtful (I've used it at times, but wished I could avoid using it...). Alex From martin at v.loewis.de Fri Nov 7 16:16:32 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 16:16:54 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072158.36054.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> <200311072158.36054.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > Very interesting! To me, this suggests fixing this performance bug > -- there is no reason that I can see why the .is* methods should be > _slower_. Would a performance bugfix (no implementation change, > just a speedup) be OK for 2.3.3, I hope? Yes, but I doubt you can do much about it. I also fail to see how it is relevant to ascii_letters. .islower is locale-aware, so it is your C library which does the bulk of the work. Regards, Martin From skip at pobox.com Fri Nov 7 16:41:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 16:41:28 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.2521.199201.364251@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> Message-ID: <16300.4475.778642.314514@montanaro.dyndns.org> >> How about a strtools module? Fred> Not sure I like the increasing array of module name suffixes. Fred> There's the classic "foolib", then we added "footools" and Fred> "fooutils" (think "mimetools" and "distutils"). Not to mention which, we have a perfectly good module name already: string.
Skip From fdrake at acm.org Fri Nov 7 16:44:21 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:44:31 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.4475.778642.314514@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> <16300.4475.778642.314514@montanaro.dyndns.org> Message-ID: <16300.4661.135812.868335@grendel.zope.com> Skip Montanaro writes: > Not to mention which, we have a perfectly good module name already: string. +1 for calling it "string"! It has the nice advantage of backward compatibility for those names as well. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From barry at python.org Fri Nov 7 16:57:31 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 16:57:39 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <1068242250.15995.186.camel@anthem> On Fri, 2003-11-07 at 15:49, Brett C. wrote: > How about a strtools module? I don't see much point. If we wanted to keep things in a module, the string module already exists and seems the most logical place for stringy things. > Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). I've taken to calling it string substitutions. -Barry From fincher.8 at osu.edu Fri Nov 7 17:58:03 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Fri Nov 7 16:59:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> Message-ID: <200311071758.03374.fincher.8@osu.edu> On Friday 07 November 2003 04:07 pm, Martin v. 
Löwis wrote: > Thomas Heller writes: > > But there are probably more useful combinations like > > > > string.letters + string.digits + "_" > > I think the typical application of this should use regular expressions > instead. A typical application, sure. But not all applications -- what if the string is being built, for instance, to pass as the optional "delete" argument of str.translate? Jeremy From aahz at pythoncraft.com Fri Nov 7 17:09:01 2003 From: aahz at pythoncraft.com (Aahz) Date: Fri Nov 7 17:09:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <20031107220901.GA20961@panix.com> On Fri, Nov 07, 2003, Martin v. Löwis wrote: > Thomas Heller writes: >> >> But there are probably more useful combinations like >> >> string.letters + string.digits + "_" > > I think the typical application of this should use regular expressions > instead. Ick: 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' --Jamie Zawinski, comp.emacs.xemacs, 8/1997 -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From aleaxit at yahoo.com Fri Nov 7 17:25:29 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 17:25:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.2200.332569.861030@grendel.zope.com> References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> <16300.2200.332569.861030@grendel.zope.com> Message-ID: <200311072325.29330.aleaxit@yahoo.com> On Friday 07 November 2003 22:03, Fred L. Drake, Jr.
wrote: > Alex Martelli writes: > > Very interesting! To me, this suggests fixing this performance bug -- > > there is no reason that I can see why the .is* methiods should be > > _slower_. Would a performance bugfix (no implementation change, just > > a speedup) be OK for 2.3.3, I hope? That would motivate me to work on > > it soonest... > > People keep hinting that these methods should be faster, but I see no > reason to think they would be. Think about it: using the method > requires the creation of a bound method object. No matter how fast > PyMalloc is, that's still a fair bit of work. Good point! So, a first little trick to accelerate this might be to use getsets (unfortunately this gives a marginally Python-level-observable alteration for e.g. "print 'x'.isdigit.__name__", so perhaps it's only suitable for 2.4, not 2.3.3, alas... I dunno...). I tried a little experiment adding a new test .isabit() that says if a string is entirely made up of '0' and '1': static PyGetSetDef string_getsets[] = { {"isabit", (getter)string_isabit, 0, 0}, {0} }; ... 
string_getsets, /* tp_getset */ where: static PyObject * _return_true = 0; static PyObject * _return_false = 0; static PyObject * _true_returner(PyObject* ignore_self) { Py_RETURN_TRUE; } static PyObject * _false_returner(PyObject* ignore_self) { Py_RETURN_FALSE; } static PyMethodDef _str_bool_returners[] = { {"_str_return_false", (PyCFunction)_false_returner, METH_NOARGS}, {"_str_return_true", (PyCFunction)_true_returner, METH_NOARGS}, {0} }; static PyObject * string_isabit(PyStringObject *s) { char* p = PyString_AS_STRING(s); int len = PyString_GET_SIZE(s); int i; for(i=0; i References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> Message-ID: <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> > Guido van Rossum writes: > > Anyway, I've been nearly convinced that the various constants should > > be part of the str class. But should corresponding constants be added > > to the Unicode class??? Some would be very large. If not, I'm less > > convinced that they belong on the str class. [Fred] > I'm happy for them to stay where they are. ??? They're in the string module, which has got to go. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 17:42:33 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:42:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 21:16:42 +0100."
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <200311072242.hA7MgXS03159@12-236-54-216.client.attbi.com> > But there are probably more useful combinations like > > string.letters + string.digits + "_" > > than there should be isxxx() tests. We don't need to invent anything for that. You can use a regular expression with \w. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Fri Nov 7 17:44:46 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 17:44:56 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> Message-ID: <200311072344.46848.aleaxit@yahoo.com> On Friday 07 November 2003 22:16, Martin v. Löwis wrote: > Alex Martelli writes: > > Very interesting! To me, this suggests fixing this performance bug > > -- there is no reason that I can see why the .is* methods should be > > _slower_. Would a performance bugfix (no implementation change, > > just a speedup) be OK for 2.3.3, I hope? > > Yes, but I doubt you can do much about it. I also fail to see how it is I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 usec and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, about in-between, by avoiding the generation of a bound method. Now of course saving 25% or so isn't huge, but maybe it's still worth it...? > relevant to ascii_letters. .islower is locale-aware, so it is your C > library which does the bulk of the work. Ah -- interesting point!
So, for example:

f = xx.islower
print f()
# insert locale change here
print f()

should be able to print two distinct values for appropriate values of xx and locale changes, right? Hmmm -- if supporting this usage is crucial then indeed we can't avoid generating a bound method (for .islower and other locale-aware .is* methods), because the "return a function" approach is basically evaluating the function at attribute-access time... if locale changes between the attribute-access time and the moment of the call, then the result may not be as desired. Funny, among the deleterious effects of locale-changing's "subterraneous global effects" I had not considered this one -- it breaks nice conditions we might otherwise have counted on thanks to strings' immutability and the parameterless nature of the .is...() methods. Oh well, I guess the trick is not worth pursuing just for the sake of .isdigit and .isspace, then, if "locale change between access and call" must be supported. Pity, because despite the C library's amount of work, the overhead of the bound-method generation is not trivial, as Fred mentioned. So, if the fast idiom is _inevitably_ "if xx in ...:" (thanks in part to the fact that we _don't_ have to support locale changes in the middle of things in this case), then perhaps we should stop touting xx.is...() as superior, and see about offering the best possible support for the 'in' case -- where my remarks about "accidental successes" of, e.g., "if xx in ...digits...:" when xx=="23" but not when xx=="43" stand. We can't break "if xx in string.digits:" (maybe somebody's relying on the test succeeding when xx is a sequence of adjacent increasing digits?) but we can surely choose, if we wish, to define the semantics of "if xx in str.digits:" in a (IMHO) more helpful-against-errors way....
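Alex's "accidental successes" point is easy to check for oneself: `in` on strings tests *substrings*, so a multi-character value can silently pass a digits test whenever its characters happen to be adjacent in the constant (a quick illustration, still true in current Python):

```python
import string

# str.__contains__ tests substrings, not single-character membership,
# so a two-character input can "pass" a digits check by accident.
assert "2" in string.digits          # the intended single-character use
assert "23" in string.digits         # accidental success: "23" is a substring
assert "43" not in string.digits     # "43" is not a substring, so it "fails"

# The method form answers the per-character question instead:
assert "23".isdigit() and "43".isdigit()
```

Either result for a multi-character input is arguably wrong for a "membership" test, which is exactly the ambiguity being debated here.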
Alex From guido at python.org Fri Nov 7 17:47:37 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:47:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 12:49:06 PST." <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> > How about a strtools module? I was thinking that constants like > ascii_letters could go there along with an implementation of join() that > took arguments in an obvious way (or at least the way everyone seems to > request it). Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). > > This would prevent polluting the str type too much plus remove any > hindrance that there necessarily be a mirror value for Unicode since the > docs can explicitly state it only works for str in those cases. Do we have an indication that the str type is getting polluted too much? Apart from the locale-specific things and maketrans, what else wouldn't work for Unicode that's currently under consideration? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 17:49:31 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:49:38 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 22:04:15 +0100." <200311072204.15841.aleaxit@yahoo.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> Message-ID: <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> > I guess the tests should be faster, yes, but I would still want _iterables_ > for ascii_* and digits. Why? It's not like you're going to save much space by not creating a string of 52 bytes. 
> One issue with allowing "if char in string.letters:" is that these > days this will not raise if the alleged 'char' is more than one > character -- it will give True for (e.g.) 'ab', False for (e.g.) > 'foobar', since it tests _substrings_. Right. > So, maybe, str.letters and friends should be iterables which also > implement a __contains__ method that raises some error with helpful > information about using .iswhatever() instead -- that's assuming we > want people NOT to test with "if char in str.letters:". If we DO > want people to test that way, then I think str.letters should > _still_ have __contains__, but specifically one to optimize speed in > this case (if supported it should be just as fast as the > .is... method -- which as Skip reminds us may in turn need > optimization...). Hm. The iterable idea seems overblown for something as simple as this. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Nov 7 17:58:33 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 17:58:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> Message-ID: <16300.9113.720680.750981@grendel.zope.com> Guido van Rossum writes: > They're in the string module, Right. > which has got to go. I don't think this has ever been justified. What's wrong with the string module for things like ascii_letters? What has to go is the collection of functions that were replaced by string methods. -Fred -- Fred L. Drake, Jr.
PythonLabs at Zope Corporation From guido at python.org Fri Nov 7 18:02:57 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 18:03:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 17:58:33 EST." <16300.9113.720680.750981@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> Message-ID: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> > I don't think this has ever been justified. What's wrong with the > string module for things like ascii_letters? What has to go is the > collection of functions that were replaced by string methods. In the end it would be a module containing 4 constants and one function. I'd rather consolidate all that elsewhere. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Nov 7 18:10:51 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 18:11:01 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> Message-ID: <16300.9851.671401.447992@grendel.zope.com> Guido van Rossum writes: > In the end it would be a module containing 4 constants and one > function. I'd rather consolidate all that elsewhere. Frankly, that doesn't bother me, especially given that they've always been in the string module. 
But I count more than 4 constants that should be kept:

    ascii_letters
    ascii_lowercase
    ascii_uppercase
    digits
    hexdigits
    octdigits
    whitespace

All of these could reasonably live on both str and unicode if that's not considered pollution. But if they live in a module, there's no reason not to keep string around for that purpose. (I don't object to making them class attributes; I object to creating a new module for them.) -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Fri Nov 7 18:17:17 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 18:17:24 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 18:10:51 EST." <16300.9851.671401.447992@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> Message-ID: <200311072317.hA7NHHF03334@12-236-54-216.client.attbi.com> > > In the end it would be a module containing 4 constants and one > > function. I'd rather consolidate all that elsewhere. > > Frankly, that doesn't bother me, especially given that they've always > been in the string module. But I count more than 4 constants that > should be kept: > > ascii_letters > ascii_lowercase > ascii_uppercase > digits > hexdigits > octdigits > whitespace > > All of these could reasonably live on both str and unicode if that's > not considered pollution. But if they live in a module, there's no > reason not to keep string around for that purpose. > > (I don't object to making them class attributes; I object to creating > a new module for them.) Ah, we agree about this then.
I do think that keeping the string module around without all the functions it historically contained would be a mistake, confusing folks. This error is pretty clear:

>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>>

But this one is much more mystifying:

>>> import string
>>> print string.join(["a", "b"], ".")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'join'
>>>

--Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Fri Nov 7 18:25:31 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 7 18:26:06 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 08:08:02 PST." <200311071708.02744.aleaxit@yahoo.com> Message-ID: <03Nov7.152531pst."58611"@synergy1.parc.xerox.com> > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) Actually, the default should be BINARY, however it works. I think it's insane that 'r' works on Unix but breaks on Windows when reading a JPEG file. Bill From tim.one at comcast.net Fri Nov 7 18:26:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 7 18:26:49 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> Message-ID: [Fred] >> I don't think this has ever been justified. What's wrong with the >> string module for things like ascii_letters? What has to go is the >> collection of functions that were replaced by string methods. [Guido] > In the end it would be a module containing 4 constants and one > function. I'd rather consolidate all that elsewhere. Cool -- let's make a new stringhelpers module, then <wink>.
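The file.READ + file.BINARY idea Alex and Bill discuss above never landed in Python; a rough sketch of what such symbolic mode constants might have looked like (all names here are hypothetical, invented for illustration):

```python
class FileMode:
    # Hypothetical symbolic constants standing in for file.READ etc.;
    # they simply map to the familiar mode-string letters.
    READ, WRITE, APPEND, BINARY = "r", "w", "a", "b"

def make_mode(*flags):
    """Combine symbolic flags into a mode string, e.g. (READ, BINARY) -> 'rb'."""
    # Put BINARY last so the result is valid regardless of argument order.
    base = "".join(f for f in flags if f != FileMode.BINARY)
    if FileMode.BINARY in flags:
        base += FileMode.BINARY
    return base or FileMode.READ    # default to read, matching open()
```

With such helpers, Bill's complaint would translate to making `FileMode.BINARY` the default rather than an opt-in flag.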
From exarkun at intarweb.us Fri Nov 7 18:50:13 2003 From: exarkun at intarweb.us (Jp Calderone) Date: Fri Nov 7 18:51:05 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <20031107235013.GA30537@intarweb.us> On Fri, Nov 07, 2003 at 09:35:26AM -0800, Guido van Rossum wrote: > > > Yes, that would be good. Is there anything besides maketrans() in the > > > string module worth saving? (IMO letters and digits etc. are not -- > > > you can use s.isletter() etc. for that.) > > > > I'm not following, are you saying we don't need string.ascii_letters and > > friends any more? > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. > > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. > How about this use case? def genPassword(pickFrom=string.letters+string.digits, n=8): return ''.join([random.choice(pickFrom) for i in range(n)]) Jp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://mail.python.org/pipermail/python-dev/attachments/20031107/19059b4b/attachment-0001.bin From janssen at parc.com Fri Nov 7 18:58:17 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 7 19:00:02 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 14:42:33 PST." <200311072242.hA7MgXS03159@12-236-54-216.client.attbi.com> Message-ID: <03Nov7.155826pst."58611"@synergy1.parc.xerox.com> > > But there are probably more useful combinations like > > > > string.letters + string.digits + "_" > > > > than there should be isxxx() tests. > > We don't need to invent anything for that. You can use a regular > expression with \w. > > --Guido van Rossum (home page: http://www.python.org/~guido/) That's replacing the "clear" with the "arcane" (or perhaps the "fairly incomprehensible"). Is that really a good ultimate direction for Python? Bill From aleaxit at yahoo.com Fri Nov 7 19:02:04 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 19:02:11 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> Message-ID: <200311080102.04546.aleaxit@yahoo.com> On Friday 07 November 2003 23:49, Guido van Rossum wrote: > > I guess the tests should be faster, yes, but I would still want > > _iterables_ for ascii_* and digits. > > Why? It's not like you're going to save much space by not creating a > string of 52 bytes. Strings are iterables. 
What I'm saying is that I don't necessarily need them to be strings, if having iterables that aren't strings (perhaps a string subclass redefining just __contains__) would help with: > > One issue with allowing "if char in string.letters:" is that these > > days this will not raise if the alleged 'char' is more than one > > character -- it will give True for (e.g.) 'ab', False for (e.g.) > > 'foobar', since it tests _substrings_. > > Right. > > So, maybe, str.letters and friends should be iterables which also > > implement a __contains__ method that raises some error with helpful > > information about using .iswhatever() instead -- that's assuming we > > want people NOT to test with "if char in str.letters:". If we DO > > want people to test that way, then I think str.letters should > > _still_ have __contains__, but specifically one to optimize speed in > > this case (if supported it should be just as fast as the > > .is... method -- which as Skip reminds us may in turn need > > optimization...). > > Hm. The iterable idea seems overblown for something as simple as > this. Is presenting this as "a subtype of str that overrides __contains__ appropriately" more acceptable? Alex From guido at python.org Fri Nov 7 19:10:15 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 19:10:28 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Sat, 08 Nov 2003 01:02:04 +0100." <200311080102.04546.aleaxit@yahoo.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> <200311080102.04546.aleaxit@yahoo.com> Message-ID: <200311080010.hA80AFe03432@12-236-54-216.client.attbi.com> > > > I guess the tests should be faster, yes, but I would still want > > > _iterables_ for ascii_* and digits. > > > > Why? It's not like you're going to save much space by not creating a > > string of 52 bytes. > > Strings are iterables. 
What I'm saying is that I don't necessarily need > them to be strings, if having iterables that aren't strings (perhaps a > string subclass redefining just __contains__) would help with: An example given earlier: string.letters + string.digits + "_" indicates that we want them to be concrete strings. > > > One issue with allowing "if char in string.letters:" is that these > > > days this will not raise if the alleged 'char' is more than one > > > character -- it will give True for (e.g.) 'ab', False for (e.g.) > > > 'foobar', since it tests _substrings_. > > > > Right. > > > > > So, maybe, str.letters and friends should be iterables which also > > > implement a __contains__ method that raises some error with helpful > > > information about using .iswhatever() instead -- that's assuming we > > > want people NOT to test with "if char in str.letters:". If we DO > > > want people to test that way, then I think str.letters should > > > _still_ have __contains__, but specifically one to optimize speed in > > > this case (if supported it should be just as fast as the > > > .is... method -- which as Skip reminds us may in turn need > > > optimization...). > > > > Hm. The iterable idea seems overblown for something as simple as > > this. > > Is presenting this as "a subtype of str that overrides __contains__ > appropriately" more acceptable? No, I think it's being too clever. 
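Alex's "subtype of str that overrides __contains__" idea, which Guido rejects here as too clever, would have looked roughly like this (a hypothetical sketch, never implemented):

```python
class CharSet(str):
    """A str whose 'in' test accepts only a single character."""

    def __contains__(self, item):
        # Reject multi-character operands instead of doing a substring test.
        if not isinstance(item, str) or len(item) != 1:
            raise TypeError("'in' on a CharSet requires a single character, "
                            "got %r" % (item,))
        return str.__contains__(self, item)

digits = CharSet("0123456789")
```

Because CharSet is still a str, expressions like `digits + "_"` keep working, while `"23" in digits` raises instead of accidentally succeeding.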
--Guido van Rossum (home page: http://www.python.org/~guido/) From pinard at iro.umontreal.ca Fri Nov 7 18:11:45 2003 From: pinard at iro.umontreal.ca (François Pinard) Date: Fri Nov 7 19:52:55 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.4475.778642.314514@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> <16300.4475.778642.314514@montanaro.dyndns.org> Message-ID: <20031107231145.GA6625@titan.progiciels-bpi.ca> [Skip Montanaro] > >> How about a strtools module? > Fred> Not sure I like the increasing array of module name suffixes. > Fred> There's the classic "foolib", then we added "footools" and > Fred> "fooutils" (think "mimetools" and "distutils"). > Not to mention which, we have a perfectly good module name already: string. When the `string' module was more or less aimed at deprecation (at least in practice), a good while ago, this was good news to me, because this module was preventing me, as a programmer, from using `string' as a variable name. Currently in Python, `string' as a module is not as ubiquitously needed as it once was in 1.5.2 times, and this is good news. Let it go and vanish if this is doable, but avoid making `string' any stronger. I would much prefer that library modules (past and future) never be named after likely user variable names. -- François Pinard http://www.iro.umontreal.ca/~pinard From bac at OCF.Berkeley.EDU Fri Nov 7 20:29:41 2003 From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Fri Nov 7 20:29:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> Message-ID: <3FAC4705.3080900@ocf.berkeley.edu> Guido van Rossum wrote: >>How about a strtools module? I was thinking that constants like >>ascii_letters could go there along with an implementation of join() that >>took arguments in an obvious way (or at least the way everyone seems to >>request it). Barry's string replacement function could also go there >>(the one using $; wasn't it agreed that interpolation was the wrong term >>to use or something?). >> >>This would prevent polluting the str type too much plus remove any >>hindrance that there necessarily be a mirror value for Unicode since the >>docs can explicitly state it only works for str in those cases. > > > Do we have an indication that the str type is getting polluted too > much? As of right now? Not really, but this might lead down that road (probably being overly cautious on this). I do agree with Fred in that I would be just as happy to have them in a module. Might be a bias I have developed about keeping *everything* in a class/type or instance (I blame Java =). I really don't mind if they get added to the type; moving them to another module just seemed like a cleaner solution to me. I am basically:

+0 for making the constants a class variable (really more like +.5, but rounding screws that up)
-1 for leaving the string module (I agree with Francois' argument about the name, plus we have said it is going to be deprecated for so long I would like to see it through)
+1 for moving them to another module that can have generic string-helping functions

-Brett From martin at v.loewis.de Sat Nov 8 04:53:19 2003 From: martin at v.loewis.de (Martin v.
Löwis) Date: Sat Nov 8 04:53:36 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072344.46848.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> <200311072344.46848.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 usec > and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, about > in-between, by avoiding the generation of a bound-method. Now of course > saving 25% or so isn't huge, but maybe it's still worth it...? If you can avoid creating bound methods in the general case, that would be a good thing. Even avoiding them for strings only would be valuable, although I would then ask that you extend your strategy to lists. > should be able to print two distinct values for appropriate values of > xx and locale changes, right? Correct. > Hmmm -- if supporting this usage is crucial > then indeed we can't avoid generating a bound method (for .islower and > other locale-aware .is* methods), because the "return a function" approach > is basically evaluating the function at attribute-access time... if locale > changes between the attribute-access time and the moment of the call, > then the result may not be as desired. It's not crucial, but it would be an incompatible change to change it. However, this is irrelevant with respect to bound methods. The locale-awareness is in the code of the function, so if you manage to invoke that at the point of the call (instead of caching its result), then it would still be compatible.
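Martin's caveat about lists can be demonstrated directly: a bound method on a mutable object must keep a live reference to the object and see later mutations, while for an immutable str the answer to a parameterless predicate is already fixed at attribute-access time (which is what makes the getter trick possible at all):

```python
somelist = [3, 1, 2]
xxx = somelist.count      # bound method: holds a live reference to somelist
somelist.append(1)        # alter somelist at will
assert xxx(1) == 2        # the call must see the mutation

# By contrast, nothing can change what an immutable string's isdigit
# will answer once the attribute has been fetched:
s = "101"
f = s.isdigit
assert f() is True
```

So any scheme that evaluates the answer at attribute-access time is safe only for immutable receivers and locale-independent predicates.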
Regards, Martin From raymond.hettinger at verizon.net Sat Nov 8 07:22:58 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 8 07:23:13 2003 Subject: [Python-Dev] operator.isMappingType Message-ID: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer>

>>> import operator
>>> map(operator.isMappingType, [(), [], '', u'', {}])
[True, True, True, True, True]

We did not resolve this when it came up before. Would there be any objections to my removing operator.isMappingType()? Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031108/dfc567ff/attachment.html From martin at v.loewis.de Sat Nov 8 08:09:09 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Sat Nov 8 08:09:16 2003 Subject: [Python-Dev] SourceForge CVS services improved Message-ID: <200311081309.hA8D99ix005156@mira.informatik.hu-berlin.de> In case you haven't read this announcement: ( 2003-11-04 09:51:53 - Project CVS Service ) Cutover of pserver-based CVS service and ViewCVS access to repositories to the new CVS infrastructure has been completed. Synchronization of data from the primary CVS server to the new CVS infrastructure now occurs every 5 hours (formerly once per day). Performance of pserver-based CVS access and ViewCVS access has been significantly improved; connection shedding (formerly used to cap the total number of simultaneous CVS connections) has been disabled. So anonymous users of the Python CVS should not see rejected connections anymore, and should see files only "slightly" behind. SF has completed the installation of new CVS server hardware, so developers should see an improved performance, compared to several months ago.
Regards, Martin From python at rcn.com Sat Nov 8 09:29:28 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 8 09:29:44 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: Message-ID: <002601c3a604$b7b55460$66b52c81@oemcomputer> > "Raymond Hettinger" writes: > > > Idea for the day: Let the str.encode/decode methods accept keyword > > arguments to be forwarded to the underlying codec. > > -1. The non-Unicode usage of .encode should not have been there in the > first place, IMO, so I dislike any extensions to it. I understand a desire to keep it pure. Would it be useful to add a separate method to support non-Unicode access? This style of access has some wonderful properties in terms of decoupling, accessibility, learnability, and uniformity. I can imagine that many kinds of bulk string operations could benefit from this interface:

    t.transform('crc32')
    t.transform('md5')
    t.transform('des_encode', key=0x10ab03b78495d2)
    t.transform('substitution', name='guido', home='netherlands')
    t.transform('huffman')

Raymond Hettinger From aleaxit at yahoo.com Sat Nov 8 11:09:34 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 11:09:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311072344.46848.aleaxit@yahoo.com> Message-ID: <200311081709.35052.aleaxit@yahoo.com> On Saturday 08 November 2003 10:53, Martin v. Löwis wrote: > Alex Martelli writes: > > I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 > > usec and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, > > about in-between, by avoiding the generation of a bound-method. Now of > > course saving 25% or so isn't huge, but maybe it's still worth it...? > > If you can avoid creating bound methods in the general case, that > would be a good thing.
> Even avoiding them for strings only would
> be valuable, although I would then ask that you extend your strategy
> to lists.

Lists are mutable, which makes "creating bound methods" (or the equivalent thereof) absolutely unavoidable -- e.g.:

xxx = somelist.somemethod
" alter somelist at will "
yyy = xxx( )

xxx needs to be able to refer back to somelist at call time, clearly. This problem doesn't necessarily apply to method calls _on immutable objects_ -- as long as their results are not affected by other mutable "global" aspects of "the environment" in ways which also depend on the object they were originally called on. The is... methods of strings would be just perfect -- were it not for the influence of locale. Consider isdigit, which isn't influenced by locale. When x.isdigit is ACCESSED, we can direct that access through a getter, which, upon examining x's value at that time, KNOWS what the call will have to return -- whenever the call happens. So, the getter can return a callable that always returns True when called, or one that always returns False when called -- no need to create *new* callable objects for either, we can just keep two callables around for the purpose and incref them as needed. Few situations are as favourable as this one -- immutable object, no arguments, just two possible constant-returning callables needed. I just think it might be worth taking advantage of these rare circumstances, where feasible, to avoid wasting a little bit of performance. I think that this can be done in 2.3.* without changing Python-observable behavior in any way whatsoever -- just that if, e.g., we do it for both isdigit and isspace (the two non-locale-dependent string is* methods, I believe), we'll need 4 callables rather than 2 so that their __name__ and __doc__ attributes can be indistinguishable from the current versions thereof. > It's not crucial, but it would be an incompatible change to change it.
> > However, this is irrelevant with respect to bound methods. The > locale-awareness is in the code of the function, so if you manage to > invoke that at the point of the call (instead of caching its result), > then it would still be compatible. Nope, because the locale-dependent part needs to be applied to the actual string on which, e.g., isupper is being called. Therefore, since locale-dependency applies at call-time, we need a way _at call-time_ to get to the actual string... i.e., a bound-method or its equivalent, alas. Only when attribute-fetch-time behavior can be substituted for call-time behavior, is the above optimization feasible. Alex From aleaxit at yahoo.com Sat Nov 8 11:43:25 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 11:43:34 2003 Subject: [Python-Dev] operator.isMappingType In-Reply-To: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer> References: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer> Message-ID: <200311081743.25977.aleaxit@yahoo.com> On Saturday 08 November 2003 13:22, Raymond Hettinger wrote: > >>> import operator > >>> map(operator.isMappingType, [(), [], '', u'', {}]) > > [True, True, True, True, True] > > We did not resolve this when it came up before. Would there be any > objections to my removing operator.isMappingType()? No objections from me. Either it should be made to do something useful (and I don't know how unless the 'basemapping' abstract type I mentioned is introduced), or it should be removed -- having it in its current state seems worst. Alex From barry at python.org Sat Nov 8 12:22:04 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 8 12:22:10 2003 Subject: [Python-Dev] Small change to python-bugs-list Message-ID: <1068312124.15995.204.camel@anthem> It seems pretty redundant for the subject header of messages to this list to have both the SF added [ python-Bugs-XXXXXX ] prefix and the [Python-bugs-list] prefix added by Mailman. I removed the latter.
-Barry From python at rcn.com Sat Nov 8 12:34:05 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 8 12:34:18 2003 Subject: [Python-Dev] FW: [Python-checkins] python/dist/src/Doc/whatsnew whatsnew24.tex, 1.5, 1.6 Message-ID: <000001c3a61e$82451120$49bc958d@oemcomputer> > ! A new built-in function, \function{reversed(seq)}, takes a sequence > ! and returns an iterator that returns the elements of the sequence > ! in reverse order. > ! > ! \begin{verbatim} > ! >>> for i in reversed([1,2,3]): > ! ... print i > ! ... > ! 3 > ! 2 > ! 1 > ! \end{verbatim} > ! > ! Note that \function{reversed()} only accepts sequences, not arbitrary > ! iterators. If you want to reverse an iterator, convert it to > ! a list or tuple with \function{list()} or \function{tuple()}. > ! > ! \begin{verbatim} > ! >>> input = open('/etc/passwd', 'r') > ! >>> for line in reversed(list(input)): > ! ... print line > ! ... > ! root:*:0:0:System Administrator:/var/root:/bin/tcsh > ! ... > ! \end{verbatim} It would be nice to present the new features in light of what makes them desirable. "for elem in reversed(mylist)" wins in readability, speed, and memory performance over "mylist.reverse(); for elem in mylist" or "for elem in mylist[::-1]". The readability win is predicated on the notion that half-open intervals are easier to understand in the forwards direction. 'xrange(n//2, 0, -1)' is not as instantly understandable as reversed(xrange(1, n//2)). Using the newer form, anyone can quickly identify the first element, last element, and number of steps. > + \item The list type gained a \method{sorted(iterable)} method that > + returns the elements of the iterable as a sorted list. It also accepts > + the \var{cmp}, \var{key}, and \var{reverse} keyword arguments, same as > + the \method{sort()} method. 
An example usage: > + > + \begin{verbatim} > + >>> L = [9,7,8,3,2,4,1,6,5] > + >>> list.sorted(L) > + [1, 2, 3, 4, 5, 6, 7, 8, 9] > + >>> L > + [9, 7, 8, 3, 2, 4, 1, 6, 5] > + >>> > + \end{verbatim} > + > + Note that the original list is unchanged; the list returned by > + \method{sorted()} is a newly-created one. The key points here are that 1) any iterable may be used as an input and 2) list.sorted() is an in-line expression which allows it to be used in function arguments, lambda expressions, list comprehensions, and for-loop specifications:

    genTodoList(today, list.sorted(tasks, key=prioritize))
    getlargest = lambda x: list.sorted(x)[-1]
    x = [myfunc(v) for v in list.sorted(mydict.itervalues())]
    for key in list.sorted(mydict): . . .

> + \item The \module{heapq} module is no longer implemented in Python, > + having been converted into C. And it now runs about 10 times faster, which makes it viable for industrial-strength applications. > \item The \module{random} module has a new method called > \method{getrandbits(N)} Formerly, there was no O(n) method for generating large random numbers. The new method supports random.randrange() so that arbitrarily large numbers can be generated (important for public key cryptography and prime number generation).
This is a) completely different from encoding, where you learn the encoding only at run-time, e.g. from a MIME header or a config file. b) creates a different way to do the same thing; There should be one-- and preferably only one --obvious way to do it. > t.transform('crc32') Better write this as crc32.transform(t) > t.transform('md5') Better md5.transform(t) > t.transform('des_encode', key=0x10ab03b78495d2) Better des.encrypt(t, key=0x10ab03b78495d2). For des, there are two operations for string conversion, encrypt and decrypt; putting the direction of the operation in the transform name sux. > t.transform('substitution', name='guido', home='netherlands') Better t.substitute(name='guido', home='netherlands') > t.transform('huffman') Better huffman.transform(t) They are *not* uniform, as you have to remember the various parameters. Regards, Martin From martin at v.loewis.de Sat Nov 8 15:51:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 8 15:51:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311081709.35052.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311072344.46848.aleaxit@yahoo.com> <200311081709.35052.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > Lists are mutable, which makes "creating bound methods" (or the equivalent > thereof) absolutely unavoidable -- e.g.: > xxx = somelist.somemethod > " alter somelist at will " > yyy = xxx( ) > > xxx needs to be able to refer back to somelist at call time, clearly. It depends on the source code. In your example, I agree it is unavoidable. In the much more common case of yyy = somelist.somemethod() one could call the code of somemethod without creating a bound method, and, in some cases, without creating the argument tuple. It would be good if, for >>> def x(a): ... a.append(1) ... 
the code could change from

  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                1 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_FUNCTION            1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

to

  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               2 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_METHOD              1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

where CALL_METHOD would read the method name from the stack. Unfortunately, that would be a semantic change; a __getattr__ would not be called anymore. Perhaps that can be changed to

  2           0 LOAD_FAST                0 (a)
              3 LOAD_METHOD              1 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_METHOD              1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

where LOAD_METHOD has the option of returning a fast_method object (which exists only once per type and method), and CALL_METHOD would check whether there is a fast_method object on the stack, and then explicitly pop "self" from the stack as well. > Few situations are as favourable as this one -- immutable object, no > arguments, just two possible constant-returning callables needed. Most cases are as favourable as this one. If you immediately call the bound method, and then discard the bound-method-object, there is no point in creating it first. The exception is the getattr-style computation of callables, where getattr cannot know that the result is going to be called right away. Regards, Martin From aleaxit at yahoo.com Sat Nov 8 19:00:58 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 19:01:07 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311081709.35052.aleaxit@yahoo.com> Message-ID: <200311090100.58703.aleaxit@yahoo.com> On Saturday 08 November 2003 21:51, Martin v. Löwis wrote: ... > > Lists are mutable, which makes "creating bound methods" (or the > > equivalent thereof) absolutely unavoidable -- e.g.: [[ I meant -- but didn't say out loud!-) -- "without changing the current bytecode-level logic". The change I proposed and experimented with for strings' is...
methods is localized to stringobject.c and requires changing nothing except the details of string objects' implementation ]] > > xxx = somelist.somemethod > > " alter somelist at will " > > yyy = xxx( ) > > > > xxx needs to be able to refer back to somelist at call time, clearly. > > It depends on the source code. In your example, I agree it is > unavoidable. In the much more common case of > > yyy = somelist.somemethod() > > one could call the code of somemethod without creating a bound method, > and, in some cases, without creating the argument tuple. It would be Yes, if different bytecode was generated, this would of course be possible. > would not be called anymore. Perhaps that can be changed to >
>   2           0 LOAD_FAST                0 (a)
>               3 LOAD_METHOD              1 (append)
>               6 LOAD_CONST               1 (1)
>               9 CALL_METHOD              1
>              12 POP_TOP
>              13 LOAD_CONST               0 (None)
>              16 RETURN_VALUE
>
> where LOAD_METHOD has the option of returning a fast_method object > (which exists only once per type and method), and CALL_METHOD > would check whether there is a fast_method object on the stack, and > then explicitly pop "self" from the stack as well. Yes, if LOAD_METHOD was also able to return a perfectly generic object (just in case the attribute named 'append' was not in fact a method), and CALL_METHOD could fall back to today's CALL_FUNCTION's functionality. I'm not sure what's supposed to happen to 'self' if LOAD_METHOD cannot push a fastmethod object but needs to push something else instead -- would the something else (anything but a fastmethod) also consume the 'self' then (whether to ignore it or merge it into a boundmethod)? It does look like this could work, and on a wide range of typical method-call uses. > > Few situations are as favourable as this one -- immutable object, no > > arguments, just two possible constant-returning callables needed. > > Most cases are as favourable as this one.
If you immediately call the Yes, for the kind of bytecode-level change you're proposing, I do believe most method-calls do follow this pattern. > bound method, and then discard the bound-method-object, there is no > point in creating it first. The exception is the getattr-style > computation of callables, where getattr cannot know that the result is > going to be called right away. ...or, no doubt, other special descriptors with getters playing dirty tricks. But I do agree that these are still likely to be a tiny fraction of use cases. My proposal was very narrow and safe -- yours is very broad, but by that very characteristic I think it _may_ make a difference to a certain pie-throwing-related bet, which no doubt wouldn't be the case for mine. So, Guido may well be more interested in your idea than in mine, given he's the one directly involved in the pie issues... Alex From bac at OCF.Berkeley.EDU Sat Nov 8 19:11:23 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 8 19:11:29 2003 Subject: [Python-Dev] Time for py3k@python.org or a Py3K Wiki? In-Reply-To: <16278.33180.5190.95094@montanaro.dyndns.org> References: <16278.33180.5190.95094@montanaro.dyndns.org> Message-ID: <3FAD862B.9020302@ocf.berkeley.edu> Skip Montanaro wrote: > These various discussions are moving along a bit too rapidly for me to keep > up. We have been discussing language issues which are going to impact > Python 3.0, either by deprecating current language constructs which can't be > eliminated until then (e.g., the global statement) or by tossing around > language construct ideas which will have to wait until then for their > implementation (other mechanisms for variable access in outer scopes). > Unfortunately, I'm afraid these things are going to get lost in the sea of > other python-dev topics and be forgotten about when the time is ripe. > The Summaries can help with this (this is why whenever an idea comes up for Py3k I try to mention it), but read below for worries on this.
> Maybe this would be a good time to create a py3k@python.org mailing list > with more restrictions than python-dev (posting by members only? membership > by invitation?) so we can more easily separate these ideas from shorter term > issues and keep track of them in a separate Mailman archive. I'd suggest > starting a Wiki, but that seems a bit too "global". You can restrict Wiki > mods in MoinMoin to users who are logged in, but I'm not sure you can > restrict signups very well. > I am working on the next Summary and I am drowning here. Thanks to PEP 289 and PEP 323 I was able to basically do a quick overview and just point to the PEPs for generator expressions and reiterability/copying iterators, respectively. But I might have to summarize the 'global' discussion which is just immense. The problem is that I am the one doing the summary. Not only might I misunderstand something, but it will most likely have a slightly skewed view toward my thinking. I think Skip is right in having a separate place for *very* long-term discussions separate from immediate concerns. Long-term stuff does not need to be followed by everyone nor does everyone care about immediate issues like whether something should be backported. A layer of separation might be nice. Or perhaps a list for maintenance and another for new ideas. I can see having that division work as well. Dividing into more than two lists, though, would quickly turn into a logistical nightmare when ideas need to shift to another list. 
-Brett From martin at v.loewis.de Sat Nov 8 19:28:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Nov 8 19:29:09 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311090100.58703.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311081709.35052.aleaxit@yahoo.com> <200311090100.58703.aleaxit@yahoo.com> Message-ID: <3FAD8A45.1020901@v.loewis.de> Alex Martelli wrote: > [[ I meant -- but didn't say out loud!-) -- "without changing the current > bytecode-level logic". The change I proposed and experimented with > for strings' is... methods is localized to stringobject.c and requires > changing nothing except the details of string objects' implementation ]] Then I probably don't understand what you are suggesting. What would LOAD_ATTR do if the object is a string, the attribute is "isdigit", and you were allowed to assume that the result won't depend on factors that may change over time? Regards, Martin From bac at OCF.Berkeley.EDU Sat Nov 8 19:33:30 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 8 19:33:33 2003 Subject: [Python-Dev] string substitution fxn in a new module (was: Can we please have a better dict interpolation syntax?) In-Reply-To: <200310231538.h9NFcIW02840@12-236-54-216.client.attbi.com> References: <200310230136.h9N1afs19446@oma.cosc.canterbury.ac.nz> <16279.56778.309781.129469@montanaro.dyndns.org> <1066921335.11634.103.camel@anthem> <16279.62016.628120.971560@montanaro.dyndns.org> <200310231538.h9NFcIW02840@12-236-54-216.client.attbi.com> Message-ID: <3FAD8B5A.3000704@ocf.berkeley.edu> Guido van Rossum wrote: > I have too much on my plate (spent too much on generator expressions > lately :-). > > I am bowing out of the variable substitution discussion after noting > that putting it in a module would be a great start (like for sets). > This idea seemed to die for no apparent reason. 
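The kind of dict-driven substitution helper being discussed can be sketched in a few lines (the function name and the $-marker syntax here are illustrative assumptions only, not the API that was actually on the table):

```python
import re

def substitute(template, mapping):
    # Replace each $name marker with str(mapping['name']); a missing
    # name raises KeyError, which a real module would surely refine.
    def _repl(match):
        return str(mapping[match.group(1)])
    return re.sub(r"\$(\w+)", _repl, template)

print(substitute("Hello $name, you have $count messages",
                 {"name": "Guido", "count": 3}))
# -> Hello Guido, you have 3 messages
```

Being a plain function, it composes with arguments, lambdas, and comprehensions the same way the interpolation syntaxes under discussion would.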
Fred, Skip, and Barry all liked the idea of adding the string substitution code to a module (one idea for a name was textutils) and Guido obviously seems receptive to the idea. Do people feel like moving forward with a new module? -Brett From aleaxit at yahoo.com Sun Nov 9 05:43:32 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 9 05:43:41 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAD8A45.1020901@v.loewis.de> References: <1068230662.15995.159.camel@anthem> <200311090100.58703.aleaxit@yahoo.com> <3FAD8A45.1020901@v.loewis.de> Message-ID: <200311091143.32121.aleaxit@yahoo.com> On Sunday 09 November 2003 01:28, Martin v. Löwis wrote: > Alex Martelli wrote: > > [[ I meant -- but didn't say out loud!-) -- "without changing the > > current bytecode-level logic". The change I proposed and experimented > > with for strings' is... methods is localized to stringobject.c and > > requires changing nothing except the details of string objects' > > implementation ]] > > Then I probably don't understand what you are suggesting. What would > LOAD_ATTR do if the object is a string, the attribute is "isdigit", and > you were allowed to assume that the result won't depend on factors that > may change over time?
The LOAD_ATTR attribute, using exactly the machinery it uses today, gets to PyString_Type's tp_getattro slot, which is unchanged:

    PyObject_GenericGetAttr,                    /* tp_getattro */

Only one slot in PyString_Type is changed at all:

    string_getsets,                             /* tp_getset */

and it's changed from being 0 as it is now to pointing to:

    static PyGetSetDef string_getsets[] = {
        {"isdigit", (getter)string_isdigit_getter, 0, isdigit_getter__doc__},
        /* other getsets snipped */
        {0}
    };

string_isdigit_getter is quite similar to today's string_isdigit *EXCEPT* that instead of returning True or False (via PyBool_FromLong(1) etc) it returns one of two nullary callables which will always return True or respectively False when called:

    static PyObject * _isdigit_return_true = 0;
    static PyObject * _isdigit_return_false = 0;

    static PyObject *
    _isdigit_true_returner(PyObject* ignore_self)
    {
        Py_RETURN_TRUE;
    }

    static PyObject *
    _isdigit_false_returner(PyObject* ignore_self)
    {
        Py_RETURN_FALSE;
    }

    static PyMethodDef _str_bool_returners[] = {
        {"isdigit", (PyCFunction)_isdigit_false_returner, METH_NOARGS},
        {"isdigit", (PyCFunction)_isdigit_true_returner, METH_NOARGS},
        /* other "bool returners" snipped */
        {0}
    };

    static PyObject*
    _return_returner(PyObject** returner, PyMethodDef *returner_method_def)
    {
        if(!*returner)
            *returner = PyCFunction_New(returner_method_def, 0);
        Py_INCREF(*returner);
        return *returner;
    }

so string_isdigit_getter uses

    return _return_returner(&_isdigit_return_true, _str_bool_returners+1);

where string_isdigit would instead use

    return PyBool_FromLong(1);

That's all there is to my proposal (we'd have another pair of 'bool returners' for isspace -- I think there are no other is...
methods of strings suitable for this, given locale dependency of letter/upper/lower concepts) -- just a simple way to exploit descriptors to avoid creating bound-method objects -- with a speedup of 30% compared with the current implementations of isdigit and isspace (but the `in` operator is yet another 30% faster in both cases). Your proposal is vastly more ambitious and interesting, it seems to me. Alex From skip at manatee.mojam.com Sun Nov 9 08:00:47 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 9 08:00:58 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311091300.hA9D0luu028669@manatee.mojam.com>

Bug/Patch Summary
-----------------
562 open / 4322 total bugs (+62)
191 open / 2445 total patches (+15)

New Bugs
--------
Unhelpful error message from cgi module (2003-11-02) http://python.org/sf/834840
[2.3.2] zipfile test failure on AIX 5.1 (2003-11-03) http://python.org/sf/835145
[2.3.2] bz2 test failure on AIX 4.3.2, Tru64 UNIX (2003-11-03) http://python.org/sf/835176
socket object method "makefile" has wrong doc (2003-11-03) http://python.org/sf/835300
[2.3.2] test_socket failure on IRIX 6.5 (2003-11-03) http://python.org/sf/835338
logging.StreamHandler encodes log message in UTF-8 (2003-11-03) http://python.org/sf/835353
MacPython builds with DESTROOT need fixup (2003-11-04) http://python.org/sf/835790
strftime month name is encoded somehow (2003-11-04) http://python.org/sf/836035
socket.send() on behaves as nonblocking when timeout is set (2003-11-04) http://python.org/sf/836058
email generator can give bad output (2003-11-04) http://python.org/sf/836293
Windows installer 2.3.2 leaves old version in control panel (2003-11-05) http://python.org/sf/836515
pyport.h redeclares gethostname() if SOLARIS is defined (2003-11-06) http://python.org/sf/837046
Tk.quit and sys.exit cause Fatal Error (2003-11-06) http://python.org/sf/837234
id() for large ptr should return a long (2003-11-06) http://python.org/sf/837242
cryptic os.spawnvpe() return code (2003-11-06) http://python.org/sf/837577
socket.gethostbyname raises gaierror, not herror (2003-11-07) http://python.org/sf/837929
Unloading extension modules not always safe (2003-11-07) http://python.org/sf/838140
PackageManager does not clean up after itself (2003-11-07) http://python.org/sf/838144
MacPython for Panther additions includes IDLE (2003-11-08) http://python.org/sf/838616

New Patches
-----------
Update htmllib to HTML 4.01 (2003-11-04) http://python.org/sf/836088
Build changes for AIX (2003-11-05) http://python.org/sf/836434
assert should not generate code if optimized (2003-11-05) http://python.org/sf/836879
Avoid "apply" warnings in "logging", still works in 1.52 (2003-11-05) http://python.org/sf/836942
make pty.fork() allocate a controlling tty (2003-11-08) http://python.org/sf/838546

Closed Bugs
-----------
Named groups limitation in sre (2003-08-25) http://python.org/sf/794819
pyclbr.readmodule_ex() (2003-10-11) http://python.org/sf/821818
_set_cloexec of tempfile.py uses incorrect error handling (2003-10-11) http://python.org/sf/821896
object.h misdocuments PyDict_SetItemString (2003-10-21) http://python.org/sf/827856
Docstring for pyclbr.readmodule() is incorrect (2003-10-28) http://python.org/sf/831969
Bad Security Advice in CGI Documentation (2003-10-29) http://python.org/sf/832515
Incorrect priority 'in' and '==' (2003-10-31) http://python.org/sf/833905

Closed Patches
--------------
Added HTTP{,S}ProxyConnection (2002-02-08) http://python.org/sf/515003
Add traceback.format_exc (2003-01-30) http://python.org/sf/677887
Fix for former/latter confusion in Extending documentation (2003-10-06) http://python.org/sf/819012
Implementation PEP 322: Reverse Iteration (2003-11-01) http://python.org/sf/834422

From doko at cs.tu-berlin.de Sun Nov 9 15:01:57 2003 From: doko at cs.tu-berlin.de (Matthias Klose) Date: Sun Nov 9 15:04:12 2003 Subject: [Python-Dev] python icons?
Message-ID: <16302.40245.488709.729747@gargle.gargle.HOWL> Wanting to add an icon for gnome/KDE menus for a binary python package. There are no images in the distribution itself, and not many on the website. Looking for something like http://www.python.org/cgi-bin/moinmoin/ in standard resolutions like 64x64, 48x48, 32x32 and 16x16. Maybe something like this could be added to the Misc directory in the tarball. Matthias From iusty at k1024.org Sun Nov 9 17:44:45 2003 From: iusty at k1024.org (Iustin Pop) Date: Sun Nov 9 17:42:46 2003 Subject: [Python-Dev] tempfile.mktemp and os.path.exists Message-ID: <20031109224445.GA26291@saytrin.hq.k1024.org> Hello, The tempfile.mktemp function uses os.path.exists to test whether a file already exists. Since this returns false for broken symbolic links, wouldn't it be better if the function would actually do an os.lstat on the filename? I know the function is not safe by definition, but this issue could (with a low probability) cause the file to actually be created in another directory, as the non-existent target of the symlink, instead of in the given directory (the one in which the symlink resides). Regards, Iustin Pop From tdelaney at avaya.com Sun Nov 9 17:54:41 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Sun Nov 9 17:54:48 2003 Subject: [Python-Dev] other "magic strings" issues Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> > From: python-dev-bounces+tdelaney=avaya.com@python.org > > I guess the tests should be faster, yes, but I would still > want _iterables_ for ascii_* and digits. > > One issue with allowing "if char in string.letters:" is that > these days this will not raise if the alleged 'char' is more > than one character -- it will give True for (e.g.) 'ab', False > for (e.g.) 'foobar', since it tests _substrings_. # inside string.py or equivalent ... 
import sets ascii_letters = sets.Set(ascii_letters) Hmm - we'd have the iterability, individual characters and speed, but lose iterating in order. I'm sure there's things out there that rely on iterating over ascii_letters in order ... ;) Tim Delaney From guido at python.org Sun Nov 9 21:11:57 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 9 21:12:15 2003 Subject: [Python-Dev] tempfile.mktemp and os.path.exists In-Reply-To: Your message of "Mon, 10 Nov 2003 00:44:45 +0200." <20031109224445.GA26291@saytrin.hq.k1024.org> References: <20031109224445.GA26291@saytrin.hq.k1024.org> Message-ID: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> > Hello, > > The tempfile.mktemp function uses os.path.exists to test whether a file > already exists. Since this returns false for broken symbolic links, > wouldn't it be better if the function would actually do an os.lstat on > the filename? > > I know the function is not safe by definition, but this issue could > (with a low probability) cause the file to actually be created in > another directory, as the non-existent target of the symlink, instead of > in the given directory (the one in which the symlink resides). > > Regards, > Iustin Pop Sounds like a good suggestion; I'll see if I can check something in. (However, given that there already exists an attack on this function, does fixing this actually make any difference?) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Mon Nov 10 03:18:15 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 10 03:18:22 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> Message-ID: <200311100918.15810.aleaxit@yahoo.com> On Sunday 09 November 2003 11:54 pm, Delaney, Timothy C (Timothy) wrote: ... 
> ascii_letters = sets.Set(ascii_letters) > > Hmm - we'd have the iterability, individual characters and speed, but lose > iterating in order. I'm sure there's things out there that rely on > iterating over ascii_letters in order ... ;) Yes, that's my main use case -- presenting results to the user, so they need to be in alphabetic order (ascii_lowercase actually, but it's much the same). Anyway, Guido has already pronounced on such enhancements as "Too Clever", so we have to keep ascii_lowercase &c as plain strings without any enhancements and keep the "false positives" &c on 'in' checks. Alex From mwh at python.net Mon Nov 10 05:34:40 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 10 05:34:45 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068225424.15995.146.camel@anthem> (Barry Warsaw's message of "Fri, 07 Nov 2003 12:17:05 -0500") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> Message-ID: <2msmkwy0jj.fsf@starship.python.net> Barry Warsaw writes: > I would love it if what happened really was something like: > >>>> from socket import * >>>> print AF_UNIX > socket.AF_UNIX >>>> from errno import * >>>> print EEXIST > errno.EEXIST I've had this idea too. I like it, I think. The signal module could use it too... Cheers, mwh -- I have a feeling that any simple problem can be made arbitrarily difficult by imposing a suitably heavy administrative process around the development. 
-- Joe Armstrong, comp.lang.functional From mwh at python.net Mon Nov 10 05:38:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 10 05:38:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071708.02744.aleaxit@yahoo.com> (Alex Martelli's message of "Fri, 7 Nov 2003 17:08:02 +0100") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> Message-ID: <2moevky0du.fsf@starship.python.net> Alex Martelli writes: > From Barry's discussion of the problem of "magic strings" as arguments to > .encode / .decode , I was reminded of a blog entry, > > http://www.brunningonline.net/simon/blog/archives/000803.html > > which mentions another case of "magic strings" that might perhaps be > (optionally but suggestedly) changed into more-readable attributes (in > this case, clearly attributes of the 'file' type): mode arguments to 'file' > calls. Simon Brunning, the author of that blog entry, argues that > > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) > > Just curious -- what are everybody's feelings about that idea? I'm > about +0 on it, myself -- I doubt I'd remember to use it (too much C > in my past...:-) but I see why others would prefer it. I think I prefer Guido's idea that when a function argument is almost always constant you should really have two functions and /F's (?) idea that there should be a 'textfile' function: textfile(path[, mode='r'[, encoding='ascii']]) -> file object or similar. Cheers, mwh -- Need to Know is usually an interesting UK digest of things that happened last week or might happen next week. [...] This week, nothing happened, and we don't care. 
-- NTK Now, 2000-12-29, http://www.ntk.net/ From skip at pobox.com Sat Nov 8 07:34:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 10 08:42:09 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.9851.671401.447992@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> Message-ID: <16300.58047.526545.28711@montanaro.dyndns.org> Fred> Frankly, that doesn't bother me, especially given that they've Fred> always been in the string module. But I count more than 4 Fred> constants that should be kept: Fred> ascii_letters Fred> ascii_lowercase Fred> ascii_uppercase Fred> digits Fred> hexdigits Fred> octdigits Fred> whitespace Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since I'm sure there are other punctuation characters which would turn up in unicode. Fred> All of these could reasonably live on both str and unicode if Fred> that's not considered pollution. But if they live in a module, Fred> there's no reason not to keep string around for that purpose. If they are going to be attached to a class, why not to basestring? Fred> (I don't object to making them class attributes; I object to creating Fred> a new module for them.) Agreed. If they stay in a module, I'd prefer they just stay in string. That creates the minimum amount of churn in people's code. Anyone who's been converting to string methods has had to leave all the above constants alone anyway. Skip From fdrake at acm.org Mon Nov 10 09:25:06 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Mon Nov 10 09:25:31 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.58047.526545.28711@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> Message-ID: <16303.40898.410595.383833@grendel.zope.com> Skip Montanaro writes: > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > I'm sure there are other punctuation characters which would turn up in > unicode. Ah, yes. > If they are going to be attached to a class, why not to basestring? That makes sense for ascii_* and *digits, perhaps. whitespace and punctuation definitely change for Unicode, so it's less clear that the values belong in a base class. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Mon Nov 10 10:34:53 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 10 10:35:03 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Mon, 10 Nov 2003 10:34:40 GMT." <2msmkwy0jj.fsf@starship.python.net> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> <2msmkwy0jj.fsf@starship.python.net> Message-ID: <200311101534.hAAFYrB15503@12-236-54-216.client.attbi.com> > > I would love it if what happened really was something like: > > > >>>> from socket import * > >>>> print AF_UNIX > > socket.AF_UNIX > >>>> from errno import * > >>>> print EEXIST > > errno.EEXIST > > I've had this idea too. I like it, I think. The signal module could > use it too...
Yes, that would be cool for many enums.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon Nov 10 10:39:07 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 10:39:13 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: Your message of "Mon, 10 Nov 2003 10:38:05 GMT." <2moevky0du.fsf@starship.python.net>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net>
Message-ID: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com>

> I think I prefer Guido's idea that when a function argument is almost
> always constant you should really have two functions and /F's (?)
> idea that there should be a 'textfile' function:
>
>     textfile(path[, mode='r'[, encoding='ascii']]) -> file object
>
> or similar.

I'm not so sure about that in this case. There are quite a few places
where one writes a wrapper for open() that takes a mode and passes it
on to the real open(). Having to distinguish between multiple open()
functions would complexify this.

OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
does a similar thing, by providing different constructors for
different functionality (buffering, text translation, low-level I/O
basis).
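[The 'textfile' constructor quoted above was only ever sketched as a signature in this thread; a minimal illustration of the idea, built on the stdlib codecs module. The function name and defaults are taken from the quoted signature; everything else here is an assumption, not anyone's actual proposal.]

```python
import codecs

def textfile(path, mode="r", encoding="ascii"):
    # Hypothetical helper from the thread: a text-mode open() that takes
    # the encoding as an explicit argument instead of smuggling encoding
    # information through open()'s mode string.
    codecs.lookup(encoding)  # fail early on an unknown encoding name
    return codecs.open(path, mode, encoding=encoding)
```

[A wrapper that merely forwards a mode argument -- Guido's objection below -- would still work against this, since the signature stays open()-compatible.]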
--Guido van Rossum (home page: http://www.python.org/~guido/)

From dan at sidhe.org Mon Nov 10 10:44:56 2003
From: dan at sidhe.org (Dan Sugalski)
Date: Mon Nov 10 10:40:27 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <16303.40898.410595.383833@grendel.zope.com>
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com>
Message-ID: 

On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote:

> > Skip Montanaro writes:
> > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since
> > I'm sure there are other punctuation characters which would turn up in
> > unicode.
>
> Ah, yes.
>
> > If they are going to be attached to a class, why not to basestring?
>
> That makes sense for ascii_* and *digits, perhaps.

Digits change for Unicode as well. Plus they get potentially...
interesting in some cases, where the digit-ness of a character is arguably
contextually driven, but I think that can be ignored. Most of the time, at
least.
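[Dan's point is easy to check at the interpreter -- an added illustration, not from the original mail: Unicode recognizes far more digit characters than the ten in string.digits.]

```python
import string

# U+00B2 SUPERSCRIPT TWO has the Unicode digit property,
# yet it is not one of the ten characters in string.digits.
assert "2" in string.digits
assert u"\u00b2".isdigit()
assert u"\u00b2" not in string.digits
```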
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

From mwh at python.net Mon Nov 10 10:56:01 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 10 10:56:08 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Mon, 10 Nov 2003 07:39:07 -0800")
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net> <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com>
Message-ID: <2md6c0xlny.fsf@starship.python.net>

Guido van Rossum writes:

>> I think I prefer Guido's idea that when a function argument is almost
>> always constant you should really have two functions and /F's (?)
>> idea that there should be a 'textfile' function:
>>
>>     textfile(path[, mode='r'[, encoding='ascii']]) -> file object
>>
>> or similar.
>
> I'm not so sure about that in this case. There are quite a few places
> where one writes a wrapper for open() that takes a mode and passes it
> on to the real open().

I may just be being thick today but I can't think of many. Most of
the time passing in an already open file object would be better
interface, surely? Well, there's things like the codec writers, but
textfile would hopefully subsume them.

> Having to distinguish between multiple open() functions would
> complexify this.
>
> OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
> does a similar thing, by providing different constructors for
> different functionality (buffering, text translation, low-level I/O
> basis).

Does text translation cover unicode issues here?

Cheers,
mwh

-- 
Never meddle in the affairs of NT. It is slow to boot and quick to
crash.
-- Stephen Harris
-- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

From fdrake at acm.org Mon Nov 10 11:01:48 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon Nov 10 11:02:05 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: 
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com>
Message-ID: <16303.46700.213857.424250@grendel.zope.com>

Dan Sugalski writes:
> Digits change for Unicode as well. Plus they get potentially...
> interesting in some cases, where the digit-ness of a character is arguably
> contextually driven, but I think that can be ignored. Most of the time, at
> least.

That depends on how we define "digits" for this purpose. I've always
thought of the *digits strings as true constants; others may disagree.
I understand that the digit-ness of a Unicode character is defined in
more interesting ways than simply the ASCII characters 0-9.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From dan at sidhe.org Mon Nov 10 11:18:10 2003
From: dan at sidhe.org (Dan Sugalski)
Date: Mon Nov 10 11:13:42 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <16303.46700.213857.424250@grendel.zope.com>
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com> <16303.46700.213857.424250@grendel.zope.com>
Message-ID: 

On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote:

> > Dan Sugalski writes:
> > Digits change for Unicode as well. Plus they get potentially...
> > interesting in some cases, where the digit-ness of a character is arguably
> > contextually driven, but I think that can be ignored. Most of the time, at
> > least.
>
> That depends on how we define "digits" for this purpose. I've always
> thought of the *digits strings as true constants; others may disagree.

Fair enough. The languages that use non-latin alphabets all have
characters for digits, though many allow the use of latin digits as well.
I suppose it's a matter of taste as to whether the non-latin digit
characters are treated as true digits or not.

There's also the issue of interpreting numeric constants in general if
you open up the set of digits with Unicode--it could be considered odd to
allow kanji characters that are tagged as digits to not be considered
digits for numeric constants or string->number conversions.
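[The contextual "digit-ness" Dan describes is exactly what the stdlib unicodedata module records per character -- a small added illustration, not part of the original mail.]

```python
import unicodedata

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit (category 'Nd')
# with numeric value 3, though it never appears in string.digits.
arabic_three = u"\u0663"
assert unicodedata.category(arabic_three) == "Nd"
assert unicodedata.decimal(arabic_three) == 3
```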
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

From guido at python.org Mon Nov 10 11:34:28 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 11:34:40 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: Your message of "Mon, 10 Nov 2003 15:56:01 GMT." <2md6c0xlny.fsf@starship.python.net>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net> <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> <2md6c0xlny.fsf@starship.python.net>
Message-ID: <200311101634.hAAGYSW15612@12-236-54-216.client.attbi.com>

> >> textfile(path[, mode='r'[, encoding='ascii']]) -> file object
> >>
> >> or similar.
> >
> > I'm not so sure about that in this case. There are quite a few places
> > where one writes a wrapper for open() that takes a mode and passes it
> > on to the real open().
>
> I may just be being thick today but I can't think of many. Most of
> the time passing in an already open file object would be better
> interface, surely? Well, there's things like the codec writers, but
> textfile would hopefully subsume them.

Here's a pattern that I use frequently in unit tests:

    def makefile(self, data, mode="wb"):
        fn = tempfile.mktemp()
        self.tempfilenames.append(fn)
        f = open(fn, mode)
        f.write(data)
        f.close()
        return fn

> > Having to distinguish between multiple open() functions would
> > complexify this.
> >
> > OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
> > does a similar thing, by providing different constructors for
> > different functionality (buffering, text translation, low-level I/O
> > basis).
>
> Does text translation cover unicode issues here?
Yes, the framework should support Unicode encoding/decoding too (but
the implementation doesn't do much of this -- have a look).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Mon Nov 10 12:16:34 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Nov 10 12:16:47 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com>
Message-ID: <16303.51186.409765.238472@montanaro.dyndns.org>

Tim> # inside string.py or equivalent ...
Tim> import sets
Tim> ascii_letters = sets.Set(ascii_letters)

Tim> Hmm - we'd have the iterability, individual characters and speed,
Tim> but lose iterating in order. I'm sure there's things out there that
Tim> rely on iterating over ascii_letters in order ... ;)

Actually, I suspect that in most cases you wouldn't have speed unless
sets.Set() is rewritten in C. See my previous post with the timeit.py
results.

Skip

From michael at petroni.cc Mon Nov 10 14:38:56 2003
From: michael at petroni.cc (Michael Petroni)
Date: Mon Nov 10 14:39:00 2003
Subject: [Python-Dev] socket listen problem under aix
Message-ID: <3FAFE950.5020705@petroni.cc>

hi!

sorry for posting here as a non-member and non-developer, but i've a
problem that is (maybe) a bug:

i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
2.9-aix51-020209.

subsequent accept calls in the socket library block after a defined
number of calls depending on the accept queue size. the call then never
returns, a connection to the server port gets a timeout and netstat -a
still shows the port as listening.
see the following example code:

----
import socket
queue_size = 6
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 7111))
s.listen(queue_size)
while 1:
    (c, addr) = s.accept()
    c.close()
----

depending on "queue_size" the loop blocks after n calls:

size  calls
1     1
2     3
3     4
4     6
5     7
6     9

i've tried the same code on various other systems with different python
versions -> no problem at all. looks like that some resources for the
tcp connection queue are not freed any more.

have i found a bug or did i miss something?

sorry for the inconvenience once more and thx.

mike

From guido at python.org Mon Nov 10 14:49:36 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 14:50:25 2003
Subject: [Python-Dev] socket listen problem under aix
In-Reply-To: Your message of "Mon, 10 Nov 2003 20:38:56 +0100." <3FAFE950.5020705@petroni.cc>
References: <3FAFE950.5020705@petroni.cc>
Message-ID: <200311101949.hAAJnaO15861@12-236-54-216.client.attbi.com>

> i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
> 2.9-aix51-020209.
>
> subsequent accept calls in the socket library block after a defined
> number of calls depending on the accept queue size. the call then never
> returns, a connection to the server port gets a timeout and netstat -a
> still shows the port as listening.
>
> see the following example code:
>
> ----
> import socket
> queue_size = 6
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.bind(("", 7111))
> s.listen(queue_size)
> while 1:
>     (c, addr) = s.accept()
>     c.close()
> ----
>
> depending on "queue_size" the loop blocks after n calls:
>
> size  calls
> 1     1
> 2     3
> 3     4
> 4     6
> 5     7
> 6     9
>
> i've tried the same code on various other systems with different python
> versions -> no problem at all. looks like that some resources for the
> tcp connection queue are not freed any more.

Almost certainly the problem is either in AIX or in your understanding
of how sockets work, and not in Python's socket module.
The socket module just calls the underlying system calls; it doesn't
introduce this kind of problem by itself (but it doesn't prevent you
from making a bogus sequence of calls either).

If you want help debugging this issue, comp.lang.python would be a
more appropriate place to ask. (My immediate question after seeing
your code above is, what is the client doing?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From vladimir.marangozov at optimay.com Mon Nov 10 16:12:35 2003
From: vladimir.marangozov at optimay.com (Marangozov, Vladimir (Vladimir))
Date: Mon Nov 10 16:12:41 2003
Subject: [Python-Dev] Re: other "magic strings" issues
Message-ID: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>

Hi,

[Guido]
> I do think that keeping the string module around without all the
> functions it historically contained would be a mistake, confusing
> folks. This error is pretty clear:
>
> >>> import string
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: No module named string
> >>>
>
> But this one is much more mystifying:
>
> >>> import string
> >>> print string.join(["a", "b"], ".")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'module' object has no attribute 'join'
> >>>

I am trying to understand what's the bottom line of this thread.
It looks like people are suggesting that the venerable string module
should vanish + provide its functions as object attributes.

Well, I have to say that I actually like the fact that I can be
procedural with strings and not object-oriented. Having all str
functions as object attributes is too much OO for my mind with regard
to this basic type. And too much OOrientation isn't always simple to
grasp (despite that we can have anything as an object attribute now
and regardless some nice pipe-like serialized string constructs
achieved with attributes).
Put it another way, it's good to have all string functions being
attributes to a single well-known object, that object being the
'string' module, instead of spreading it all over... So add the
attributes if you wish so (I respect OO minds), but don't zap
the module (i.e. please respect mine ;-).

Cheers,
Vladimir

From iusty at k1024.org Mon Nov 10 16:25:05 2003
From: iusty at k1024.org (Iustin Pop)
Date: Mon Nov 10 16:24:05 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com>
Message-ID: <20031110212505.GB5361@saytrin.hq.k1024.org>

On Sun, Nov 09, 2003 at 06:11:57PM -0800, Guido van Rossum wrote:
> > The tempfile.mktemp function uses os.path.exists to test whether a file
> > already exists. Since this returns false for broken symbolic links,
> > wouldn't it be better if the function would actually do an os.lstat on
> > the filename?
> >
> > I know the function is not safe by definition, but this issue could
> > (with a low probability) cause the file to actually be created in
> > another directory, as the non-existent target of the symlink, instead of
> > in the given directory (the one in which the symlink resides).
> Sounds like a good suggestion; I'll see if I can check something in.

The fix is trivial (IMHO). A patch is attached.

> > (However, given that there already exists an attack on this function,
> does fixing this actually make any difference?)

Not really, but it is defensive programming (since the module is
security-oriented). Maybe you want a non-existent name for a block
device or a pipe (which mkstemp doesn't provide).

I happened to look into the module to see if I can replace some
hand-written functions with the ones in the module and I saw that
mktemp() could be improved maybe.
Regards,
Iustin Pop

-------------- next part --------------
diff -urN old/tempfile.py new/tempfile.py
--- old/tempfile.py 2003-11-10 23:07:46.000000000 +0200
+++ new/tempfile.py 2003-11-10 23:22:57.000000000 +0200
@@ -338,7 +338,9 @@
     for seq in xrange(TMP_MAX):
         name = names.next()
         file = _os.path.join(dir, prefix + name + suffix)
-        if not _os.path.exists(file):
+        try:
+            _os.lstat(file)
+        except _os.error:
             return file

     raise IOError, (_errno.EEXIST, "No usable temporary filename found")

From guido at python.org Mon Nov 10 16:30:12 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 16:30:19 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: Your message of "Mon, 10 Nov 2003 23:25:05 +0200." <20031110212505.GB5361@saytrin.hq.k1024.org>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> <20031110212505.GB5361@saytrin.hq.k1024.org>
Message-ID: <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>

> > Sounds like a good suggestion; I'll see if I can check something in.
> The fix is trivial (IMHO). A patch is attached.

Now there you are wrong, my friend. :-)

> > (However, given that there already exists an attack on this function,
> > does fixing this actually make any difference?)
> Not really, but it is defensive programming (since the module is
> security-oriented). Maybe you want a non-existent name for a block
> device or a pipe (which mkstemp doesn't provide).

I use it all the time for situations where I have to name a file that
an external program is going to create for me.

> I happened to look into the module to see if I can replace some
> hand-written functions with the ones in the module and I saw that
> mktemp() could be improved maybe.
> Regards,
> Iustin Pop
>
> --zhXaljGHf11kAtnf
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: attachment; filename="tempfile.patch"
>
> diff -urN old/tempfile.py new/tempfile.py
> --- old/tempfile.py 2003-11-10 23:07:46.000000000 +0200
> +++ new/tempfile.py 2003-11-10 23:22:57.000000000 +0200
> @@ -338,7 +338,9 @@
>      for seq in xrange(TMP_MAX):
>          name = names.next()
>          file = _os.path.join(dir, prefix + name + suffix)
> -        if not _os.path.exists(file):
> +        try:
> +            _os.lstat(file)
> +        except _os.error:
>              return file
>
>      raise IOError, (_errno.EEXIST, "No usable temporary filename found")

This fix would break on non-Unix platforms (the module should work
everywhere). Fortunately I already checked something in that *does*
work across platforms. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com Mon Nov 10 16:31:47 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Nov 10 16:30:46 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com >
Message-ID: <5.1.0.14.0.20031110161755.030b1540@mail.telecommunity.com>

At 10:12 PM 11/10/03 +0100, Marangozov, Vladimir (Vladimir) wrote:
>Put it another way, it's good to have all string functions being
>attributes to a single well-known object, that object being the
>'string' module, instead of spreading it all over... So add the
>attributes if you wish so (I respect OO minds), but don't zap
>the module (i.e. please respect mine ;-).

Actually, even in Python 2.2, you can access the same functions as
'str.whatever', e.g.:

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> str.upper("foo")
'FOO'
>>> str.join(" ",["1","2","3"])
'1 2 3'
>>> str.split("x y z")
['x', 'y', 'z']
>>> str.count("a+b+c","+")
2

In fact, the only items missing from 'str' as opposed to 'string' in 2.2
are:

Constants
---------
ascii_letters
ascii_lowercase
ascii_uppercase
digits
hexdigits
letters
lowercase
octdigits
printable
punctuation
uppercase
whitespace

Functions and Exceptions
------------------------
capwords (actually, the same as str.title)
joinfields (alias for join, so str.join really suffices)
index_error
maketrans
atof, atof_error
atoi, atoi_error
atol, atol_error

So, the actual discussion is mostly about what to do with the constants,
as the functions are already pretty much available in 'str'. Note that
since 'str' is a built-in, it doesn't have to be imported, and it's three
less characters to type. So, if you prefer a non-object style for
strings, you could still do it if string went away. For legacy code
support, you could probably even do:

    sys.modules['string'] = str

in some cases. :)

From aleaxit at yahoo.com Mon Nov 10 16:51:10 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 10 16:52:49 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>
Message-ID: <200311102251.10904.aleaxit@yahoo.com>

On Monday 10 November 2003 10:12 pm, Marangozov, Vladimir (Vladimir) wrote:
   ...
> Put it another way, it's good to have all string functions being
> attributes to a single well-known object, that object being the
> 'string' module, instead of spreading it all over... So add the

Not sure anybody wants to "spread it all over", for whatever "it".
str.whatever should be usable where string.whatever is usable now, so,
what would the problem be...?
As for being able to call, when appropriate:
    something.amethod(somestring, whatever)
rather than _having_ to call
    somestring.amethod(whatever)
I _do_ sympathize with this. str.methodname, being an unbound method,
may NOT be usable quite as freely ("quite as polymorphically", in
OO-speak:-) as string.method was recently. E.g. :

>>> import string
>>> string.upper(u'ciao')
u'CIAO'
>>> string.upper('ciao')
'CIAO'
>>> str.upper('ciao')
'CIAO'
>>> str.upper(u'ciao')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'

in other words, string.upper is currently callable on ANY object which
internally defines an .upper() method, whether that object is a string
or not; str.upper instead does typechecking on its first argument -- you
can only call it on a bona fide instance of str or a subclass, not
polymorphically in the usual Python sense of signature-based polymorphism.

So, if I have a sequence with some strings and some unicode objects I
cannot easily get a correspondent sequence with each item uppercased
_except_ with string.upper...:

>>> map(string.upper, ('ciao', u'ciao'))
['CIAO', u'CIAO']
>>> map(str.upper, ('ciao', u'ciao'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
>>> map(unicode.upper, ('ciao', u'ciao'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str'

To be honest I don't currently have any real use case that's quite like
this (i.e., based on a mix of string and unicode objects), but I DO have
cases in completely different domains where I end up coding the like of
(sigh...):

def fooper(obj):
    return obj.foop()
foopresults = map(fooper, lotsofobjects)

or equivalently:

foopresults = map(lambda obj: obj.foop(), lotsofobjects)

or also (probably best for this specific use case):

foopresults = [ obj.foop() for obj in lotsofobjects ]

map may not be the best example, because it's old-ish and most
replaceable with list comprehensions (optionally with zip), itertools,
etc. But I _do_ need an "easily expressed callable" for _many_ perfectly
current and indeed future (2.4) idioms. E.g., "order the items of
lotsobjs in increasing order of their .foop() results" in 2.4 would be

lotsobjs.sort(key=lambda obj: obj.foop())

...and we're back to wishing for a way to pass a nonlambda-callable.

E.g. a string-related example would be "order the strings in list
lotsastrings (which may be all plain strings, or all unicode strings, on
different calls of this overall function) in
case-insensitive-alphabetical order". In 2.4 _with_ the string module
that's a snap:

lotsastrings.sort(key=string.upper)

_without_ string.upper's quiet and strong polymorphism, we'd be back to
lambda, or a tiny def for the equivalent of string.upper, or nailing
down the exact type involved, leading perhaps to nasty code such as

lotsastrings.sort(key=type(lotsastrings[0]).upper)

(not ADVOCATING this by any means -- on the contrary, pointing it out as
a danger of having such callables ONLY available as unbound methods and
thus requiring the exact type...). But it does not seem to me that
keeping module string as it is now is necessarily the ideal solution to
this small quandary.
It works for those methods which strings _used_ to have in 1.5.2 -- try,
e.g., string.title -- and you're hosed again. _Extending_ module string
doesn't seem like a pleasant option either -- and if we did we'd _still_
leave exactly the same problem open for non-string objects on which we'd
like to get a polymorphic callable that's normally a method (key=
parameter in sort, all the 'func' and 'pred' parameters to itertools
functions, ...).

Rather, why not think of a slightly more general solution...? We could
have an object -- say "callmethod", although I'm sure better names can
easily be found by this creative crowd;-) -- with functionality roughly
equivalent to the following Python code...:

class MethodCaller(object):
    def __getattr__(self, name):
        def callmethod(otherself, *args, **kwds):
            return getattr(otherself, name)(*args, **kwds)
        return callmethod
callmethod = MethodCaller()

Now, the ability to obtain callables for each of the above examples
becomes available -- with parametric polymorphism just like Python
normally offers. Performance with this implementation would surely be
bad (but then, string.upper(s) is over twice as slow as s.upper() and I
don't hear complaints on that...:-) but maybe a more clever
implementation might partly compensate... _if_, that is, there IS any
interest at all in the idea, of course!

Alex

From djc at object-craft.com.au Mon Nov 10 17:45:54 2003
From: djc at object-craft.com.au (Dave Cole)
Date: Mon Nov 10 17:45:59 2003
Subject: [Python-Dev] socket listen problem under aix
In-Reply-To: <3FAFE950.5020705@petroni.cc>
References: <3FAFE950.5020705@petroni.cc>
Message-ID: <1068504354.10481.11.camel@echidna.object-craft.com.au>

On Tue, 2003-11-11 at 06:38, Michael Petroni wrote:
> hi!
>
> sorry for posting here as a non-member and non-developer, but i've a
> problem that is (maybe) a bug:
>
> i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
> 2.9-aix51-020209.
>
> subsequent accept calls in the socket library block after a defined
> number of calls depending on the accept queue size. the call then never
> returns, a connection to the server port gets a timeout and netstat -a
> still shows the port as listening.
>
> see the following example code:
>
> ----
> import socket
> queue_size = 6
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.bind(("", 7111))
> s.listen(queue_size)
> while 1:
>     (c, addr) = s.accept()
>     c.close()
> ----

It may not have any effect but try changing the socket call to this:

    socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

I recently wrote a non-blocking select loop server (in C) on AIX 4.3.3
and the program would run for hours then fail in strange ways. When I
changed the socket() protocol argument from zero to IPPROTO_TCP the
problems went away.

It is a long shot, but it is worth a try.

- Dave

-- 
http://www.object-craft.com.au

From iusty at k1024.org Mon Nov 10 17:59:40 2003
From: iusty at k1024.org (Iustin Pop)
Date: Mon Nov 10 17:57:45 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> <20031110212505.GB5361@saytrin.hq.k1024.org> <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>
Message-ID: <20031110225940.GC5361@saytrin.hq.k1024.org>

> Now there you are wrong, my friend. :-)
>
> This fix would break on non-Unix platforms (the module should work
> everywhere). Fortunately I already checked something in that *does*
> work across platforms. :-)

Thanks for reminding me - sometimes I forget that, even if I cherish the
portability of python!
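[Dave's suggestion above amounts to a one-argument change in Michael's original example -- spelled out here for clarity; whether it actually helps on AIX is, as he says, a long shot.]

```python
import socket

# Pass the protocol explicitly instead of letting it default to 0,
# so the socket is unambiguously created as TCP.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
print(s.proto)  # 6, the IANA protocol number for TCP
s.close()
```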
Iustin Pop From dingy at shvns.com Mon Nov 10 18:43:36 2003 From: dingy at shvns.com (Ding Yong) Date: Mon Nov 10 19:42:24 2003 Subject: [Python-Dev] Re: Python-Dev Digest, Vol 4, Issue 33 References: Message-ID: <004001c3a7e4$75d7b9c0$f065a8c0@dingyong> ----- Original Message ----- From: To: Sent: Tuesday, November 11, 2003 12:57 AM Subject: Python-Dev Digest, Vol 4, Issue 33 > Send Python-Dev mailing list submissions to > python-dev@python.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.python.org/mailman/listinfo/python-dev > or, via email, send a message with subject or body 'help' to > python-dev-request@python.org > > You can reach the person managing the list at > python-dev-owner@python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-Dev digest..." > > > Today's Topics: > > 1. python icons? (Matthias Klose) > 2. tempfile.mktemp and os.path.exists (Iustin Pop) > 3. RE: other "magic strings" issues (Delaney, Timothy C (Timothy)) > 4. Re: tempfile.mktemp and os.path.exists (Guido van Rossum) > 5. Re: other "magic strings" issues (Alex Martelli) > 6. Re: other "magic strings" issues (Michael Hudson) > 7. Re: other "magic strings" issues (Michael Hudson) > 8. Re: other "magic strings" issues (Skip Montanaro) > 9. Re: other "magic strings" issues (Fred L. Drake, Jr.) > 10. Re: other "magic strings" issues (Guido van Rossum) > 11. Re: other "magic strings" issues (Guido van Rossum) > 12. Re: other "magic strings" issues (Dan Sugalski) > 13. Re: other "magic strings" issues (Michael Hudson) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 9 Nov 2003 21:01:57 +0100 > From: Matthias Klose > Subject: [Python-Dev] python icons? 
> To: python-dev@python.org > Message-ID: <16302.40245.488709.729747@gargle.gargle.HOWL> > Content-Type: text/plain; charset=us-ascii > > Wanting to add an icon for gnome/KDE menus for a binary python > package. There are no images in the distribution itself, and not many > on the website. Looking for something like > http://www.python.org/cgi-bin/moinmoin/ in standard resolutions like > 64x64, 48x48, 32x32 and 16x16. Maybe something like this could be > added to the Misc directory in the tarball. > > Matthias > > > > > ------------------------------ > > Message: 2 > Date: Mon, 10 Nov 2003 00:44:45 +0200 > From: Iustin Pop > Subject: [Python-Dev] tempfile.mktemp and os.path.exists > To: python-dev@python.org > Message-ID: <20031109224445.GA26291@saytrin.hq.k1024.org> > Content-Type: text/plain; charset=us-ascii > > Hello, > > The tempfile.mktemp function uses os.path.exists to test whether a file > already exists. Since this returns false for broken symbolic links, > wouldn't it be better if the function would actually do an os.lstat on > the filename? > > I know the function is not safe by definition, but this issue could > (with a low probability) cause the file to actually be created in > another directory, as the non-existent target of the symlink, instead of > in the given directory (the one in which the symlink resides). > > Regards, > Iustin Pop > > > > ------------------------------ > > Message: 3 > Date: Mon, 10 Nov 2003 09:54:41 +1100 > From: "Delaney, Timothy C (Timothy)" > Subject: RE: [Python-Dev] other "magic strings" issues > To: > Message-ID: > <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> > Content-Type: text/plain; charset="iso-8859-1" > > > From: python-dev-bounces+tdelaney=avaya.com@python.org > > > > I guess the tests should be faster, yes, but I would still > > want _iterables_ for ascii_* and digits. 
> > > > One issue with allowing "if char in string.letters:" is that > > these days this will not raise if the alleged 'char' is more > > than one character -- it will give True for (e.g.) 'ab', False > > for (e.g.) 'foobar', since it tests _substrings_. > > # inside string.py or equivalent ... > > import sets > > ascii_letters = sets.Set(ascii_letters) > > Hmm - we'd have the iterability, individual characters and speed, but lose iterating in order. I'm sure there's things out there that rely on iterating over ascii_letters in order ... ;) > > Tim Delaney > > > > ------------------------------ > > Message: 4 > Date: Sun, 09 Nov 2003 18:11:57 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] tempfile.mktemp and os.path.exists > To: Iustin Pop > Cc: python-dev@python.org > Message-ID: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> > > > Hello, > > > > The tempfile.mktemp function uses os.path.exists to test whether a file > > already exists. Since this returns false for broken symbolic links, > > wouldn't it be better if the function would actually do an os.lstat on > > the filename? > > > > I know the function is not safe by definition, but this issue could > > (with a low probability) cause the file to actually be created in > > another directory, as the non-existent target of the symlink, instead of > > in the given directory (the one in which the symlink resides). > > > > Regards, > > Iustin Pop > > Sounds like a good suggestion; I'll see if I can check something in. > > (However, given that there already exists an attack on this function, > does fixing this actually make any difference?) 
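Iustin's lstat-based check is easy to sketch. `name_is_free` below is a hypothetical helper written in present-day syntax, not the change that actually went in; the point is only that os.lstat sees a dangling symlink where os.path.exists does not:

```python
import errno
import os

def name_is_free(path):
    """True if nothing occupies `path`, counting broken symlinks.

    os.path.exists() follows symlinks and reports False for a dangling
    one, so a mktemp-style loop relying on it can hand out a name that
    is really a symlink pointing into another directory.  os.lstat()
    examines the link entry itself, closing that hole.
    """
    try:
        os.lstat(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True
        raise
    return False
```

With a dangling link in place, os.path.exists(link) returns False while name_is_free(link) also returns False, because lstat still finds the link entry — exactly the disagreement Iustin describes.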
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 5 > Date: Mon, 10 Nov 2003 09:18:15 +0100 > From: Alex Martelli > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Delaney, Timothy C (Timothy)" , > > Message-ID: <200311100918.15810.aleaxit@yahoo.com> > Content-Type: text/plain; charset="iso-8859-1" > > On Sunday 09 November 2003 11:54 pm, Delaney, Timothy C (Timothy) wrote: > ... > > ascii_letters = sets.Set(ascii_letters) > > > > Hmm - we'd have the iterability, individual characters and speed, but lose > > iterating in order. I'm sure there's things out there that rely on > > iterating over ascii_letters in order ... ;) > > Yes, that's my main use case -- presenting results to the user, so they need > to be in alphabetic order (ascii_lowercase actually, but it's much the same). > > Anyway, Guido has already pronounced on such enhancements as "Too > Clever", so we have to keep ascii_lowercase &c as plain strings without any > enhancements and keep the "false positives" &c on 'in' checks. > > > Alex > > > > > ------------------------------ > > Message: 6 > Date: Mon, 10 Nov 2003 10:34:40 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2msmkwy0jj.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Barry Warsaw writes: > > > I would love it if what happened really was something like: > > > >>>> from socket import * > >>>> print AF_UNIX > > socket.AF_UNIX > >>>> from errno import * > >>>> print EEXIST > > errno.EEXIST > > I've had this idea too. I like it, I think. The signal module could > use it too... > > Cheers, > mwh > > -- > I have a feeling that any simple problem can be made arbitrarily > difficult by imposing a suitably heavy administrative process > around the development. 
-- Joe Armstrong, comp.lang.functional > > > > ------------------------------ > > Message: 7 > Date: Mon, 10 Nov 2003 10:38:05 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2moevky0du.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Alex Martelli writes: > > > From Barry's discussion of the problem of "magic strings" as arguments to > > .encode / .decode , I was reminded of a blog entry, > > > > http://www.brunningonline.net/simon/blog/archives/000803.html > > > > which mentions another case of "magic strings" that might perhaps be > > (optionally but suggestedly) changed into more-readable attributes (in > > this case, clearly attributes of the 'file' type): mode arguments to 'file' > > calls. Simon Brunning, the author of that blog entry, argues that > > > > myFile = file(filename, 'rb') > > > > (while of course we're going to keep accepting it forever) is not quite as > > readable and maintainable as, e.g.: > > > > myFile = file(filename, file.READ + file.BINARY) > > > > Just curious -- what are everybody's feelings about that idea? I'm > > about +0 on it, myself -- I doubt I'd remember to use it (too much C > > in my past...:-) but I see why others would prefer it. > > I think I prefer Guido's idea that when a function argument is almost > always constant you should really have two functions and /F's (?) > idea that there should be a 'textfile' function: > > textfile(path[, mode='r'[, encoding='ascii']]) -> file object > > or similar. > > Cheers, > mwh > > -- > Need to Know is usually an interesting UK digest of things that > happened last week or might happen next week. [...] This week, > nothing happened, and we don't care. > -- NTK Now, 2000-12-29, http://www.ntk.net/ > > > > ------------------------------ > > Message: 8 > Date: Sat, 8 Nov 2003 06:34:07 -0600 > From: Skip Montanaro > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Fred L. 
Drake, Jr." > Cc: Guido van Rossum , python-dev@python.org > Message-ID: <16300.58047.526545.28711@montanaro.dyndns.org> > Content-Type: text/plain; charset=us-ascii > > > Fred> Frankly, that doesn't bother me, especially given that they've > Fred> always been in the string module. But I count more than 4 > Fred> constants that should be kept: > > Fred> ascii_letters > Fred> ascii_lowercase > Fred> ascii_uppercase > Fred> digits > Fred> hexdigits > Fred> octdigits > Fred> whitespace > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > I'm sure there are other punctuation characters which would turn up in > unicode. > > Fred> All of these could reasonably live on both str and unicode if > Fred> that's not considered pollution. But if they live in a module, > Fred> there's no reason not to keep string around for that purpose. > > If they are going to be attached to a class, why not to basestring? > > Fred> (I don't object to making them class attributes; I object to creating > Fred> a new module for them.) > > Agreed. If they stay in a module, I'd prefer they just stay in string. > That creates the minimum amount of churn in people's code. Anyone who's > been converting to string methods has had to leave all the above constants > alone anyway. > > Skip > > > > ------------------------------ > > Message: 9 > Date: Mon, 10 Nov 2003 09:25:06 -0500 > From: "Fred L. Drake, Jr." > Subject: Re: [Python-Dev] other "magic strings" issues > To: skip@pobox.com > Cc: python-dev@python.org > Message-ID: <16303.40898.410595.383833@grendel.zope.com> > Content-Type: text/plain; charset=us-ascii > > > Skip Montanaro writes: > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > > I'm sure there are other punctuation characters which would turn up in > > unicode. > > Ah, yes. > > > If they are going to be attached to a class, why not to basestring? > > That makes sense for ascii_* and *digits, perhaps. 
whitespace and > punctuation definitely change for Unicode, so it's less clear that the > values belong in a base class. > > > -Fred > > -- > Fred L. Drake, Jr. > PythonLabs at Zope Corporation > > > > ------------------------------ > > Message: 10 > Date: Mon, 10 Nov 2003 07:34:53 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] other "magic strings" issues > To: Michael Hudson > Cc: python-dev@python.org > Message-ID: <200311101534.hAAFYrB15503@12-236-54-216.client.attbi.com> > > > > I would love it if what happened really was something like: > > > > > >>>> from socket import * > > >>>> print AF_UNIX > > > socket.AF_UNIX > > >>>> from errno import * > > >>>> print EEXIST > > > errno.EEXIST > > > > I've had this idea too. I like it, I think. The signal module could > > use it too... > > Yes, that would be cool for many enums. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 11 > Date: Mon, 10 Nov 2003 07:39:07 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] other "magic strings" issues > To: Michael Hudson > Cc: python-dev@python.org > Message-ID: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> > > > I think I prefer Guido's idea that when a function argument is almost > > always constant you should really have two functions and /F's (?) > > idea that there should be a 'textfile' function: > > > > textfile(path[, mode='r'[, encoding='ascii']]) -> file object > > > > or similar. > > I'm not so sure about that in this case. There are quite a few places > where one writes a wrapper for open() that takes a mode and passes it > on to the real open(). Having to distinguish between multiple open() > functions would complexify this. > > OTOH my experimental standard I/O replacement (nondist/sandbox/sio) > does a similar thing, by providing different constructors for > different functionality (buffering, text translation, low-level I/O > basis).
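Both shapes Guido contrasts can be put side by side. `reporting_open` is an invented example of the pass-through wrapper pattern, and `textfile` is a sketch of /F's hypothetical constructor, written against the modern built-in open() (which did eventually grow an encoding parameter along these lines):

```python
def reporting_open(path, mode='r'):
    """A typical wrapper: the mode string is opaque to the wrapper and
    passes straight through to the real open() -- the convenience that
    splitting open() into several constructors would complicate."""
    print('opening %r with mode %r' % (path, mode))
    return open(path, mode)

def textfile(path, mode='r', encoding='ascii'):
    """Sketch of the proposed text-only constructor: there is no 'b'
    flag to forget, and the encoding is explicit rather than implied."""
    if 'b' in mode:
        raise ValueError('textfile() opens text; use open() for binary')
    return open(path, mode, encoding=encoding)
```

The wrapper's convenience is exactly the cost of the split: every reporting_open-style wrapper in existing code would need to learn which of the new constructors to forward to.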
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 12 > Date: Mon, 10 Nov 2003 10:44:56 -0500 (EST) > From: Dan Sugalski > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Fred L. Drake, Jr." > Cc: skip@pobox.com, python-dev@python.org > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII > > On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote: > > > > > Skip Montanaro writes: > > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > > > I'm sure there are other punctuation characters which would turn up in > > > unicode. > > > > Ah, yes. > > > > > If they are going to be attached to a class, why not to basestring? > > > > That makes sense for ascii_* and *digits, perhaps. > > Digits change for Unicode as well. Plus they get potentially... > interesting in some cases, where the digit-ness of a character is arguably > contextually driven, but I think that can be ignored. Most of the time, at > least. > > Dan > > --------------------------------------"it's like this"------------------- > Dan Sugalski even samurai > dan@sidhe.org have teddy bears and even > teddy bears get drunk > > > > > ------------------------------ > > Message: 13 > Date: Mon, 10 Nov 2003 15:56:01 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2md6c0xlny.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Guido van Rossum writes: > > >> I think I prefer Guido's idea that when a function argument is almost > >> always constant you should really have two functions and /F's (?) > >> idea that there should be a 'textfile' function: > >> > >> textfile(path[, mode='r'[, encoding='ascii']]) -> file object > >> > >> or similar. > > > > I'm not so sure about that in this case. 
There are quite a few places > > where one writes a wrapper for open() that takes a mode and passes it > > on to the real open(). > > I may just be being thick today but I can't think of many. Most of > the time passing in an already open file object would be a better > interface, surely? Well, there's things like the codec writers, but > textfile would hopefully subsume them. > > > Having to distinguish between multiple open() functions would > > complexify this. > > > > OTOH my experimental standard I/O replacement (nondist/sandbox/sio) > > does a similar thing, by providing different constructors for > > different functionality (buffering, text translation, low-level I/O > > basis). > > Does text translation cover unicode issues here? > > Cheers, > mwh > > -- > Never meddle in the affairs of NT. It is slow to boot and quick to > crash. -- Stephen Harris > -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html > > > > ------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > > > End of Python-Dev Digest, Vol 4, Issue 33 > ***************************************** From eppstein at ics.uci.edu Mon Nov 10 20:48:06 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Nov 10 20:48:10 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> Message-ID: In article <200311102251.10904.aleaxit@yahoo.com>, Alex Martelli wrote: > >>> map(string.upper, ('ciao', u'ciao')) > ['CIAO', u'CIAO'] > > >>> map(str.upper, ('ciao', u'ciao')) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode' > > >>> map(unicode.upper, ('ciao', u'ciao')) > Traceback (most recent call last): > File "", line 1, in ?
> TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str' > > > To be honest I don't currently have any real use case that's quite like this > (i.e., based on a mix of string and unicode objects), but I DO have cases > in completely different domains where I end up coding the like of (sigh...): Actually I had exactly this case recently: I had an object that needed to store a pointer to a function for normalizing item names prior to looking them up in a dictionary, and most of the time (but not always) that function was lower(). But I wanted to handle both str and unicode, so I wrote a one-line function: def lower(x): return x.lower() > ...and we're back to wishing for a way to pass a nonlambda-callable. E.g. > a string-related example would be "order the strings in list lotsastrings > (which may be all plain strings, or all unicode strings, on different calls > of this overall function) in case-insensitive-alphabetical order". In 2.4 > _with_ the string module that's a snap: > > lotsastrings.sort(key=string.upper) Is that really alphabetical? It seems like it orders them based on the ordinal value of the characters, which doesn't work so well for unicodes. The last time I needed this I couldn't figure out how to get a reasonable case-insensitive-alphabetical order in pure python, so I used PyObjC's NSString.localizedCaseInsensitiveCompare_ instead; a pure Python solution that works as well as that one would be welcome. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From tim.one at comcast.net Mon Nov 10 21:48:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 10 21:48:55 2003 Subject: [Python-Dev] More fun with Python shutdown Message-ID: Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak references. Now Zope3 dies with a segfault when it's shut down, which makes its adoption of Python 2.3.2 a bit less attractive . 
The problem isn't really understood. I hope that once it is, there will be a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to the symptom here: http://www.python.org/sf/839548 It's another case where things go crazy during the second call of PyGC_Collect in Py_Finalize. Alas, we haven't found a simpler failing test case than "Zope3" yet. For bafflement value, I'll give a cmdline-parameterized snippet here that displays at least 4 distinct behaviors at shutdown, although a segfault isn't one of them: """ import weakref import os class C(object): def hi(self, w=os.write): w(1, 'hi 1\n') print 'hi 2' def pp(c=C()): c.hi() import sys exec "import %s as somemodule" % sys.argv[1] in globals() del sys somemodule.c1 = C() somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp()) del C, pp """ Here are the ways it behaves (on Windows, anyway): C:\Code\python\PCbuild>python temp4.py tempfile hi 1 hi 2 C:\Code\python\PCbuild>python temp4.py math # curiously, __main__ the same C:\Code\python\PCbuild>python temp4.py __builtin__ hi 1 C:\Code\python\PCbuild>python temp4.py sys hi 1 Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in at 0x006B6C70> ignored C:\Code\python\PCbuild> The only one I can't make any sense of is __builtin__: the weakref callback is certainly invoked then, but its print statement neither produces output nor raises an exception. Note that the exception in the "sys" example has nothing to do with the "os.write" default-arg value. That's really the print statement, complaining because sys.stdout is None by the time shutdown gets there. 
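The `w=os.write` default argument in the snippet above is the standard defence for code that may run this late: anything the callback needs must be captured at definition time, because by the second PyGC_Collect the modules' globals (sys.stdout included) may already have been rebound to None. A minimal illustration of the pattern, with invented names:

```python
import os
import weakref

class Tracked(object):
    pass

def _announce(label, _write=os.write):
    # os.write and fd 1 are captured here at definition time, so this
    # body performs no module-global lookups that teardown could break.
    _write(1, ('finalized: %s\n' % label).encode('ascii'))

def track(obj, label):
    # Capture the label now; the callback only receives the dead ref.
    return weakref.ref(obj, lambda ref, label=label: _announce(label))

t = Tracked()
wr = track(t, 'demo object')
del t  # fires the callback immediately here; the same code path would
       # also survive firing during Py_Finalize's final collections
```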
From tismer at tismer.com Mon Nov 10 21:52:16 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 10 21:52:18 2003 Subject: [Python-Dev] Making python C-API thread safe (try 2) In-Reply-To: References: <5.1.1.6.0.20030911142317.02b88640@telecommunity.com> <5.1.1.6.0.20030911130607.02426ec0@telecommunity.com> <5.1.1.6.0.20030911130607.02426ec0@telecommunity.com> <5.1.1.6.0.20030911142317.02b88640@telecommunity.com> <5.1.1.6.0.20030911162016.02027750@telecommunity.com> Message-ID: <3FB04EE0.50901@tismer.com> Not having read c.l.py for too long, some comments, anyway... A.M. Kuchling wrote: > On Fri, 12 Sep 2003 07:56:55 +0300, > Harri Pesonen wrote: > >>I don't know, I got mail about writing a PEP. It is clear that it would >>not be accepted, because it would break the existing API. The change is >>so big that I think that it has to be called a different language. > > > It would just be a different implementation of the same language. Jython > has different garbage collection characteristics from CPython, but they > still implement the same language; Stackless Python is still Python. This is only half the truth. Of course, you can run all of your code in Stackless without change. But as soon as you have become familiar with it, your programming style changes so drastically that you never will want to go back. I realized this late, after my first "eat your own dogfood" project. Stackless dramatically simplifies your coding style. This seems to be an irreversible process. I will provide examples. >>because this is too important to be ignored. Python *needs* to be >>free-threading... While written so heartily, and I can understand this very much, it appears to be very, very wrong, since it does not address general needs. I admit: There are situations where you need this, and you would easily pay the extra of an at least 20-30 % overhead for being free-threaded. But this isn't common-case.
Python's model of object sharing enforces such a costly scheme at the moment. In most cases, I strongly believe that this is not necessary, basically. The fact that access to almost any Python object is possible at almost any time is not a feature, but an artifact. Having to protect any mutable object at any time is a consequence of this. This protection currently either has to be the GIL, or builtin protection for the objects. I guess that in most cases, you would want to have almost completely disjoint object spaces without any implicit sharing of mutables. You would provide extra communication primitives in order to share certain objects, instead. This way, most of the free threading issues would vanish, in favor of a limited set of controlled, shared objects, while most of the rest would just run unrestricted. Playing with such derivatives will be one of the strengths of the PyPy project, which has the ability to try alternatives as one of its major goals. In CPython, you currently don't have many more alternatives than to run disjoint processes, which are communicating by exchanging pickled objects. (Which is, IMHO, not the worst solution at all!) > On the other hand, considering that the last free threading packages were > for 1.4, and no one has bothered to update them, the community doesn't seem > to find the subject as important as you do. :) I think it is worthwhile to be considered as a special alternative. Making it *the* requirement is surely not the right goal for a tool that has to fulfil everybody's needs. My suggestion is to add this as a feature request to PyPy, together with some effort supporting it. If PyPy is going to be as flexible as we claimed many times, then it should be possible to derive a version with the desired properties. But this is meant to be a challenge for Harri, for instance. all the best -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From greg at cosc.canterbury.ac.nz Mon Nov 10 23:01:51 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 10 23:02:33 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <20031106035837.GB7212@epoch.metaslash.com> Message-ID: <200311110401.hAB41pd17180@oma.cosc.canterbury.ac.nz> Neal Norwitz : > For the most part, I meant to remove them (including intern) > altogether in the long run. In 2.4, I only meant to officially > deprecate them with a warning. intern() doesn't seem particularly > useful or commonly used. If the implementation of string comparison is somehow changed so that explicit interning is no longer necessary for efficient lookup of dynamically-constructed names, then intern() can go. But until then, the functionality needs to be available somehow -- you might not need it often, but when you do, there's no substitute for it. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From vladimir.marangozov at optimay.com Tue Nov 11 05:13:50 2003 From: vladimir.marangozov at optimay.com (Marangozov, Vladimir (Vladimir)) Date: Tue Nov 11 05:14:15 2003 Subject: [Python-Dev] Re: other "magic strings" issues Message-ID: <6CC39F01DF9C56438FC6B7473A989B63055C18@geex2ku01.agere.com> Hi, > [me] > > Put it another way, it's good to have all string functions being > > attributes to a single well-known object, that object being the > > 'string' module, instead of spreading it all over... 
So add the [Alex] > Not sure anybody wants to "spread it all over", for whatever "it". > str.whatever should be usable where string.whatever is usable > now, so, what would the problem be...? "Should" is a bit of an overstatement, provided that Python lived happily without all string functions as attributes for 10+ years. Now you've grown OO and appreciate having all functions as attributes and that's fine. No one has objected to enlarging the set of attributes. The objection is towards deprecating the 'string' module, thus closing the door for a procedural approach to strings. And if I say that 95% of all programmers don't care about string polymorphism or Unicode, that is probably true as well, so no point in arguing that o.upper() is better than string.upper(o). o.upper() is really StringType.upper(o) under the hood, which is the same as import string / string.upper(o). Both StringType and 'string' act as function packages (containers). Yes, I see you coming with arguments that they aren't really the same because of subtleties like Unicode, etc. but that's irrelevant for those 95% of the people who aren't heavily invested in strings and simply don't care. The catch is that if we favor the OO approach and deprecate 'string', we deprecate one explicit way of spelling things, which is import string / string.upper(o). This has been adopted and is widely used. Python has always tried to balance purity with practicality and OO in Python is still perceived as optional, especially for the newcomer who needs to write a couple of quick scripts to get the job done. I am not sure we have to favor the OO reasoning for everything. There are also backward compatibility issues arising from deprecating 'string' but I believe this is manageable. 'string' can be aliased to StringType so that it is backwards compatible. Removing the 'string' module as a name completely would be a bit of a challenge though...
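The "subtlety" being waved away is the one Alex's map() example made concrete: string.upper was a plain function that delegated to o.upper(), so it worked on any string-like object, while str.upper is a descriptor welded to one concrete type. string.upper is gone from modern Python, so a one-line stand-in is used below, with bytes playing the role of the second string type:

```python
def upper(o):
    # what string.upper() effectively did: delegate to the object's
    # own method, whatever its concrete type
    return o.upper()

# The plain function is polymorphic across string-like types...
assert [upper(s) for s in ('ciao', b'ciao')] == ['CIAO', b'CIAO']

# ...while the unbound method accepts only its own type:
assert str.upper('ciao') == 'CIAO'
try:
    str.upper(b'ciao')
except TypeError:
    pass  # descriptor 'upper' requires a 'str' object
else:
    raise AssertionError('str.upper accepted a non-str')
```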
Cheers, Vladimir From jim at zope.com Tue Nov 11 05:32:01 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 05:33:29 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: References: Message-ID: <3FB0BAA1.5040607@zope.com> Tim Peters wrote: > Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak > references. Now Zope3 dies with a segfault when it's shut down, which makes > its adoption of Python 2.3.2 a bit less attractive . My main concern at this point is getting a 2.3.3 that doesn't have this behavior. In the worst case, I think I could create a version of weak dicts that avoided the symptom, by avoiding attribute accesses in weakref callbacks. > The problem isn't really understood. I hope that once it is, there will be > a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to > the symptom here: > > http://www.python.org/sf/839548 The theory is that it occurs when a cycle involving a class is broken by calling the tp_clear slot on a heap type. I verified this by setting a gdb break point in Zope 3 and verifying that type_clear was called while a type still had a ref count much higher than 1. From a purely theoretical point of view, the current behavior is wrong. There is clearly an invariant that tp_mro is not None and type_clear violates this. The fix (setting the mro to () in type_clear) is pretty straightforward. My assumption is that it's possible for this to occur at times other than shutdown, although, perhaps, wildly unlikely. What's especially poorly understood is how to make it happen in a smaller test program. > It's another case where things go crazy during the second call of > PyGC_Collect in Py_Finalize. Alas, we haven't found a simpler failing test > case than "Zope3" yet.
> > For bafflement value, I'll give a cmdline-parameterized snippet here that > displays at least 4 distinct behaviors at shutdown, although a segfault > isn't one of them: BTW, with a debug build, I get an assertion error rather than a segfault. > """ > import weakref > import os > > class C(object): > def hi(self, w=os.write): > w(1, 'hi 1\n') > print 'hi 2' > > def pp(c=C()): > c.hi() > > import sys > exec "import %s as somemodule" % sys.argv[1] in globals() > del sys > > somemodule.c1 = C() > somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp()) > > del C, pp > """ > > Here are the ways it behaves (on Windows, anyway): > > C:\Code\python\PCbuild>python temp4.py tempfile > hi 1 > hi 2 > > C:\Code\python\PCbuild>python temp4.py math # curiously, __main__ the same > > C:\Code\python\PCbuild>python temp4.py __builtin__ > hi 1 > > C:\Code\python\PCbuild>python temp4.py sys > hi 1 > Exception exceptions.AttributeError: "'NoneType' object has no attribute > 'write'" in at 0x006B6C70> ignored > > C:\Code\python\PCbuild> > > The only one I can't make any sense of is __builtin__: the weakref callback > is certainly invoked then, but its print statement neither produces output > nor raises an exception. When trying to debug this in Zope 3, I similarly noticed that prints in the weakref callback produced no output. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From mwh at python.net Tue Nov 11 06:40:02 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 11 06:40:06 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: (David Eppstein's message of "Mon, 10 Nov 2003 17:48:06 -0800") References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> Message-ID: <2mhe1buoa5.fsf@starship.python.net> David Eppstein writes: >> ...and we're back to wishing for a way to pass a nonlambda-callable. E.g. >> a string-related example would be "order the strings in list lotsastrings >> (which may be all plain strings, or all unicode strings, on different calls >> of this overall function) in case-insensitive-alphabetical order". In 2.4 >> _with_ the string module that's a snap: >> >> lotsastrings.sort(key=string.upper) > > Is that really alphabetical? It seems like it orders them based on the > ordinal value of the characters, which doesn't work so well for unicodes. > The last time I needed this I couldn't figure out how to get a > reasonable case-insensitive-alphabetical order in pure python, so I used > PyObjC's NSString.localizedCaseInsensitiveCompare_ instead; a pure > Python solution that works as well as that one would be welcome. The locale module has some things in this direction -- strxfrm and strcoll, maybe? -- but I don't know what they do with unicode & doubt they even exist on OS X. Cheers, mwh -- Do I do everything in C++ and teach a course in advanced swearing? 
-- David Beazley at IPC8, on choosing a language for teaching From ark at acm.org Tue Nov 11 10:48:15 2003 From: ark at acm.org (Andrew Koenig) Date: Tue Nov 11 10:48:22 2003 Subject: [Python-Dev] question about PEP 323 (copyable iterators) Message-ID: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> Early in PEP 323, there is a claim that an iterator is considered copyable if it has a __copy__ method. The following example in the PEP illustrates that claim: def tee(it): it = iter(it) try: copier = it.__copy__ except AttributeError: # non-copyable iterator, do all the needed hard work # [snipped!] else: return it, copier() Later in the PEP, there is an example that suggests that an iterator should be considered copyable only if its __copy__ method can be called: class enumerate(object): def __init__(self, it): self.it = iter(it) self.i = -1 # next and __iter__ methods snipped from the original def __copy__(self): result = self.__class__.new() result.it = self.it.__copy__() result.i = self.i return result Here, class enumerate always has a __copy__ method, even if the iterator that is being enumerated doesn't. In other words, if you use class enumerate on an iterator that isn't copyable, you get an iterator with a __copy__ method that isn't copyable. Is that behavior really right? I would think that you would have to do something like this: class enumerate(object): def __init__(self, it): self.it = iter(it) self.i = -1 try it.__copy__ except AttributeError: pass else: self.__copy__ = self.conditional_copy def conditional_copy(self): result = self.__class__.new() result.it = self.it.__copy__() result.i = self.i return result Am I missing something? From tim at zope.com Tue Nov 11 12:07:20 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 12:08:26 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <3FB0BAA1.5040607@zope.com> Message-ID: [Jim Fulton, on ] > ... 
The theory is that it occurs when a cycle involving a class is broken > by calling the tp_clear slot on a heap type. I verified this by > setting a gdb break point in Zope 3 and verifying that type_clear was > called while a type still had a ref count much higher than 1. > > From a purely theoretical point of view, the current behavior is > wrong. It is, but a segfault is more than just pure theory . > There is clearly an invariant that tp_mro is not None and > type_clear violates this. The fix (setting the mro to () in > type_clear) is pretty straightforward. The invariant is that tp_mro is not NULL so long as anyone may reference it. tp_clear believes that tp_mro will never be referenced again, but it's demonstrably wrong in that belief. The real bug lies there: why is its belief wrong? You patched it so that tp_mro doesn't become NULL, thus avoiding the immediate segfault, but until we understand *why* the invariant got violated, it's unclear that the patch is "a fix". Code is still accessing the MRO after tp_clear is called, but now instead of a segfault it's going to see an empty MRO. That's also (and clearly so, at least to me) incorrect: code that tries to access a class's MRO should see the MRO the programmer intended, and no sane class has an empty tuple for its MRO. So I think the "tp_mro <- ()" patch exchanges gross breakage for subtler breakage. > My assumption is that it's possible for this to occur at times other > than shutdown, although, perhaps, wildly unlikely. In the absence of real understanding, who knows. If it is possible before shutdown, then the importance of not exposing user code to a made-up MRO skyrockets, IMO. > What's especially poorly understood is how to make it happen in a > smaller test program. > ... > BTW, with a debug build, I get an assertion error rather than a > segfault. Which assertion fails then? That may be a good clue toward truly understanding what's causing this.
>> """
>> import weakref
>> import os
>>
>> class C(object):
>>     def hi(self, w=os.write):
>>         w(1, 'hi 1\n')
>>         print 'hi 2'
>>
>> def pp(c=C()):
>>     c.hi()
>>
>> import sys
>> exec "import %s as somemodule" % sys.argv[1] in globals()
>> del sys
>>
>> somemodule.c1 = C()
>> somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp())
>>
>> del C, pp
>> """
...
>> C:\Code\python\PCbuild>python temp4.py __builtin__
>> hi 1
...
>> The only one I can't make any sense of is __builtin__: the weakref >> callback is certainly invoked then, but its print statement neither >> produces output nor raises an exception. > When trying to debug this in Zope 3, I similarly noticed that prints > in the weakref callback produced no output. I'm not sure this one's worth pursuing. Your problem occurred during the second call to gc in finalization, and the sys module has been gutted by that point. In particular, sys.stdout has been cleared, so a print statement can't work then. The only mystery to me wrt this is why it didn't raise an exception, like the >> Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in <function <lambda> at 0x006B6C70> ignored raised when calling that little program with "sys" instead of "__builtin__". From guido at python.org Tue Nov 11 12:13:19 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 12:13:41 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 11:40:02 GMT." <2mhe1buoa5.fsf@starship.python.net> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> Message-ID: <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> > The locale module has some things in this direction -- strxfrm and > strcoll, maybe? -- but I don't know what they do with unicode & doubt > they even exist on OS X. IMO, locale and Unicode shouldn't be mentioned in the same sentence.
At least the part of the locale that defines properties of characters is subsumed in Unicode in a way that doesn't require you to specify the locale. (Of course the locale is still important in defining things like conventions for formatting numbers and dates.) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Tue Nov 11 12:21:53 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 11 12:22:00 2003 Subject: [Python-Dev] question about PEP 323 (copyable iterators) In-Reply-To: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> References: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> Message-ID: <200311111821.53479.aleaxit@yahoo.com> On Tuesday 11 November 2003 04:48 pm, Andrew Koenig wrote:
> Early in PEP 323, there is a claim that an iterator is considered copyable
> if it has a __copy__ method. The following example in the PEP illustrates
> that claim:
>
> def tee(it):
>     it = iter(it)
>     try: copier = it.__copy__
>     except AttributeError:
>         # non-copyable iterator, do all the needed hard work
>         # [snipped!]
>     else:
>         return it, copier()
>
> Later in the PEP, there is an example that suggests that an iterator should
> be considered copyable only if its __copy__ method can be called:

Very good point -- thanks!

> Here, class enumerate always has a __copy__ method, even if the iterator
> that is being enumerated doesn't. In other words, if you use class
> enumerate on an iterator that isn't copyable, you get an iterator with a
> __copy__ method that isn't copyable.

Right.

> Is that behavior really right? I would think that you would have to do
> something like this:

Special methods are normally defined on the type, not on the instance. So, a per-instance conditional definition of __copy__ does not appear to be right. Rather, I think I should rework the above example as:

def tee(it):
    it = iter(it)
    try: return it, it.__copy__()
    except (AttributeError, TypeError):
        # non-copyable iterator, do all the needed hard work
        # [snipped!]
i.e., an iterator is copyable if it has a __copy__ method that can be called without arguments and won't raise AttributeError or TypeError (other exceptions are not expected and would therefore propagate). This will allow "wrappers" such as enumerate to do their job most simply. (We could allow only TypeError and not AttributeError, but that would complicate both suppliers of __copy__ such as enumerate and consumers of it such as tee). Alex From bh at intevation.de Tue Nov 11 12:22:00 2003 From: bh at intevation.de (Bernhard Herzog) Date: Tue Nov 11 12:22:12 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Tue, 11 Nov 2003 12:07:20 -0500") References: Message-ID: <6qad72ddmv.fsf@salmakis.intevation.de> "Tim Peters" writes: >> When trying to debug this in Zope 3, I similarly noticed that prints >> in the weakref callback produced no output. > > I'm not sure this one's worth pursuing. Your problem occurred during the > second call to gc in finalization, and the sys module has been gutted by > that point. In particular, sys.stdout has been cleared, so a print > statement can't work then. The only mystery to me wrt this is why it didn't > raise an exception, like the > >>> Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in <function <lambda> at 0x006B6C70> ignored > > raised when calling that little program with "sys" instead of "__builtin__". Perhaps because sys.stderr has also been cleared?

Python 2.3.2 (#2, Oct 6 2003, 19:39:48)
[GCC 3.3.2 20030908 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class C(object):
...     def __del__(self):
...         print "__del__"
...
>>> import sys
>>> sys.stdout = None
>>> c = C()
>>> del c
Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in > ignored
>>> sys.stderr = None
>>> c = C()
>>> del c
>>>

Bernhard -- Intevation GmbH http://intevation.de/ Sketch http://sketch.sourceforge.net/ Thuban http://thuban.intevation.org/ From jim at zope.com Tue Nov 11 12:25:19 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 12:26:20 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: References: Message-ID: <3FB11B7F.4040407@zope.com> Tim Peters wrote: > [Jim Fulton, on ] > >>... >>The theory is that it occurs when a cycle involving a class is broken >>by calling the tp_clear slot on a heap type. I verified this by >>setting a gdb break point in Zope 3 and verifying that type_clear was >>called while a type still had a ref count much higher than 1. >> >>From a purely theoretical point of view, the current behavior is >>wrong. > > > It is, but a segfault is more than just pure theory . I don't know what your point is here. > >>There is clearly an invariant that tp_mro is not None and >>type_clear violates this. The fix (setting the mro to () in >>type_clear) is pretty straightforward. > > > The invariant is that tp_mro is not NULL so long as anyone may reference it. tp_clear believes that tp_mro will never be referenced again, but it's demonstrably wrong in that belief. The real bug lies there: why is its belief wrong? I thought that tp_clear was called to break cycles. Surely, if a class is in a cycle, there are references to it. Why would one assume that none of these references are instances? > You patched it so that tp_mro doesn't become NULL, thus avoiding the > immediate segfault, but until we understand *why* the invariant got > violated, it's unclear that the patch is "a fix". Code is still accessing > the MRO after tp_clear is called, but now instead of a segfault it's going > to see an empty MRO.
That's also (and clearly so, at least to me) > incorrect: code that tries to access a class's MRO should see the MRO the > programmer intended, and no sane class has an empty tuple for its MRO. So I > think the "tp_mro <- ()" patch exchanges gross breakage for subtler > breakage. Surely, the original intent is to break something. ;) I'd much rather get an attribute error than a segfault or an equally fatal C assertion error. >>BTW, with a debug build, I get an assertion error rather than a >>segfault. > > Which assertion fails then? That may be a good clue toward truly understanding what's causing this. The assertion that mro is not NULL. :) See PyObject_GenericGetAttr. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From tim.one at comcast.net Tue Nov 11 12:32:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 11 12:32:28 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <6qad72ddmv.fsf@salmakis.intevation.de> Message-ID: [Bernhard Herzog] > Perhaps because sys.stderr has also been cleared? Sounds good to me. Now go back and figure out the real problem . From pje at telecommunity.com Tue Nov 11 12:47:42 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 11 12:50:03 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <3FB11B7F.4040407@zope.com> References: Message-ID: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> At 12:25 PM 11/11/03 -0500, Jim Fulton wrote: >Tim Peters wrote: >>[Jim Fulton, on ] >> >>>... >>>The theory is that it occurs when a cycle involving a class is broken >>>by calling the tp_clear slot on a heap type. I verified this by >>>setting a gdb break point in Zope 3 and verifying that type_clear was >>>called while a type still had a ref count much higher than 1. >> From a purely theoretical point of view, the current behavior is >>>wrong.
>> >>It is, but a segfault is more than just pure theory . > >I don't know what your point is here. It's a joke, laugh. :) >>>There is clearly an invariant that tp_mro is not None and >>>type_clear violates this. The fix (setting the mro to () in >>>type_clear) is pretty straightforward. >> >>The invariant is that tp_mro is not NULL so long as anyone may reference it. >>tp_clear believes that tp_mro will never be referenced again, but it's >>demonstrably wrong in that belief. The real bug lies there: why is its >>belief wrong? > >I thought that tp_clear was called to break cycles. Surely, if a class is >in a cycle, there are references to it. Why would one assume that none >of these references are instances? Actually, the funny thing here is that it's unlikely that the cycle a type is in involves its base classes. The only way I know of in pure Python to have such a cycle is to set an attribute of the base class to refer to the subclass, which means that clearing each type's dictionary (and other metaclass-defined slots, if any) should be sufficient to break the cycle, without touching tp_mro. >>You patched it so that tp_mro doesn't become NULL, thus avoiding the >>immediate segfault, but until we understand *why* the invariant got >>violated, it's unclear that the patch is "a fix". Code is still accessing >>the MRO after tp_clear is called, but now instead of a segfault it's going >>to see an empty MRO. That's also (and clearly so, at least to me) >>incorrect: code that tries to access a class's MRO should see the MRO the >>programmer intended, and no sane class has an empty tuple for its MRO. So I >>think the "tp_mro <- ()" patch exchanges gross breakage for subtler >>breakage. > >Surely, the original intent is to break something. ;) >I'd much rather get an attribute error than a segfault or an >equally fatal C assertion error. What's baffling me is what code is accessing the class after tp_clear is
It can't be a __del__ method, or the cycle collector wouldn't be calling tp_clear, right? Or does it run __del__ methods during shutdown? From tim at zope.com Tue Nov 11 13:01:52 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 13:03:46 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <3FB11B7F.4040407@zope.com> Message-ID: [Jim] >>> From a purely theoretical point of view, the current behavior is >>> wrong. [Tim] >> It is, but a segfault is more than just pure theory . [Jim] > I don't know what your point is here. I didn't know what you were trying to communicate by "From a purely theoretical point of view". That's all. A segault isn't a theoretical nit, it's a serious bug. Your phrasing appeared to imply that it wasn't a serious bug ("wrong" is synonymous with "bug" to me here). > ... > I thought that tp_clear was called to break cycles. Yes. > Surely, if a class is in a cycle, there are references to it. Yes. > Why would one assume that none of these references are instances? I don't think anyone is assuming that. The assumption is that nobody will *access* the class's MRO slot again. That's not the same as assuming there are no instances. It may be in part be a bad assumption that dead instances can't execute any methods ever again, fed by that gc refuses to break cycles if an object in the cycle contains a __del__ method. If weakrefs supply another path for executing from the grave, then the problem is deeper than the patch addresses. > ... > Surely, the original intent is top break something. ;) > I'd much rather get an attribute error than a segfault or an > equally fatal C assertion error. My goal on Python-Dev isn't just to stop Zope3 from segfaulting, feeding it mysterious AttributeErrors instead. That may be good enough for your current purposes, but it leaves the language in a still-sickly state. 
For example, I've suggested here before that the second call of gc from finalization may be a bad idea in general, because the interpreter is in a damaged (largely torn-down) state at that time. That would address a larger class of shutdown problems, and Zope isn't unique in seeing new shutdown problems under 2.3.2 (there have been other reports on c.l.py, but so far only of the "weird information-free msgs from threads at shutdown" flavor that we first saw in the Zope3 test suite, before cleaning up the stale threads). But we don't understand *this* problem well enough yet, and you raised the real possibility that this one can bite before shutdown. In that case a robust fix necessarily costs more than just commenting out the second gc call (which, all by itself, would have been enough to stop your segfaults so far too). >> Which assertion fails then? That may be a good clue toward truly >> understanding what's causing this. > The assertion that mro is not NULL. :) LOL -- that shed a lot of light . From eppstein at ics.uci.edu Tue Nov 11 13:09:05 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 11 13:09:08 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> Message-ID: In article <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com>, Guido van Rossum wrote: > > The locale module has some things in this direction -- strxfrm and > > strcoll, maybe? -- but I don't know what they do with unicode & doubt > > they even exist on OS X. > > IMO, locale and Unicode shouldn't be mentioned in the same sentence. > At least the part of the locale that defines properties of characters > is subsumed in Unicode in a way that doesn't require you to specify > the locale. 
(Of course the locale is still important in defining > things like conventions for formatting numbers and dates.) The locale (as a concept) is also important in determining a unicode collation ordering, but it sounds like locale (as a Python module) doesn't do that. Ok, it sounds like I am stuck with PyObjC's NSString.localizedCaseInsensitiveCompare_, since Python's built-in cmp(unicode,unicode) sucks and locale doesn't provide an alternative. Are there any plans to add better collation ordering for unicode in future Python versions? Googling finds statements like http://mail.python.org/pipermail/i18n-sig/2001-May/000929.html (over two years ago, saying this has been on the plate for some time already then) but not much recent. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From jim at zope.com Tue Nov 11 13:42:13 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 13:43:12 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> References: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> Message-ID: <3FB12D85.8040005@zope.com> Phillip J. Eby wrote: > At 12:25 PM 11/11/03 -0500, Jim Fulton wrote: > >> Tim Peters wrote: >> ... >> Surely, the original intent is to break something. ;) >> I'd much rather get an attribute error than a segfault or an >> equally fatal C assertion error. > > > What's baffling me is what code is accessing the class after tp_clear is > called. It can't be a __del__ method, or the cycle collector wouldn't > be calling tp_clear, right? Or does it run __del__ methods during > shutdown? No, it's not a del. An object is being accessed in a weakref callback. The object being accessed is *not* the object being accessed by the weakref.
It's an object that had a dictionary that contained the weakref:

class SurrogateRegistry(object):
    """Surrogate registry
    """

    def __init__(self):
        self._surrogates = {}

        def _remove(k, selfref=weakref.ref(self)):
            self = selfref()
            if self is not None:
                try:
                    del self._surrogates[k]
                except KeyError:
                    pass

        self._remove = _remove

This thing is similar to a WeakKeyDictionary. The _remove function is used as a callback when creating weakrefs of things stored as keys in the _surrogates dictionary. Now, it turns out that this function is called at a point where tp_clear has been called on the class. The problem occurs when the callback tries to do self._surrogates. (BTW, my workaround is:

class SurrogateRegistry(object):
    """Surrogate registry
    """

    def __init__(self):
        self._surrogates = surrogates = {}

        def _remove(k):
            try:
                del surrogates[k]
            except KeyError:
                pass

        self._remove = _remove

which avoids accessing "self", but creates a strong reference, and thus a cycle, from the weakref objects to the _surrogates dict, which is acceptable for my needs.) Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From martin at v.loewis.de Tue Nov 11 14:25:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 14:26:08 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> Message-ID: <3FB137C1.9000903@v.loewis.de> Guido van Rossum wrote: >>The locale module has some things in this direction -- strxfrm and >>strcoll, maybe? -- but I don't know what they do with unicode & doubt >>they even exist on OS X. > > > IMO, locale and Unicode shouldn't be mentioned in the same sentence.
> At least the part of the locale that defines properties of characters > is subsumed in Unicode in a way that doesn't require you to specify > the locale. (Of course the locale is still important in defining > things like conventions for formatting numbers and dates.) In particular, locale also matters for collation. So the desire to collate Unicode strings properly is reasonable, but you need to know what locale to use for collation. With Python's current locale model, one would convert the Unicode string to the locale's encoding, and then perform collation. Of course, with an ICU wrapper, you could have multiple simultaneous locales, and collate Unicode strings without converting them into byte strings first. http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ Regards, Martin From guido at python.org Tue Nov 11 14:56:35 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 14:56:50 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 20:25:53 +0100." <3FB137C1.9000903@v.loewis.de> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> Message-ID: <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> > >>The locale module has some things in this direction -- strxfrm and > >>strcoll, maybe? -- but I don't know what they do with unicode & doubt > >>they even exist on OS X. > > > Guido van Rossum wrote: > > IMO, locale and Unicode shouldn't be mentioned in the same sentence. > > At least the part of the locale that defines properties of characters > > is subsumed in Unicode in a way that doesn't require you to specify > > the locale. (Of course the locale is still important in defining > > things like conventions for formatting numbers and dates.) [MvL] > In particular, locale also matters for collation. 
So the desire to > collate Unicode strings properly is reasonable, but you need to know > what locale to use for collation. With Python's current locale model, > one would convert the Unicode string to the locale's encoding, and > then perform collation. Ouch. Seems you're right. > Of course, with an ICU wrapper, you could have multiple simultaneous > locales, and collate Unicode strings without converting them into byte > strings first. > > http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ Is that something we could move into the std lib? --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Nov 11 15:05:54 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 15:06:21 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> Message-ID: <3FB14122.708@v.loewis.de> Guido van Rossum wrote: >>http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ > > > Is that something we could move into the std lib? It's incomplete. When it is completed, yes, perhaps. However, ICU itself is *really* large (including the Unicode character database, encoding tables for all encodings of the world, and locale data for all languages), so we would need to ship that as well, or require that it is pre-existing on a system (possible for Linux, unrealistic for Windows). More realistically, we could expose wcscoll(3) where available, which would extend the Python locale model to Unicode (assuming the C library uses Unicode in wchar_t). 
Regards, Martin From guido at python.org Tue Nov 11 15:09:02 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 15:09:11 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 21:05:54 +0100." <3FB14122.708@v.loewis.de> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> Message-ID: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> > >>http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ > > > > Is that something we could move into the std lib? > > It's incomplete. When it is completed, yes, perhaps. However, > ICU itself is *really* large (including the Unicode character > database, encoding tables for all encodings of the world, and > locale data for all languages), so we would need to ship that > as well, or require that it is pre-existing on a system (possible > for Linux, unrealistic for Windows). How big would ICU binaries for Windows be? I don't mind bloating the Windows installer by a few MB. As long as it doesn't have to land in CVS... > More realistically, we could expose wcscoll(3) where available, > which would extend the Python locale model to Unicode (assuming > the C library uses Unicode in wchar_t). I don't know what that is, but if you recommend it, I support it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Nov 11 15:21:40 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 15:21:54 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> Message-ID: <3FB144D4.8060307@v.loewis.de> Guido van Rossum wrote: > How big would ICU binaries for Windows be? I don't mind bloating the > Windows installer by a few MB. As long as it doesn't have to land in > CVS... See ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip I haven't actually downloaded it because of size (9MB); the zip file may contain header files and the like which we shouldn't ship. >>More realistically, we could expose wcscoll(3) where available, [...] > I don't know what that is, but if you recommend it, I support it. See http://www.opengroup.org/onlinepubs/007908799/xsh/wcscoll.html It goes along with wcsxfrm and wcscmp for efficient collation, and parallels strcoll, strxfrm, and strcmp for wchar_t. Regards, Martin From theller at python.net Tue Nov 11 15:22:43 2003 From: theller at python.net (Thomas Heller) Date: Tue Nov 11 15:22:54 2003 Subject: [Python-Dev] More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Mon, 10 Nov 2003 21:48:50 -0500") References: Message-ID: "Tim Peters" writes: > Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak > references. Now Zope3 dies with a segfault when it's shut down, which makes > its adoption of Python 2.3.2 a bit less attractive . 
> > The problem isn't really understood. I hope that once it is, there will be > a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to > the symptom here: > > http://www.python.org/sf/839548 Is the problem I currently have the same, I also use weakrefs (although Jim's patch doesn't seem to help)? It is triggered when I have set the gc threshold to small values in a 2.3.2 debug build under Windows. When some containers in my program are destroyed Python crashes with an access violation in _Py_ForgetReference() because op->_ob_next and op->_ob_prev are both NULL:

void
_Py_ForgetReference(register PyObject *op)
{
#ifdef SLOW_UNREF_CHECK
    register PyObject *p;
#endif
    if (op->ob_refcnt < 0)
        Py_FatalError("UNREF negative refcnt");
    if (op == &refchain ||
        op->_ob_prev->_ob_next != op ||
        op->_ob_next->_ob_prev != op)
        Py_FatalError("UNREF invalid object");

First I suspected buggy gc support in an extension module I have but the crash also occurs when I remove it. Thomas PS: Here is the stack trace as displayed in MSVC6:

_Py_ForgetReference(_object * 0x01101bd0) line 2001 + 15 bytes
_Py_Dealloc(_object * 0x01101bd0) line 2021 + 9 bytes
delete_garbage(_gc_head * 0x0012d640, _gc_head * 0x1e1783e0) line 516 + 81 bytes
collect(int 0) line 625 + 13 bytes
collect_generations() line 673 + 9 bytes
_PyObject_GC_Malloc(unsigned int 24) line 1061
_PyObject_GC_New(_typeobject * 0x1e186c00 _PyListIter_Type) line 1070 + 12 bytes
list_iter(_object * 0x01101c08) line 2414 + 10 bytes
PyObject_GetIter(_object * 0x01101c08) line 2161 + 7 bytes
eval_frame(_frame * 0x008aa278) line 2077 + 9 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b8be40, _object * 0x00b7af50, _object * 0x00000000, _object * * 0x008e3808, int 0, _object * * 0x008e3808, int 1, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
fast_function(_object * 0x00b966c0, _object * * * 0x0012da24, int 2, int 0, int 1) line 3532 + 68 bytes
call_function(_object * * * 0x0012da24, int 256) line
3458 + 25 bytes
eval_frame(_frame * 0x008e36a8) line 2116 + 13 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b8bb28, _object * 0x00b7af50, _object * 0x00000000, _object * * 0x01117d6c, int 1, _object * * 0x00000000, int 0, _object * * 0x0111712c, int 1, _object * 0x00000000) line 2663 + 9 bytes
function_call(_object * 0x011028d0, _object * 0x01117d58, _object * 0x00000000) line 509 + 64 bytes
PyObject_Call(_object * 0x011028d0, _object * 0x01117d58, _object * 0x00000000) line 1755 + 15 bytes
PyObject_CallFunction(_object * 0x011028d0, char * 0x1e1c63b8) line 1797 + 15 bytes
handle_callback(_PyWeakReference * 0x01114f68, _object * 0x011028d0) line 684 + 18 bytes
PyObject_ClearWeakRefs(_object * 0x01101bd0) line 750 + 13 bytes
subtype_dealloc(_object * 0x01101bd0) line 656 + 9 bytes
_Py_Dealloc(_object * 0x01101bd0) line 2022 + 7 bytes
list_dealloc(PyListObject * 0x00a0f930) line 214 + 153 bytes
_Py_Dealloc(_object * 0x00a0f930) line 2022 + 7 bytes
dict_dealloc(_dictobject * 0x01100380) line 708 + 108 bytes
_Py_Dealloc(_object * 0x01100380) line 2022 + 7 bytes
subtype_dealloc(_object * 0x010f4f18) line 680 + 81 bytes
_Py_Dealloc(_object * 0x010f4f18) line 2022 + 7 bytes
PyDict_DelItem(_object * 0x01100428, _object * 0x00a73368) line 583 + 81 bytes
PyObject_GenericSetAttr(_object * 0x010f4ee0, _object * 0x00a73368, _object * 0x00000000) line 1529 + 13 bytes
PyObject_SetAttr(_object * 0x010f4ee0, _object * 0x00a73368, _object * 0x00000000) line 1289 + 18 bytes
eval_frame(_frame * 0x008f8b38) line 1760 + 15 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b48d90, _object * 0x00b42188, _object * 0x00000000, _object * * 0x00893c70, int 5, _object * * 0x00893c84, int 0, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
fast_function(_object * 0x00b57980, _object * * * 0x0012e070, int 5, int 5, int 0) line 3532 + 68 bytes
call_function(_object * * * 0x0012e070, int 4) line 3458 + 25 bytes
eval_frame(_frame * 0x00893b08) line 2116 + 13 bytes
fast_function(_object * 0x00b79fb0, _object * * * 0x0012e218, int 5, int 5, int 0) line 3518 + 9 bytes
call_function(_object * * * 0x0012e218, int 4) line 3458 + 25 bytes
eval_frame(_frame * 0x008958e8) line 2116 + 13 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b48188, _object * 0x00b42188, _object * 0x00000000, _object * * 0x011194cc, int 4, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
function_call(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 509 + 64 bytes
PyObject_Call(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 1755 + 15 bytes
PyEval_CallObjectWithKeywords(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 3346 + 17 bytes
PyObject_CallObject(_object * 0x00b4cea8, _object * 0x011194b8) line 1746 + 15 bytes
_CallPythonObject(void * 0x0012e3e4, char * 0x10010e00, _object * 0x00b4cea8, _object * 0x00a9fbc0, void * * 0x0012e41c) line 178 + 14 bytes
i_CallPythonObject(_object * 0x00b4cea8, _object * 0x00a9fbc0, void * * 0x0012e40c) line 213 + 26 bytes

From tim at zope.com Tue Nov 11 15:41:11 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 15:42:16 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> Message-ID:
> What's baffling me is what code is accessing the class after tp_clear > is called. It can't be a __del__ method, or the cycle collector > wouldn't be calling tp_clear, right? Or does it run __del__ methods > during shutdown? Jim explained -- as best we can without a finite test case to nail it. There does seem to be an assumption that a class object won't get collected if any instance of the class is still around. "Because" the class object would have a reference to it from the class instance, so that a live class instance keeps the class alive. But, if the class object and all remaining instances are all in one cycle, and that cycle is unreachable from outside, and the class doesn't define a __del__ method, then I *expect* gc would try to clean up the dead cycle. In that case, gc starts calling tp_clear slots in a seemingly arbitrary order. If the destruction of a class instance then happened to trigger a weakref callback which in turn tried to access an attribute of the class, and the class had already been through its tp_clear, then a NULL-pointer dereference (due to the cleared tp_mro slot) would be unavoidable. But if that's what's happening, then tricks like the one on the table may not be enough to stop segfaults: replacing tp_mro with an empty tuple only "works" so long as the class object hasn't also been thru its tp_dealloc routine. Once it goes thru tp_dealloc, the memory is recyclable heap trash, and tp_mro may or may not retain the bits that "look like" a pointer to an empty tuple by the time some weakref callback triggers an access to them. In a release build it's likely that the "pointer to an empty tuple" will survive across deallocation for at least a little while, because tp_mro isn't near an end of the object (so is unlikely to get overridden by malloc's or pymalloc's internal bookkeeping pointers). It's a crapshoot, though. A complication in all this is that Python's cyclic gc never calls tp_dealloc or tp_free directly! 
The only cleanup slot it calls directly is tp_clear.  Deallocations still occur only as side effects of refcounts falling to 0, as tp_clear actions break cycles (and execute Py_DECREFs along the way).

This protects against a class's tp_dealloc (but not tp_clear) getting called while instances still exist, even if they're all in one cycle.  But "still exist" gets fuzzy then.  Here's a cute one:

"""
class C(object):
    pass

def pp():
    import winsound
    winsound.Beep(2000, 500)

import weakref
wr = weakref.ref(C, lambda ignore, pp=pp: pp())

del C # this isn't enough to free C: C is still in at least two cycles
"""

C:\Python23>python temp5.py
Fatal Python error: Interpreter not initialized (version mismatch?)

abnormal program termination

C:\Python23>

That one is due to the weakref callback getting called after Py_Finalize does initialized = 0; so that the "import winsound" fails (I gave up trying to print things in callbacks).

From theller at python.net  Tue Nov 11 16:05:31 2003
From: theller at python.net (Thomas Heller)
Date: Tue Nov 11 16:05:43 2003
Subject: [Python-Dev] More fun with Python shutdown
In-Reply-To: (Thomas Heller's message of "Tue, 11 Nov 2003 21:22:43 +0100")
References: 
Message-ID: 

Thomas Heller writes:

> "Tim Peters" writes:
>
>> Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak
>> references.  Now Zope3 dies with a segfault when it's shut down, which
>> makes its adoption of Python 2.3.2 a bit less attractive.
>>
>> The problem isn't really understood.  I hope that once it is, there will
>> be a simple way to avoid it under 2.3.2.  Jim filed a bug report with a
>> fix to the symptom here:
>>
>> http://www.python.org/sf/839548
>
> Is the problem I currently have the same, I also use weakrefs (although
> Jim's patch doesn't seem to help)?
>
> It is triggered when I have set the gc threshold to small values in a
> 2.3.2 debug build under Windows.
> When some containers in my program are
> destroyed Python crashes with an access violation in
> _Py_ForgetReference() because op->_ob_next and
> _op->_ob_prev are both NULL:
>
> void
> _Py_ForgetReference(register PyObject *op)
> {
> #ifdef SLOW_UNREF_CHECK
>         register PyObject *p;
> #endif
>         if (op->ob_refcnt < 0)
>                 Py_FatalError("UNREF negative refcnt");
>         if (op == &refchain ||
>             op->_ob_prev->_ob_next != op || op->_ob_next->_ob_prev != op)
>                 Py_FatalError("UNREF invalid object");

Here is the smallest program I can currently come up with that triggers this bug.  Most of the code is extracted from Patrick O'Brian's dispatcher module on activestate's cookbook site, it creates weak references to bound methods by dissecting them into im_self and im_func.

This program only prints "A" before crashing, so it does occur *before* interpreter shutdown.

Thomas

-----
import weakref
import gc

gc.set_threshold(1)

connections = {}
_boundMethods = weakref.WeakKeyDictionary()

def safeRef(object):
    selfkey = object.im_self
    funckey = object.im_func
    if not _boundMethods.has_key(selfkey):
        _boundMethods[selfkey] = weakref.WeakKeyDictionary()
    if not _boundMethods[selfkey].has_key(funckey):
        _boundMethods[selfkey][funckey] = \
            BoundMethodWeakref(boundMethod=object)
    return _boundMethods[selfkey][funckey]

class BoundMethodWeakref:
    def __init__(self, boundMethod):
        def remove(object, self=self):
            _removeReceiver(receiver=self)
        self.weakSelf = weakref.ref(boundMethod.im_self, remove)
        self.weakFunc = weakref.ref(boundMethod.im_func, remove)

def _removeReceiver(receiver):
    for senderkey in connections.keys():
        for signal in connections[senderkey].keys():
            receivers = connections[senderkey][signal]
            try:
                receivers.remove(receiver)
            except:
                pass
            _cleanupConnections(senderkey, signal)

################

class X(object):
    def test(self):
        pass

def test():
    print "A"
    safeRef(X().test)
    print "B"

if __name__ == "__main__":
    test()
-----

From pje at
telecommunity.com (Phillip J. Eby)
Date: Tue Nov 11 18:36:08 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: 
References: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com>
Message-ID: <5.1.1.6.0.20031111182245.028bed40@telecommunity.com>

At 03:41 PM 11/11/03 -0500, Tim Peters wrote:
>[Phillip J. Eby]
> > ...
> > Actually, the funny thing here is that it's unlikely that the cycle a
> > type is in involves its base classes.
>
>Well, all new-style classes are in cycles with bases:
>
> >>> class C(object): pass
>..
> >>> object.__subclasses__()[-1]   # so C is reachable from object
><class '__main__.C'>

I thought this was done with weak references.

> >>> C.__mro__   # and object is reachable from C
>(<class '__main__.C'>, <type 'object'>)
> >>>
>
>For that matter, since the first element of the MRO is the class itself, a

Oops.  I forgot about that.

>A complication in all this is that Python's cyclic gc never calls tp_dealloc
>or tp_free directly!  The only cleanup slot it calls directly is tp_clear.
>Deallocations still occur only as side effects of refcounts falling to 0, as
>tp_clear actions break cycles (and execute Py_DECREFs along the way).
>
>This protects against a class's tp_dealloc (but not tp_clear) getting called
>while instances still exist, even if they're all in one cycle.  But "still
>exist" gets fuzzy then.

Hm.  So what if tp_clear didn't mess with the MRO, except to decref its self-reference in the MRO?  tp_dealloc would have to decref the MRO tuple then, and deal with the off-by-one refcount for the type that would result from the tuple's deallocation.  Could that work?

From tim.one at comcast.net  Tue Nov 11 19:01:08 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Nov 11 19:01:16 2003
Subject: [Python-Dev] More fun with Python shutdown
In-Reply-To: 
Message-ID: 

> http://www.python.org/sf/839548

[Thomas Heller]
> Is the problem I currently have the same,

Probably not.

> I also use weakrefs (although
> Jim's patch doesn't seem to help)?
I guess your problem and Jim's both have in common that you and Zope3 use assignment statements too.

> It is triggered when I have set the gc threshold to small values in a
> 2.3.2 debug build under Windows.  When some containers in my program
> are destroyed Python crashes with an access violation in
> _Py_ForgetReference() because op->_ob_next and
> _op->_ob_prev are both NULL:

That's a list of "all objects".  Deallocating an object removes it from that list.  Trying to deallocate it a second time tries to remove it from the list a second time, which barfs in just this way.

> PS: Here is the stack trace as displayed in MSVC6:
>
> _Py_ForgetReference(_object * 0x01101bd0) line 2001 + 15 bytes
> _Py_Dealloc(_object * 0x01101bd0) line 2021 + 9 bytes
...
> _Py_Dealloc(_object * 0x01101bd0) line 2022 + 7 bytes

Bingo: _Py_Dealloc with the same object pointer appears twice in the stack.  That's almost certainly a bug in Python, but is almost certainly unrelated to the problem Jim is having.

I was able to make your test case substantially smaller.  The key is that the "remove" callback triggers gc.  Apart from that, it doesn't matter at all what "remove" does.  I don't know what the bug is, though, and since the last of these consumed more than a day to track down and fix, I don't anticipate having time to do that again:

"""
import weakref
import gc

_boundMethods = weakref.WeakKeyDictionary()

def safeRef(object):
    selfkey = object.im_self
    funckey = object.im_func
    _boundMethods[selfkey] = weakref.WeakKeyDictionary()
    _boundMethods[selfkey][funckey] = BoundMethodWeakref(object)

class BoundMethodWeakref:
    def __init__(self, boundMethod):
        def remove(object):
            gc.collect()
        self.weakSelf = weakref.ref(boundMethod.im_self, remove)

class X(object):
    def test(self):
        pass

def test():
    print "A"
    safeRef(X().test)
    print "B"

if __name__ == "__main__":
    test()
"""

As far as I can get without stopping:  It's dying when the anonymous bound method (X().test) is getting cleaned up.
That decrefs the anonymous X(), marking the end of its life too, which triggers a weakref callback, which calls gc.collect() (in your original program, a .keys() method created a list, which was enough to trigger gc because you set the gc threshold to 1).  The anonymous X() then shows up in gc's list of garbage, and the Py_DECREF in this part of gc:

    if ((clear = op->ob_type->tp_clear) != NULL) {
        Py_INCREF(op);
        clear(op);
        Py_DECREF(op);
    }

then knocks the refcount on the anonymous X() back to 0 a second time, triggering the fatal attempt to deallocate an object that's already in the process of being deallocated.

This *may* be a deep problem.  gc doesn't expect that the refcount on anything it knows about is already 0 at the time gc gets started.  The way Python works, anything whose refcount falls to 0 is recycled without cyclic gc's help.  Nevertheless, the anonymous X() container *is* in gc's lists when gc starts here, with a refcount of 0, and gc correctly concludes that X() isn't reachable from "outside".  That's why it tries to delete X() itself.

Anyway, the only thing weakrefs have to do with this is that they managed to trigger gc between the time a gc-tracked container became dead and the time the container untracked itself from gc.

I'll note that the anonymous bound method object *did* untrack itself from gc before the fatal part began.  Hmm.  subtype_dealloc() *also* untracked the anonymous X() before the fatal part began, but then it *re*tracked it:

    /* UnTrack and re-Track around the trashcan macro, alas */
    /* See explanation at end of function for full disclosure */
    PyObject_GC_UnTrack(self);
    ++_PyTrash_delete_nesting;
    Py_TRASHCAN_SAFE_BEGIN(self);
    --_PyTrash_delete_nesting;
    _PyObject_GC_TRACK(self); /* We'll untrack for real later */

It's just a few lines later that the suicidal weakref callback gets triggered.
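The failure shape Tim dissects here -- a weakref callback that forces a collection while its referent is mid-teardown -- can be sketched in modern Python, where current CPython releases handle the sequence without the double deallocation (a hedged sketch in today's syntax, not the 2.3-era code under discussion):

```python
# Sketch (assumes current CPython semantics): a weakref callback that
# calls gc.collect() while its referent is being torn down -- the same
# shape as Thomas's reduced test case, minus the crash.
import gc
import weakref

events = []

class X(object):
    pass

def remove(ref):
    # The callback receives the (already dead) weakref, not the object.
    events.append(ref() is None)
    gc.collect()          # force a collection from inside the callback

x = X()
wr = weakref.ref(x, remove)
del x                     # refcount hits 0; the callback fires here

assert events == [True]   # callback ran once, with a dead reference
assert wr() is None
```

This also illustrates the point made further down the thread: the callback is handed the dead weakref, not the object itself, so it cannot resurrect the referent the way a __del__ method can.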
The good news is that Guido must have spent days in all trying to bulletproof subtype_dealloc(), so it's not like a bug in this part of the code is a big surprise.  It's possible that temporarily incref'ing self before the PyObject_ClearWeakRefs() call would be a correct fix (that would prevent gc from believing the object is collectible, and offhand I don't see anything other than PyObject_ClearWeakRefs here that could trigger a round of gc).

If that's a correct analysis, this is a very serious bug:  double-deallocation will normally go undetected in a release build, and will lead to memory corruption.  It will happen only when a weakref callback happens to trigger gc, *and* the object being torn down at the time happens to be in a generation gc collects at the time gc is triggered.  So the conditions that trigger it are rare and unpredictable, and the effects of the memory corruption it leads to are equally bad (anything can happen, at any time later).

From greg at cosc.canterbury.ac.nz  Tue Nov 11 19:13:53 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue Nov 11 19:14:08 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: 
Message-ID: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>

Tim Peters:

> If the destruction of a class instance then happened to trigger a
> weakref callback which in turn tried to access an attribute of the
> class, and the class had already been through its tp_clear, then a
> NULL-pointer dereference (due to the cleared tp_mro slot) would be
> unavoidable.

The crux of this seems to be that, now that we have weak references, __del__ methods are not the only thing that can trigger execution of arbitrary Python code when an object becomes unreferenced.

Maybe the GC should also refuse to collect cycles in which any member is referenced by a weak reference with an associated callback?

The alternative is to accept that arbitrary Python code can be called while the GC is in the midst of breaking a cycle.
In that case, it's unacceptable for any object's tp_clear to set a Python pointer to NULL, or do anything else that would render the object no longer a valid Python object.

That would be enough to stop segfaults, but it still wouldn't entirely solve the problem at hand, because the fact is there's no way to break the self-cycle in a class's MRO without rendering it unusable as a class object for at least some purposes.

Which makes me think that the only safe thing to do is treat a weak-ref-with-callback as tantamount to a __del__ method for GC purposes.

> But if that's what's happening, then tricks like the one on the table
> may not be enough to stop segfaults: replacing tp_mro with an empty
> tuple only "works" so long as the class object hasn't also been thru
> its tp_dealloc routine.

But that can't happen until the object's refcount has dropped to zero, in which case it can't be touched any longer by Python code.  I don't think there's any worry with this.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From tim at zope.com  Tue Nov 11 23:17:59 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 11 23:18:51 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <5.1.1.6.0.20031111182245.028bed40@telecommunity.com>
Message-ID: 

[Tim]
>> >>> class C(object): pass
>> ..
>> >>> object.__subclasses__()[-1]   # so C is reachable from object
>> >>>

[Phillip J. Eby]
> I thought this was done with weak references.

Ouch, yes.  My apologies -- I keep forgetting that one.

>> For that matter, since the first element of the MRO is the class
>> itself [self-cycle]

> Oops.  I forgot about that.

OK, I'll settle for a tie in the forgetfulness contest.

> Hm.
> So what if tp_clear didn't mess with the MRO, except to decref
> its self-reference in the MRO?  tp_dealloc would have to decref the
> MRO tuple then, and deal with the off-by-one refcount for the type
> that would result from the tuple's deallocation.  Could that work?

Until we have a finite test case that reproduces Jim's problem, I don't know.  It's possible.  My intuition remains that hacking the tp_mro slot is patching a symptom of a deeper problem that's going to keep coming back in other guises.

BTW, lying about true refcounts is fraught with subtle dangers.  If you were the one who had to fiddle the ZODB3 cache to work with Python's cyclic gc, you'd have a better gut appreciation for that.

From tim.one at comcast.net  Wed Nov 12 00:22:26 2003
From: tim.one at comcast.net (Tim Peters)
Date: Wed Nov 12 00:22:37 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing]
> The crux of this seems to be that, now that we have weak references,
> __del__ methods are not the only thing that can trigger execution of
> arbitrary Python code when an object becomes unreferenced.

- "this" needs clarification.  Thomas Heller's bug didn't involve cycles, but I think that bug has no real intersection with Jim's woes.  Some of the shutdown glitches I've displayed here, as well as the ones people have griped about on c.l.py, also weren't related to weakref callbacks.  There's more than one (and more than two ...) distinct glitches here.

- It is indeed the callbacks-- not weakrefs per se --that are the cause of *most* of these things.

- weakref callbacks are easier to live with than __del__ methods in one (and maybe only one) respect: when the death of X triggers a weakref callback C, C isn't passed X, but X.__del__ is.  So a weakref callback can't resurrect X, but X.__del__ can.
I'm not sure how much comfort to take from that, since a weakref callback could presumably resurrect other trash in its dead object's cycle. > Maybe the GC should also refuse to collect cycles in which any member > is referenced by a weak reference with an associated callback? I've been meaning to think about that, but haven't been able to make more time for it. It should be possible to construct motivating examples. > The alternative is to accept that arbitrary Python code can be called > while the GC is in the midst of breaking a cycle. Bingo -- that's my fear. It's hard to say why in advance, but every time we've found a spot where arbitrary Python code *can* run during gc, we've eventually been screwed royally on that spot. Hell, last time we pissed away most of a week because PyObject_HasAttr (then used to ask whether __del__ exists; no longer used) ended up making massive changes to a Zope database as a side effect of indirectly calling the object's class's __getattr__ hook, mutating the Python object graph massively in the process as a side effect in turn of all the crap ZODB was doing to materialize ghosts. gc has to have a patch of unshifting ground to stand on. > In that case, it's unacceptable for any object's tp_clear to set > a Python pointer to NULL, or do anything else that would render the > object no longer a valid Python object. I expect it's worse than just that (since it always has been worse than just that in the past, although nobody has been able to predict exactly how for every case in advance). > That would be enough to stop segfaults, but it still wouldn't entirely > solve the problem at hand, because the fact is there's no way to break > the self-cycle in a class's MRO without rendering it unusable as a > class object for at least some purposes. 
Phil Eby suggested a hack for that specific one (decrement the refcount, and that's all -- the MRO holds an "illegitimate" self-reference then; wave hands, pray, and maybe it doesn't break something else). > Which makes me think that the only safe thing to do is treat a > weak-ref-with-callback as tantamount to a __del__ method for GC > purposes. Quite possibly so. >> But if that's what's happening, then tricks like the one on the table >> may not be enough to stop segfaults: replacing tp_mro with an empty >> tuple only "works" so long as the class object hasn't also been thru >> its tp_dealloc routine. > But that can't happen until the object's refcount has dropped to zero, > in which case it can't be touched any longer by Python code. Probably so. It depends not so much on principle as on the parts of the code where we cheat (e.g., if it were always true that refcount-dropped-to-0 implies can't-be-touched-again-by-Python-code, then what is it that gets passed to x.__del__()? x does -- but we cheat). From greg at cosc.canterbury.ac.nz Wed Nov 12 00:54:35 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 12 00:54:51 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: Message-ID: <200311120554.hAC5sZn26866@oma.cosc.canterbury.ac.nz> Tim: > - weakref callbacks are easier to live with than __del__ methods in > one (and maybe only one) respect: when the death of X triggers > a weakref callback C, C isn't passed X, but X.__del__ is. So a > weakref callback can't resurrect X, but X.__del__ can. The object causing trouble doesn't need to be the one that died, e.g. doing a tp_clear on X causes Y to die which triggers a weakref callback which references X by some route. Resurrection of X isn't an issue, because it's not dead yet -- it is, however, in the process of being indiscriminately torn apart by the GC, messing up who-knows-what invariant that the callback might be relying on. 
So I can't see that the lack of possibility of resurrection helps much at all.

> e.g., if it were always true that refcount-dropped-to-0 implies
> can't-be-touched-again-by-Python-code, then what is it that gets
> passed to x.__del__()?  x does -- but we cheat

But (I hope, at least!) it's guaranteed that the x.__del__() call is completed before any of the C-level deallocation code for x is begun...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From aleaxit at yahoo.com  Wed Nov 12 03:14:08 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 12 03:14:14 2003
Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ?
Message-ID: <200311120914.08946.aleaxit@yahoo.com>

Somebody just wrote to help@python.org asking for guidance in resolving some conflicts in comments in 2.3.2 files regarding sleepycat versions we support.

In Modules/Setup we say:
"""
The earliest supported version of that library is 3.0, the latest
supported version is 4.0 (4.1 is specifically not supported,
"""

In README we say:
"""
Only versions 3.1 through 4.1 of Sleepycat's libraries provide
the necessary API
"""

In setup.py we say:
"""
The earliest supported version of that library is 3.1, the latest
supported version is 4.2 ... 3.1 is only partially supported
"""

I believe that setup.py is accurate, README slightly out of date, Modules/Setup way out of date -- but I thought that double checking couldn't possibly hurt.  So, can I confirm this to the help@python.org querant, and fix the comments in README (should it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)?
Alex

From aleaxit at yahoo.com  Wed Nov 12 03:55:49 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 12 03:55:56 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <3FB144D4.8060307@v.loewis.de>
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> <3FB144D4.8060307@v.loewis.de>
Message-ID: <200311120955.49289.aleaxit@yahoo.com>

On Tuesday 11 November 2003 09:21 pm, Martin v. Löwis wrote:
> Guido van Rossum wrote:
> > How big would ICU binaries for Windows be?  I don't mind bloating the
> > Windows installer by a few MB.  As long as it doesn't have to land in
> > CVS...
>
> See
>
> ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip
>
> I haven't actually downloaded it because of size (9MB); the zip file
> may contain header files and the like which we shouldn't ship.

I have downloaded it, and it's a sources zipfile (needs to be unpacked with the -a option to unzip, on Linux).  I'm not quite sure of how to estimate the size of the Windows binaries since I don't have a decent Windows system to build it on at the moment.
For a Linux-on-386 build, I see:

[alex@lancelot source]$ size /usr/local/lib/libicu*.so.26.1
   text    data     bss     dec     hex filename
8449053    3948       4 8453005  80fb8d /usr/local/lib/libicudata.so.26.1
 875940   14528     908  891376   d99f0 /usr/local/lib/libicui18n.so.26.1
  51426    4296       8   55730    d9b2 /usr/local/lib/libicuio.so.26.1
 145377    4160       4  149541   24825 /usr/local/lib/libicule.so.26.1
  29860    1244       4   31108    7984 /usr/local/lib/libiculx.so.26.1
  26339    1004       4   27347    6ad3 /usr/local/lib/libicutoolutil.so.26.1
 664190   21100     356  685646   a764e /usr/local/lib/libicuuc.so.26.1

and zipping just these .so.26.1 files to gain an idea of their overall compressibility gives me:

[alex@lancelot source]$ zip fup /usr/local/lib/libicu*.so.26.1
  adding: usr/local/lib/libicudata.so.26.1 (deflated 54%)
  adding: usr/local/lib/libicui18n.so.26.1 (deflated 65%)
  adding: usr/local/lib/libicuio.so.26.1 (deflated 64%)
  adding: usr/local/lib/libicule.so.26.1 (deflated 70%)
  adding: usr/local/lib/libiculx.so.26.1 (deflated 65%)
  adding: usr/local/lib/libicutoolutil.so.26.1 (deflated 55%)
  adding: usr/local/lib/libicuuc.so.26.1 (deflated 60%)
[alex@lancelot source]$ ll fup.zip
-rw-rw-r--    1 alex     alex      4790641 Nov 12 09:53 fup.zip

I'm sure I've forgotten something, but I hope the sizes are roughly indicative and about 5MB compressed, 10MB on disk, are more or less what we could be adding to the Python windows installer if it came with ICU.  Perhaps somebody with a decent Windows platform can measure this more accurately!-)

Alex

From Boris.Boutillier at arteris.net  Wed Nov 12 04:38:35 2003
From: Boris.Boutillier at arteris.net (Boris Boutillier)
Date: Wed Nov 12 04:38:44 2003
Subject: [Python-Dev] New flag to differentiate Builtins and extensions classes ?
Message-ID: <3FB1FF9B.7040508@arteris.net>

I looked into the archives and didn't see any debate on the question; I hope I didn't miss something.

My point concerns limitations on extension modules due to checks aimed at the builtins.  The main point is settable extension classes.
In Python's code there are some checks against TPFLAGS_HEAPTYPE; extension modules shouldn't have this flag, so the normal type->tp_setattro doesn't allow the user to set new attributes on your extension classes.  There is a way around it: write a special metaclass which redefines setattr.

In the extension module I'm writing (I'm porting some Python code to Python-C for speed issues) the user can set attributes and slots on my classes.  What I need is the complete type->tp_setattro behaviour, without the check.  I didn't see a way to get this behaviour using only the Python API (is re-readying the type a workaround?), so I copy-pasted all the code needed to make update_slots work (ouch, 2500 lines).  This is now almost working; every kind of attribute can be set except __setattr__ itself: the hackcheck prevents the user from calling another __setattr__ from his new setattr.

Example of my extension class hierarchy:

class A(object)
class B(A)

In the extension there is a tp->setattro on B; if the user wants to redefine it, he can't call the A __setattr__:

def myBSetattr(self, k, v):
    super(B, self).__setattr__(k, v)
    ## Do here my special stuff

This won't work: the hackcheck will see some kind of hack here, 'you can't call the A.__setattr__ function from a B object'.

First question: is there a known way around this?

Possible improvements:
In the Python code there are checks in several functions to see that you are not modifying builtin classes; unfortunately this code also affects extension modules.  I think the HEAPTYPE flag is used abusively in different cases: in type_setattro, object_set_bases and object_set_classes, the checks have nothing to do with the true HEAPTYPE definition as stated in the comments in Include/object.h.  It is used, I think, only because it is the only flag that makes a difference between builtin and user classes.  Unfortunately, with this flag, extension classes fall into the 'builtin' part.
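The metaclass workaround mentioned above can be sketched in pure Python (a hedged sketch in modern class syntax; `SettableMeta` is a made-up name, and since pure-Python classes are heap types anyway, this only mirrors the delegation shape -- the hard part Boris describes lives in C, where non-heap types are rejected):

```python
# Hypothetical sketch of a "special metaclass which redefines setattr".
# A C extension type would need the same effect at the tp_setattro level.
class SettableMeta(type):
    def __setattr__(cls, name, value):
        # Delegate to the full type-level attribute-setting machinery.
        type.__setattr__(cls, name, value)

class A(metaclass=SettableMeta):
    pass

class B(A):
    pass

B.extra = 42                     # set a new attribute on the class itself
assert B.extra == 42
assert type(B) is SettableMeta   # the metaclass is inherited by subclasses
```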
A way to solve the problem without backward compatibility problems would be to have a new TPFLAGS_SETABLE flag, defaulting to 0 for builtin/extension classes and 1 for user Python classes.  This flag would be checked in place of the heaptype one where relevant.

I'm ready to write the code for this if there are some positive votes; I won't bother if everybody is against it.

Boris

From bh at intevation.de  Wed Nov 12 06:51:08 2003
From: bh at intevation.de (Bernhard Herzog)
Date: Wed Nov 12 06:51:16 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Wed, 12 Nov 2003 13:13:53 +1300 (NZDT)")
References: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>
Message-ID: <6qn0b1yfdf.fsf@salmakis.intevation.de>

Greg Ewing writes:

> Maybe the GC should also refuse to collect cycles in which any member
> is referenced by a weak reference with an associated callback?

Wouldn't it be possible to call the callbacks of all weakrefs that point to a cycle about to be destroyed before that destruction begins?

Bernhard

--
Intevation GmbH                    http://intevation.de/
Sketch                             http://sketch.sourceforge.net/
Thuban                             http://thuban.intevation.org/

From mwh at python.net  Wed Nov 12 07:41:44 2003
From: mwh at python.net (Michael Hudson)
Date: Wed Nov 12 07:41:49 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: (David Eppstein's message of "Tue, 11 Nov 2003 10:09:05 -0800")
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com>
Message-ID: <2mr80du5br.fsf@starship.python.net>

David Eppstein writes:

> Ok, it sounds like I am stuck with PyObjC's
> NSString.localizedCaseInsensitiveCompare_, since Python's built-in
> cmp(unicode,unicode) sucks and locale doesn't provide an alternative.

"sucks" is too strong.
Maybe there should be better collation support but I don't think we should change the default comparison to do it. Cheers, mwh -- The ability to quote is a serviceable substitute for wit. -- W. Somerset Maugham From mwh at python.net Wed Nov 12 07:43:40 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 12 07:43:43 2003 Subject: [Python-Dev] New flag to differentiate Builtins and extensions classes ? In-Reply-To: <3FB1FF9B.7040508@arteris.net> (Boris Boutillier's message of "Wed, 12 Nov 2003 10:38:35 +0100") References: <3FB1FF9B.7040508@arteris.net> Message-ID: <2mn0b1u58j.fsf@starship.python.net> Boris Boutillier writes: > I look into the archives and didn't see any debate on the question, > hope I didn't miss something. Apart from your four(?) posts on the subject and various replies from me and Guido? Cheers, mwh -- This proposal, if accepted, will probably mean a heck of a lot of work for somebody. But since I don't want it accepted, I don't care. -- Laura Creighton, PEP 666 From barry at python.org Wed Nov 12 07:45:00 2003 From: barry at python.org (Barry Warsaw) Date: Wed Nov 12 07:45:11 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <200311120914.08946.aleaxit@yahoo.com> References: <200311120914.08946.aleaxit@yahoo.com> Message-ID: <1068641100.31989.85.camel@anthem> On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > I believe that setup.py is accurate, README slightly out of date, > Modules/Setup way out of date -- but I thought that double > checking couldn't possibly hurt. So, can I confirm this to the > help@python.org querant, and fix the comments in README (should > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" > for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)? 
Greg can give the definitive answer here, but my understanding is that the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB 3.3.11, supports up to 4.1.25, with the latter recommended (if it were up to me, at least :). The wrapper in Python 2.3.x probably does not support BerkeleyDB 4.2.x. -Barry From aleaxit at yahoo.com Wed Nov 12 08:07:48 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 12 08:08:02 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <1068641100.31989.85.camel@anthem> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> Message-ID: <200311121407.48286.aleaxit@yahoo.com> On Wednesday 12 November 2003 01:45 pm, Barry Warsaw wrote: > On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > > I believe that setup.py is accurate, README slightly out of date, > > Modules/Setup way out of date -- but I thought that double > > checking couldn't possibly hurt. So, can I confirm this to the > > help@python.org querant, and fix the comments in README (should > > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial > > support" for 3.1?) and Modules/Setup (presumably with a pointer to > > setup.py)? > > Greg can give the definitive answer here, but my understanding is that > the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB > 3.3.11, supports up to 4.1.25, with the latter recommended (if it were > up to me, at least :). The wrapper in Python 2.3.x probably does not > support BerkeleyDB 4.2.x. Hmmm -- that's bad, because 2.3's setup.py does appear to be looking for 4.2 with priority, so, if that's installed on the user's machine, we might be looking for trouble... 
Alex From aleaxit at yahoo.com Wed Nov 12 08:12:35 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 12 08:12:43 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <2mr80du5br.fsf@starship.python.net> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <2mr80du5br.fsf@starship.python.net> Message-ID: <200311121412.35765.aleaxit@yahoo.com> On Wednesday 12 November 2003 01:41 pm, Michael Hudson wrote: > David Eppstein writes: > > Ok, it sounds like I am stuck with PyObjC's > > NSString.localizedCaseInsensitiveCompare_, since Python's built-in > > cmp(unicode,unicode) sucks and locale doesn't provide an alternative. > > "sucks" is too strong. Maybe there should be better collation support > but I don't think we should change the default comparison to do it. That seems sensible to me. However, if we do get stuck with a "comparison function", then sorting may not be quite as smooth (the cf would need to be called for each comparison); it might be better to be able to get something suitable for passing to key= -- i.e., the equivalent of C's strxfrm(), rather than of strcoll(), if one had to choose. Alex From guido at python.org Wed Nov 12 11:08:12 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 12 11:08:24 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib sets.py, 1.47, 1.48 In-Reply-To: Your message of "Wed, 12 Nov 2003 07:21:22 PST." References: Message-ID: <200311121608.hACG8Cd20609@12-236-54-216.client.attbi.com> > Modified Files: > sets.py > Log Message: > Improve backwards compatibility code to handle True/False. 
>
> Index: sets.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/sets.py,v
> retrieving revision 1.47
> retrieving revision 1.48
> diff -C2 -d -r1.47 -r1.48
> *** sets.py 8 Sep 2003 19:16:36 -0000 1.47
> --- sets.py 12 Nov 2003 15:21:20 -0000 1.48
> ***************
> *** 74,77 ****
> --- 74,81 ----
>           if not predicate(x):
>               yield x
> + try:
> +     True, False
> + except NameError:
> +     True, False = (0==0, 0!=0)
>
>   __all__ = ['BaseSet', 'Set', 'ImmutableSet']

What's this doing in the 2.4 CVS? --Guido van Rossum (home page: http://www.python.org/~guido/) From eppstein at ics.uci.edu Wed Nov 12 12:02:14 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Wed Nov 12 12:02:17 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: In article <2mr80du5br.fsf@starship.python.net>, Michael Hudson wrote: > David Eppstein writes: > > > Ok, it sounds like I am stuck with PyObjC's > > NSString.localizedCaseInsensitiveCompare_, since Python's built-in > > cmp(unicode,unicode) sucks and locale doesn't provide an alternative. > > "sucks" is too strong. Maybe there should be better collation support > but I don't think we should change the default comparison to do it. Let me be more specific. Since we have such useful hashing-based dictionary data structures in Python, we don't often need cmp for binary search trees, so the main reason for comparing unicodes (as far as I can tell) is to put them in a logical order for displaying to humans. cmp(unicode,unicode) does a very bad job of this, whenever there are non-ascii characters involved. Its existence tricks you into thinking Python has a useful unicode comparison function when it doesn't.
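[David's point is easy to see concretely. A minimal sketch, not from the original mail, written in modern syntax: the default comparison is raw code-point order, so an umlaut sorts after the entire ASCII range.]

```python
# Default string comparison goes code point by code point: 'y' is U+0079,
# 'a with umlaut' is U+00E4, so the German word lands last -- nowhere near
# where a German collation would put it.
words = [u'Universit\xe4t', u'University']
assert sorted(words) == [u'University', u'Universit\xe4t']
```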
-- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From theller at python.net Wed Nov 12 12:13:44 2003 From: theller at python.net (Thomas Heller) Date: Wed Nov 12 12:14:04 2003 Subject: [Python-Dev] More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Tue, 11 Nov 2003 19:01:08 -0500") References: Message-ID: "Tim Peters" writes: > That's almost certainly a bug in Python, but is almost certainly unrelated > to the problem Jim is having. > > I was able to make your test case substantially smaller. The key is that > the "remove" callback triggers gc. Apart from that, it doesn't matter at all > what "remove" does. I don't know what the bug is, though, and since the > last of these consumed more than a day to track down and fix, I don't > anticipate having time to do that again: Thanks. I've submitted a bug http://www.python.org/sf/840829 for it. I have the impression that I'm not able to fix the bug myself, although I consider it a critical bug since it basically makes weakref callbacks unusable because gc can occur at any time. My workaround for now is to disable gc as the first action in the callback and enable it again as the last action, but I'm unconvinced that this really helps in all cases.
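[Thomas's workaround can be sketched roughly like this -- a minimal reconstruction from his description, not his actual code; the names Target and callback are made up for the illustration.]

```python
import gc
import weakref

done = []

class Target(object):
    pass

def callback(ref):
    # the workaround: keep the cyclic gc from running while the callback
    # body executes, then restore its previous state
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        done.append(True)  # the real cleanup work would go here
    finally:
        if was_enabled:
            gc.enable()

t = Target()
r = weakref.ref(t, callback)
del t  # in CPython the refcount hits zero here and the callback runs
```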
Thomas From trentm at ActiveState.com Wed Nov 12 14:27:26 2003 From: trentm at ActiveState.com (Trent Mick) Date: Wed Nov 12 14:31:42 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311120955.49289.aleaxit@yahoo.com>; from aleaxit@yahoo.com on Wed, Nov 12, 2003 at 09:55:49AM +0100 References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> <3FB144D4.8060307@v.loewis.de> <200311120955.49289.aleaxit@yahoo.com> Message-ID: <20031112112725.A6879@ActiveState.com> [Martin] > > ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip [Alex Martelli wrote] > For a Linux-on-386 build, I see:
>
> [alex@lancelot source]$ size /usr/local/lib/libicu*.so.26.1
>     text    data    bss     dec    hex filename
>  8449053    3948      4 8453005 80fb8d /usr/local/lib/libicudata.so.26.1
>   875940   14528    908  891376  d99f0 /usr/local/lib/libicui18n.so.26.1
>    51426    4296      8   55730   d9b2 /usr/local/lib/libicuio.so.26.1
>   145377    4160      4  149541  24825 /usr/local/lib/libicule.so.26.1
>    29860    1244      4   31108   7984 /usr/local/lib/libiculx.so.26.1
>    26339    1004      4   27347   6ad3 /usr/local/lib/libicutoolutil.so.26.1
>   664190   21100    356  685646  a764e /usr/local/lib/libicuuc.so.26.1
>
> ...
>
> Perhaps somebody with a decent Windows platform can measure
> this more accurately!-)

For a Windows build (on Win2K compiled with VC++ 6):

Directory of D:\trentm\tmp\icu\bin

12/11/2003 11:21a .
12/11/2003 11:21a ..
12/11/2003 11:18a 20,480 ctestfw.dll
12/11/2003 11:19a 16,384 decmn.exe
12/11/2003 11:19a 20,480 derb.exe
12/11/2003 11:20a 16,384 genbrk.exe
12/11/2003 11:19a 16,384 genccode.exe
12/11/2003 11:19a 16,384 gencmn.exe
12/11/2003 11:19a 20,480 gencnval.exe
12/11/2003 11:20a 49,152 genidna.exe
12/11/2003 11:19a 20,480 gennames.exe
12/11/2003 11:19a 32,768 gennorm.exe
12/11/2003 11:19a 49,152 genpname.exe
12/11/2003 11:19a 32,768 genprops.exe
12/11/2003 11:19a 69,632 genrb.exe
12/11/2003 11:19a 16,384 gentest.exe
12/11/2003 11:19a 20,480 gentz.exe
12/11/2003 11:19a 24,576 genuca.exe
12/11/2003 11:20a 8,495,104 icudt26l.dll
12/11/2003 11:19a 692,224 icuin26.dll
12/11/2003 11:21a 57,344 icuio26.dll
12/11/2003 11:20a 90,112 icule26.dll
12/11/2003 11:21a 40,960 iculx26.dll
12/11/2003 11:19a 32,768 icutu26.dll
12/11/2003 11:18a 585,728 icuuc26.dll
12/11/2003 11:19a 40,960 makeconv.exe
12/11/2003 11:20a 32,768 pkgdata.exe
12/11/2003 11:21a 45,056 uconv.exe
26 File(s) 10,555,392 bytes

Note that I am just stoopidly compiling and reporting here. :) Trent -- Trent Mick TrentM@ActiveState.com From martin at v.loewis.de Wed Nov 12 15:31:42 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 15:32:03 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: David Eppstein writes: > Let me be more specific. Since we have such useful hashing-based > dictionary data structures in Python, we don't often need cmp for > binary search trees, so the main reason for comparing unicodes (as far > as I can tell) is to put them in a logical order for displaying to > humans. cmp(unicode,unicode) does a very bad job of this, whenever > there are non-ascii characters involved.
Its existence tricks you into > thinking Python has a useful unicode comparison function when it > doesn't. It's useful for sorting, but not for collation. Comparing!=Collating. That said, locale.strcoll does what you want. Regards, Martin From martin at v.loewis.de Wed Nov 12 15:34:59 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 15:35:28 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > > More realistically, we could expose wcscoll(3) where available, > > which would extend the Python locale model to Unicode (assuming > > the C library uses Unicode in wchar_t). > > I don't know what that is, but if you recommend it, I support it. I should have remembered this time machine. locale.strcoll already uses wcscoll if the platform supports it, so locale.strcoll should be used for locale-aware collation. locale.strxfrm does not (yet) support Unicode; I'm uncertain whether it should (as you typically use this when presenting sorted lists to the user; displaying them will certainly take much longer than sorting them). 
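[Alex's strxfrm-as-key idea looks like this in practice. This is a sketch, not code from the thread: the ordering you get depends entirely on the active LC_COLLATE locale, and the always-available "C" locale plus ASCII-only strings are used here only so the example is self-contained (as Martin notes, strxfrm did not accept Unicode at the time).]

```python
import locale

locale.setlocale(locale.LC_COLLATE, 'C')
words = ['banana', 'Apple', 'apple']
# strxfrm is applied once per item; the transformed keys then compare
# with plain string comparison, instead of a strcoll call per pair
words.sort(key=locale.strxfrm)
# in the C locale this is simply bytewise order: 'A' < 'a' < 'b'
```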
Regards, Martin From eppstein at ics.uci.edu Wed Nov 12 17:13:29 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Wed Nov 12 17:13:32 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: <87474468.1068646409@dhcp31-56.ics.uci.edu> On 11/12/03 9:31 PM +0100 "Martin v. Löwis" wrote: > That said, locale.strcoll does what you want. It does?

>>> locale.strcoll(unicode('Universität','utf8'),u'University')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> locale.setlocale(locale.LC_COLLATE,'en_US')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/locale.py", line 372, in setlocale
    return _setlocale(category, locale)
locale.Error: locale setting not supported

Even if locale would allow me to set a locale, which locale should I set, in order to allow all unicodes (not just e.g. iso-8859-1, but all of them) to be collated in some reasonable order? -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From martin at v.loewis.de Wed Nov 12 18:09:51 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 18:10:14 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <87474468.1068646409@dhcp31-56.ics.uci.edu> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> <87474468.1068646409@dhcp31-56.ics.uci.edu> Message-ID: David Eppstein writes: > It does?
Sure:

Python 2.3 (#26, Aug 1 2003, 09:50:29)
[GCC 3.3 20030226 (prerelease) (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'LC_CTYPE=de_DE@euro;LC_NUMERIC=de_DE@euro;LC_TIME=de_DE@euro;LC_COLLATE=C;LC_MONETARY=de_DE@euro;LC_MESSAGES=de_DE@euro;LC_PAPER=de_DE@euro;LC_NAME=de_DE@euro;LC_ADDRESS=de_DE@euro;LC_TELEPHONE=de_DE@euro;LC_MEASUREMENT=de_DE@euro;LC_IDENTIFICATION=de_DE@euro'
>>> locale.strcoll(u"universit\xe4t",u"University")
32
>>> locale.setlocale(locale.LC_ALL,"en_US")
'en_US'
>>> locale.strcoll(u"universit\xe4t",u"University")
-24

> Even if locale would allow me to set a locale, which locale should I > set, in order to allow all unicodes (not just e.g. iso-8859-1, but all > of them) to be collated in some reasonable order? Define "reasonable order". There is no "reasonable order" independent of the language. In German, it is just not reasonable to have Japanese characters. Most Germans cannot tell Katakana from Hiragana, so it just does not matter to them how those collate. Likewise, I guess most Japanese won't see a difference between an umlaut and a circumflex. Regards, Martin From greg at cosc.canterbury.ac.nz Wed Nov 12 18:48:20 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 12 18:49:14 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <6qn0b1yfdf.fsf@salmakis.intevation.de> Message-ID: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz> Bernhard Herzog : > Wouldn't it be possible to call the callbacks of all weakrefs that point > to a cycle about to be destroyed before that destruction begins? I'm not sure that would be a good idea, for the same reasons that it wouldn't be a good idea to do the same for __del__ methods. Something might depend on them being called in the right order, or in not being called too soon.
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim at zope.com Wed Nov 12 23:47:37 2003 From: tim at zope.com (Tim Peters) Date: Wed Nov 12 23:48:40 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <6qn0b1yfdf.fsf@salmakis.intevation.de> Message-ID: [Bernhard Herzog] > Wouldn't it be possible to call the callbacks of all weakrefs that > point to a cycle about to be destroyed before that destruction begins? Yes, but GC couldn't also go on to call tp_clear then -- without deeper changes, the objects would have to leak. Suppose objects I and J have (strong) references to each other -- they form a two-object cycle. Suppose I also holds a weakref to J, with a callback to a method of I. Suppose the cycle becomes unreachable. GC detects that. It can also (with small changes to current code) detect that J has a weakref-associated callback, and invoke it. But when the callback returns, GC must stop trying to make progress: at that point it knows absolutely nothing anymore about the object graph, because there's absolutely nothing a callback can't do. In particular, because the callback in the example is a method of I, it has full access to I (via the callback's "self" argument), and because I has a strong reference to J, it also has full access to J. The callback can resurrect either or both the objects, and/or install new weakref callbacks on either or both, or even break the strong-reference cycle manually so that normal refcounting completely destroys I before the callback returns (although there's an obscure technical reason for why the callback can't completely destroy J before it returns -- I and J are different in this one respect).
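[Tim's setup is concrete enough to write down. A sketch of the object graph he describes -- the names I and J follow the text; this is illustration, not code from the thread:]

```python
import weakref

class J(object):
    pass

class I(object):
    def callback(self, ref):
        # as a bound method, this runs with full access to self (the I
        # instance), and through self.j to J as well
        pass

i = I()
j = J()
i.j = j  # strong reference I -> J
j.i = i  # strong reference J -> I: the two-object cycle
i.wr = weakref.ref(j, i.callback)  # I's weakref to J, callback a method of I
```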
If GC went on to, for example, execute tp_clear on I or J, tp_clear can leave behind an accessible (if the callback resurrected it) insane object, where "insane" means one that a user-- whether in innocence or by hostile design doesn't matter --can exploit to crash the interpreter. For example, Jim has proven that a new-style class object is insane in this way after its tp_clear is invoked, and it's extremely easy to provoke one into segfaulting. Of course that's right out -- we're trying to repair a current segfault, not supply subtler ways to create segfaults. We also have to do this within the boundaries of what can be sold for a bugfix release, so gross changes in semantics are also right out. In particular, we've never said that tp_clear has to leave an object in a usable state, so it would be a hard sell to start to demand that in a bugfix release. Still, I want this to work. There's a saving grace here that __del__ methods don't have: if a __del__ method resurrects an object, there's nothing to stop the __del__ method from getting called again (when the refcount falls to 0 again). But weakref callbacks are *already* one-shot things: a given weakref callback destroys itself as part of the process of getting invoked. So once we've invoked a weakref callback for J, that callback is history. Sick code *in* the callback could install *another* weakref callback on J, so we have to be careful, but J's original callbacks are gone forever, and in almost all code will leave J callback-free. As above, GC cannot go on to call tp_clear after invoking a callback. However, after invoking all the callbacks, it *could* start another "mini" gc cycle, taking the list of cyclic trash as its starting point (as "the generation" to be collected). This is the only way it can know what the post-callback state of the object graph is. In all sane code, this mini-gc run will discover that (a) all this stuff is still cyclic trash, and (b) none of it has weakref-callbacks anymore. 
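[The one-shot behavior Tim leans on is easy to check -- a small illustration in current CPython, not from the original mail:]

```python
import weakref

calls = []

class T(object):
    pass

t = T()
r = weakref.ref(t, lambda ref: calls.append('fired'))
del t  # refcount drops to zero; the callback runs exactly once

# the weakref is now dead and its callback is gone for good --
# nothing will ever invoke it a second time
```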
*Then* it's safe to run through the list calling tp_clear methods. In sick code (code that resurrects objects via a weakref callback, or registers new weakref callbacks to dead objects via a weakref callback), the mini gc run will automatically remove the resurrected objects from current consideration (they'll move to an older generation as a matter of course). It may even discover that nothing is trash anymore. If so, no harm done: because we haven't called tp_clear on anything, nothing has been damaged. If there's some trash left with (necessarily) new weakref callbacks, we're back to where we started. We *could* proceed the same way then, but I'm afraid that would give actively hostile code a way to put gc into a never-ending loop. Instead I'd simply move those objects into the next generation, and let gc end then. Again, because we haven't called tp_clear on anything, nothing has been damaged in this case either. A subtlety: instead of doing the "mini gc pass", why not just move the leftover objects into an older generation and let gc return right away then? The problem: any weakref callback in any cyclic trash would stop a complete invocation of gc from removing any trash then. A perfectly ordinary, non-hostile program, that happened to create lots of weakref callbacks in cyclic trash could then get into a state where every time gc runs, it finds one of these things, and despite that the app never does anything sick (like resurrecting in a callback), gc would never make any progress. The true purpose of the "mini gc pass" is to ensure that gc does make progress in sane code, and no matter how quickly and sustainedly it creates dead cycles containing weakref callbacks. Terminology subtlety: the "mini" in "mini gc pass" refers to that the generation it starts with is presumably small, not to that this pass has an especially easy time of it. It still has to do all the work of deducing liveness and deadness from scratch. 
There are no shortcuts it can take here, simply because there's nothing a callback can't do. However, this pass should go quickly: it starts with what *was* entirely trash in cycles, and it's probably still entirely trash in cycles. This is maximally easy for Python's kind of cyclic gc (it chases all and only the objects in the dead cycles then -- it doesn't have to visit any objects outside the dead cycles, *unless* the cycles aren't truly dead anymore). So for sane programs, it adds gc time proportional to the number of pointers in the dead cycles, independent of the total number of objects. All cyclic trash found by all gc invocations consumes a little more time too, because we have to ask each trash object whether it has an associated weakref callback. In most programs, most of the time, the answer will be "no". From tim at zope.com Thu Nov 13 01:44:34 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 01:45:37 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown Message-ID: The following program provokes a segfault before shutdown in a release build, or, in a debug build, triggers Assertion failed: mro != NULL, file C:\Code\python\Objects\object.c, line 1225 This is on current 2.4 trunk, so includes the fix checked in on Wednesday for "Thomas Heller's bug". In the "it figures" department: I was never able to provoke Jim's problem on purpose. I was trying to provoke a different failure here, and never got to the point of finishing the code for that purpose. Heh.

"""
import gc
import weakref

alist = []

class J(object):
    pass

class II(object):
    __slots__ = 'J', 'wr'

    def resurrect(self, ignore):
        alist.append(self.J)

I = II()
J.I = I
I.J = J
I.wr = weakref.ref(J, I.resurrect)
del I, J, II

gc.collect()
print alist
"""

It's trying to resolve self.J in the callback at the time it dies.
Unlike Jim's scenario, the failure here is due to that II is in an insane state (the class containing the callback code, not some other class) -- but close enough for me. I doubt the __slots__ declaration is necessary, but it *is* necessary for II to be a new-style class. If you make II an old-style class instead, you get a different surprise in the callback: because tp_clear has already been called on I too the way things work today, and old-style classes look in the instance dict first, the attempt to reference self.J raises AttributeError. There's no way to guess that might happen from staring at the Python code, though (and remember that this is before shutdown! we're all too eager to overlook shutdown failures, but even if we weren't this one is just a result of regular garbage collection while the interpreter and all modules are in perfect shape). The suggested approach in the long earlier email should repair both the segfault and the AttributeError-out-of-thin-air surprises. It would instead result in J's resurrection (with J wholly intact; and I and II would also resurrect, since J has a strong reference to I, and I to II). The specific invocation of gc in which this occurred wouldn't be able to collect anything (at all, even if there were a million other objects in vanilla trash cycles at the time -- they wouldn't get collected until a later run of gc, one that didn't resurrect dead cycles). From tim at zope.com Thu Nov 13 02:17:34 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 02:18:35 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: Message-ID: [Tim] > ... > The suggested approach in the long earlier email should repair both > the segfault and the AttributeError-out-of-thin-air surprises. ...
> The specific invocation of gc in which this occurred wouldn't be able > to collect anything (at all, even if there were a million other objects > in vanilla trash cycles at the time -- they wouldn't get collected > until a later run of gc, one that didn't resurrect dead cycles). Sorry, not so -- the "mini gc pass" of the same gc invocation would collect all million of the other objects in vanilla trash cycles. It's only weakref callbacks sick enough to install brand new weakref callbacks on dead objects that would prevent the other trash from getting collected in the same gc invocation. There wasn't anything like that in the segfaulting program. It's also possible that we could change the weakref implementation to refuse to allow creating new weakrefs while a weakref callback was in progress. But that would be a new restriction; it wouldn't save gc much work (the mini gc pass would still have to do full live-dead analysis on the leftover trash; it would only save that pass from asking the "survivors" whether they grew any new weakref callbacks); and reporting an exception that occurs during gc happens by calling Py_FatalError (it's extreme). From edloper at gradient.cis.upenn.edu Thu Nov 13 04:01:07 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Thu Nov 13 03:00:03 2003 Subject: [Python-Dev] add a list.stablesort() method? Message-ID: <3FB34853.4070500@gradient.cis.upenn.edu> Python's list.sort() method has gone through many different incarnations, some of which have been stable, and some of which have not. As of Python 2.3, list.sort() *is* stable, but we're told not to rely on that behavior. [1] In particular, it might change for future versions/alternate implementations of Python. Given that, would it make sense to add a list.stablesort() method? For the current implementation of cPython, it would just be another name for list.sort(). But adding a new name for it has two advantages: 1. 
If we discover a faster sorting algorithm that's not stable, then future versions of Python can switch list.sort() to use that, but list.stablesort() will still be available for anyone who needs a stable sort. 2. It explicitly marks (to the reader) which sort operations are relying on stability. The main disadvantages that I can think of are: 1. It adds a new method to the list object, which probably won't get used all that often (most tasks don't call for stable sorts). 2. You can already implement a stablesort procedure in Python (albeit less efficiently than the c implementation). [2] 3. If we do add a non-stable sort in the future, we'll need to maintain 2 separate sorting algorithms in listobj.c. Does this seem like a reasonable addition? -Edward [1] [2] From greg at electricrain.com Thu Nov 13 03:30:48 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Thu Nov 13 03:30:56 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <1068641100.31989.85.camel@anthem> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> Message-ID: <20031113083048.GH26081@zot.electricrain.com> On Wed, Nov 12, 2003 at 07:45:00AM -0500, Barry Warsaw wrote: > On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > > > I believe that setup.py is accurate, README slightly out of date, > > Modules/Setup way out of date -- but I thought that double > > checking couldn't possibly hurt. So, can I confirm this to the > > help@python.org querant, and fix the comments in README (should > > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" > > for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)? > > Greg can give the definitive answer here, but my understanding is that > the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB > 3.3.11, supports up to 4.1.25, with the latter recommended (if it were > up to me, at least :). 
The wrapper in Python 2.3.x probably does not > support BerkeleyDB 4.2.x. > > -Barry 3.2 - 4.2 should work. 3.1 is too old and not worth the effort to get to work properly again if it's even possible. I just removed checks and mention of support for it in 2.4cvs. I added the support for compiling with 4.2.x before 2.3.2 was released. sleepycat gave me a beta 4.2; with luck they'll actually release it for real soon. The python 2.3.3 windows binary distribution should be compiled using 4.1.25 to maintain perfect compatibility with python 2.3-2.3.2. -greg From brian at sweetapp.com Thu Nov 13 04:00:16 2003 From: brian at sweetapp.com (Brian Quinlan) Date: Thu Nov 13 03:57:29 2003 Subject: [Python-Dev] add a list.stablesort() method? In-Reply-To: <3FB34853.4070500@gradient.cis.upenn.edu> Message-ID: <002e01c3a9c4$8f469e80$21795418@dell8200> > As of Python 2.3, list.sort() *is* stable, but we're told not to > rely on that behavior. [1] In particular, it might change for > future versions/alternate implementations of Python. You missed Guido's pronouncement on this issue: http://mail.python.org/pipermail/python-dev/2003-October/038773.html The bottom line is: "OK, I pronounce on this: Python's list.sort() shall be stable." Cheers, Brian From jim at zope.com Thu Nov 13 06:08:11 2003 From: jim at zope.com (Jim Fulton) Date: Thu Nov 13 06:13:02 2003 Subject: [Python-Dev] Re: Provoking Jim's MRO segfault before shutdown In-Reply-To: References: Message-ID: <3FB3661B.7050909@zope.com> Tim Peters wrote: ... > It's trying to resolve self.J in the callback at the time it dies. Unlike > Jim's scenario, the failure here is due to that II is in an insane state (the > class containing the callback code, not some other class) -- but close > enough for me. This is exactly like my scenario. The class containing the callback is hosed. In my scenario, I wasn't resurrecting anything though. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From tim at zope.com Thu Nov 13 08:04:17 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 08:04:33 2003 Subject: [Python-Dev] RE: Provoking Jim's MRO segfault before shutdown In-Reply-To: <3FB3661B.7050909@zope.com> Message-ID: [Tim] > ... >> It's trying to resolve self.J in the callback at the time it dies. >> Unlike Jim's scenario, the failure here is due to that II is in an >> insane state (the class containing the callback code, not some other >> class) -- but close enough for me. [Jim Fulton] > This is exactly like my scenario. The class containing the callback > is hosed. Ah! I misunderstood. Great, then. > In my scenario, I wasn't resurrecting anything though. Right, I was trying to provoke a different (but related) problem. But it doesn't matter what the method is named, or what it's trying to do -- it's dying before it gets to the part that would have resurrected something ... for example, this segfaults too:

"""
import gc
import weakref

class J(object):
    pass

class II(object):
    def happy_happy_joy_joy(self, ignore):
        print self.bunny_rabbit

I = II()
I.unused = J
I.wr = weakref.ref(J, I.happy_happy_joy_joy)
del I, J, II

print "the sun shines"
gc.collect()
print "on all the little children"
"""

Comment out the "del" instead, and then all the little children get to enjoy Mr. Sunshine for the few microseconds it takes to see Mr. Segfault during shutdown instead. Random curiosity: note that this version doesn't set up a cycle *between* J and I (the "J.I = I" line from the original was cut here). It's unclear what "purpose" J serves in this version. Nevertheless, if "I.unused = J" is also removed, the segfault goes away, and it just delivers a bunny_rabbit AttributeError instead. As is, the strong reference from I to J nudges gc into calling tp_clear on II before breaking cycles causes the refcount on J to fall to 0.
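[For contrast with these segfaulting programs, the sane case -- a *reachable* weakref whose callback fires when a trash cycle is collected -- behaves well in current CPython. A sketch of today's behavior, which reflects the fixes that grew out of this discussion, not the 2003 code being debugged:]

```python
import gc
import weakref

log = []

class C(object):
    pass

def on_collect(ref):
    log.append('collected')

a = C()
b = C()
a.partner = b
b.partner = a  # a two-object reference cycle
r = weakref.ref(a, on_collect)  # r itself stays reachable at module level
del a, b       # the cycle is unreachable now; only the cycle collector can free it
gc.collect()   # frees the pair; the reachable weakref's callback fires
```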
From barry at python.org Thu Nov 13 09:13:19 2003 From: barry at python.org (Barry Warsaw) Date: Thu Nov 13 09:13:27 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <20031113083048.GH26081@zot.electricrain.com> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> <20031113083048.GH26081@zot.electricrain.com> Message-ID: <1068732799.3723.17.camel@anthem> On Thu, 2003-11-13 at 03:30, Gregory P. Smith wrote: > I added the support for compiling with 4.2.x before 2.3.2 was released. > sleepycat gave me a beta 4.2; with luck they'll actually release it for > real soon. Cool! I didn't realize that. -Barry From barry at python.org Thu Nov 13 09:22:40 2003 From: barry at python.org (Barry Warsaw) Date: Thu Nov 13 09:22:53 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: References: Message-ID: <1068733359.3723.19.camel@anthem> On Thu, 2003-11-13 at 02:17, Tim Peters wrote: > until a later run of gc, one that didn't resurrect dead cycles). > > Sorry, not so -- the "mini gc pass" of the same gc invocation would collect > all million of the other objects in vanilla trash cycles. It's only weakref > callbacks sick enough to install brand new weakref callbacks on dead objects > that would prevent the other trash from getting collected in the same gc > invocation. There wasn't anything like that in the segfaulting program. When Python's shutting down, will there /be/ another GC invocation? -Barry From tim at zope.com Thu Nov 13 10:12:14 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 10:12:35 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: <1068733359.3723.19.camel@anthem> Message-ID: [Tim] >> Sorry, not so -- the "mini gc pass" of the same gc invocation would >> collect all million of the other objects in vanilla trash cycles.
>> It's only weakref callbacks sick enough to install brand new weakref
>> callbacks on dead objects that would prevent the other trash from
>> getting collected in the same gc invocation. There wasn't anything
>> like that in the segfaulting program.

[Barry Warsaw]
> When Python's shutting down, will there /be/ another GC invocation?

New in 2.3, gc is forced twice by Py_Finalize. But it's quite possible for a weakref callback that itself installs new weakref callbacks to objects in unreachable (dead) cycles, and then resurrects those dead objects, to create a situation where no number of gc collections can suffice, not under the proposed scheme, nor under the current scheme, nor under any scheme -- the programmer has then set things up so that, no matter how often we try to clean up the trash, their code keeps resurrecting part of it, then pretends to kill it off again, etc etc. So it's always (under any scheme) possible to write code that will leave a weakref callback uncalled at the time Python does its C-level exit(). But at best, I think that's pathological code. It's not a plausible use case, except to ensure that it's not a way to crash the interpreter.

Under the proposed scheme, there's no issue here *except* for code that (ab)uses weakref callbacks to install new weakref callbacks in their bodies, and attaches the callbacks to objects that are unreachable from outside a dead clump of cyclic trash containing both the object running the original weakref callback and the object that triggered the weakref callback.

BTW, I think Python should drop its second call of garbage collection in Py_Finalize, and *possibly* its first call too. The second call happens after modules have been torn down, so callbacks or __del__ methods run then are quite likely to suffer unexpected exceptions (module globals are None, sys.stdout no longer exists, etc).
That second call is what triggered Jim's original segfault; was the cause of the mysterious chain of information-free messages when the Zope3 test suite finished (before we cleaned up forgotten daemon threads); and is the cause of similar new shutdown irritations reported on c.l.py.

The first call in Py_Finalize suffers a different problem: because the global C-level "initialized" flag has been set false by the time it's called, any Python-level code run as a result of garbage collection that tries to load a module gets a baffling (to the user) Py_FatalError complaining that Python isn't initialized. I stumbled into that one by accident while trying to reproduce Jim's problem, and that's the only report of it I know of. So I'm not excited about that one, but a Py_FatalError at shutdown is sure going to attract attention when somebody else stumbles into it.

From tim at zope.com Thu Nov 13 13:03:19 2003
From: tim at zope.com (Tim Peters)
Date: Thu Nov 13 13:03:46 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To: <3FB3661B.7050909@zope.com>
Message-ID:

We've got multiple segfault problems associated with weakref callbacks, and multiple problems of that kind coming from subtype_dealloc alone. Here's a piece of test_weakref.py in my 2.4 checkout; the first part of the test got fixed yesterday; the second part has not been checked in yet, because it still fails (in a release build it corrupts memory and that may not be visible; in a debug build it reliably segfaults, due to double deallocation):

"""
    def test_sf_bug_840829(self):
        # "weakref callbacks and gc corrupt memory"
        # subtype_dealloc erroneously exposed a new-style instance
        # already in the process of getting deallocated to gc,
        # causing double-deallocation if the instance had a weakref
        # callback that triggered gc.
        # If the bug exists, there probably won't be an obvious symptom
        # in a release build.
        # In a debug build, a segfault will occur
        # when the second attempt to remove the instance from the "list
        # of all objects" occurs.
        import gc

        class C(object):
            pass

        c = C()
        wr = weakref.ref(c, lambda ignore: gc.collect())
        del c

        # There endeth the first part. It gets worse.
        del wr

        c1 = C()
        c1.i = C()
        wr = weakref.ref(c1.i, lambda ignore: gc.collect())

        c2 = C()
        c2.c1 = c1
        del c1  # still alive because c2 points to it

        # Now when subtype_dealloc gets called on c2, it's not enough just
        # that c2 is immune from gc while the weakref callbacks associated
        # with c2 execute (there are none in this 2nd half of the test, btw).
        # subtype_dealloc goes on to call the base classes' deallocs too,
        # so any gc triggered by weakref callbacks associated with anything
        # torn down by a base class dealloc can also trigger double
        # deallocation of c2.
        del c2
"""

There are two identifiable (so far) problems in subtype_dealloc (note that these have nothing to do with Jim's current woes -- those are a different problem with weakref callbacks, and he hasn't yet hit the problems I'm talking about here -- but he will, eventually).

1. A weakref callback can resurrect self, but the code isn't aware of that now. It's not *easy* to resurrect self, and we probably thought it wasn't possible, but it is: if self is in a dead cycle, and the weakref callback invokes a method of an object in that cycle, self is visible to the callback (following the cycle links), and so self can get resurrected by the callback. The callback doesn't have to specifically try to resurrect self, it can happen as a side effect of resurrecting anything in the cycle from which self is reachable.

2. Unlike other dealloc routines, subtype_dealloc leaves the object, with refcnt 0, tracked by gc.
That's the cause of the now seemingly endless sequence of ways to provoke double deallocation: when a weakref callback is invoked at any time while subtype_dealloc is executing (whether the callback is associated with self, or with anything that dies as a result of any base class cleanup calls), and if gc happens to trigger while the callback is executing, and self happens to be in a generation gc is collecting, then the tracked refcount=0 self looks like garbage to gc, so gc does

    incref
    call tp_clear
    decref

on it, and the decref knocks the refcount back down to 0 again thus triggering another deallocation (while the original deallocation is still in progress).

To avoid #2, one of these two must be true:

A. self is untracked at the time gc happens.

B. self has a refcount > 0 at the time gc happens (e.g., the usual "temporarily resurrect" trick).

I checked in a 0-byte change yesterday that repaired the first half of the test case, using #A (I simply moved the line that retracks self below the *immediate* weakref callback). But that same approach can't work for the rest of subtype_dealloc, for reasons you explained in a comment at the end of the function. Doing something of the #B flavor appears so far to work (meaning it fixes the rest of the test case, and hasn't triggered a new problem yet):

"""
Index: typeobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/typeobject.c,v
retrieving revision 2.251
diff -u -r2.251 typeobject.c
--- typeobject.c	12 Nov 2003 20:43:28 -0000	2.251
+++ typeobject.c	13 Nov 2003 17:57:07 -0000
@@ -667,6 +667,17 @@
 		goto endlabel;
 	}

+	/* We're still not out of the woods: anything torn down by slots
+	 * or a base class dealloc may also trigger gc in a weakref callback.
+	 * For reasons explained at the end of the function, we have to
+	 * keep self tracked now.  The only other way to make gc harmless
+	 * is to temporarily resurrect self.  We couldn't do that before
+	 * calling PyObject_ClearWeakRefs because that function raises
+	 * an exception if its argument doesn't have a refcount of 0.
+	 */
+	assert(self->ob_refcnt == 0);
+	self->ob_refcnt = 1;
+
 	/* Clear slots up to the nearest base with a different tp_dealloc */
 	base = type;
 	while ((basedealloc = base->tp_dealloc) == subtype_dealloc) {
@@ -693,6 +704,8 @@
 		_PyObject_GC_UNTRACK(self);

 	/* Call the base tp_dealloc() */
+	assert(self->ob_refcnt == 1);
+	self->ob_refcnt = 0;
 	assert(basedealloc);
 	basedealloc(self);
"""

I'm not sure those asserts *can't* trigger, though (well, actually, I'm sure they can, if a weakref callback resurrects self -- but that's a different problem), and the code is getting obscure. Maybe that comes with the territory. So fresh eyeballs would help.

The problems with resurrection are related to Jim's problem, in that tp_clear can leave behind insane objects, and those can kill us whether a callback provokes the insanity directly (as in Jim's case), or a resurrected insane object gets provoked sometime later. I sketched a different scheme for solving those in a long msg yesterday (it doesn't involve subtype_dealloc; it involves changing gc to be much more aware of the problems weakref callbacks can create).

From tim.one at comcast.net Thu Nov 13 15:17:53 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Nov 13 15:17:58 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To:
Message-ID:

[Tim]
> ...
> There are two identifiable (so far) problems in subtype_dealloc

Make that one; I'm convinced the first is bogus.

> ...
> 1. A weakref callback can resurrect self, but the code isn't aware of
> that now.

I no longer believe that's possible.
While a weakref callback can resurrect objects in dead cycles, a weakref callback called *as a result* of anything subtype_dealloc does cannot resurrect the object subtype_dealloc is tearing down (because self's refcount is legitimately 0 then -- Python code can't get to self, even if it was in a dead cycle, and the dying object isn't passed to the callback either).

That just leaves subtype_dealloc with its problem of allowing gc to believe that self is collectible. I'm feeling more confident about that too after staring at the code more, but a complete fix remains strained.

From oussoren at cistron.nl Thu Nov 13 04:59:48 2003
From: oussoren at cistron.nl (Ronald Oussoren)
Date: Thu Nov 13 15:23:10 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz>
References: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz>
Message-ID: <1DB15604-15C0-11D8-A0F2-0003931CFE24@cistron.nl>

On 13 nov 2003, at 0:48, Greg Ewing wrote:
> Bernhard Herzog :
>
>> Wouldn't it be possible to call the callbacks of all weakrefs that
>> point to a cycle about to be destroyed before that destruction begins?
>
> I'm not sure that would be a good idea, for the same reasons that it
> wouldn't be a good idea to do the same for __del__ methods. Something
> might depend on them being called in the right order, or in not being
> called too soon.

But isn't the order in which they are called undefined (for cycles)? Another option would be to record what callbacks you will do and call them after completing the destruction of the cycle.
Ronald

From tim at zope.com Thu Nov 13 17:18:19 2003
From: tim at zope.com (Tim Peters)
Date: Thu Nov 13 17:19:29 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To: <20031113213711.GA12902@vicky.ecs.soton.ac.uk>
Message-ID:

[Armin Rigo]
> If all these ways involve the GC,

Jim's problem does not, but all the "subtype_dealloc vs weakref callback vs cyclic gc" segfaults did.

> a solution that would avoid similar problems in potentially other
> deallocators might be to fix the GC instead:
>
>> incref
>> call tp_clear
>> decref
>
> This is the only place where the GC explicitly changes reference
> counters. It could just be skipped for objects with null refcount.

I really don't like that, because gc isn't broken -- an object with refcount 0 is trash by any reasonable meaning of the word. What I intend to do in 2.4 instead is include a new assert near the start of gc, to verify that none of the refcounts it sees are 0 coming in. That should never happen, the way Python's gc works.

> As the GC is the only piece of code that should be able to handle
> objects with refcounts of zero (apart from deallocators, but we assume
> these ones know what they are doing) this would fix the
> double-deallocation issue

I agree that it would.

> without making subtype_dealloc even more hairy.

But subtype_dealloc will never be simple or clear, so if obscure cruft has to be added, I'd rather add it there. Adding a strange special case to gc would spread the obscurity, but it's not a goal to make everything at least a little obscure. Thanks to Neil Schemenauer, the gc code today is remarkably clean and clear. Thanks to Guido, subtype_dealloc is about as clear as it can be.
I just checked in another patch for the sequence of problems Thomas Heller is seeing, and I think the final result leaves subtype_dealloc exactly as obscure as it was in 2.3: all this "real fix" amounts to is moving down a line of code to near the end of the function (that's the line retracking self with GC -- it used to do this long before it was necessary to do it, and now it delays doing it until it's actually needed, which is beyond all the code where it's dangerous to do it).

From guido at python.org Fri Nov 14 11:03:39 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 11:13:22 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 02:28:44 PST."
References:
Message-ID: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>

> Index: modulefinder.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/modulefinder.py,v
> retrieving revision 1.7
> retrieving revision 1.8
> diff -C2 -d -r1.7 -r1.8
> *** modulefinder.py	18 Jul 2003 15:31:40 -0000	1.7
> --- modulefinder.py	14 Nov 2003 10:28:42 -0000	1.8
> ***************
> *** 211,215 ****
>           return
>       modules = {}
> !     suffixes = [".py", ".pyc", ".pyo"]
>       for dir in m.__path__:
>           try:
> --- 211,220 ----
>           return
>       modules = {}
> !     # 'suffixes' used to be a list hardcoded to [".py", ".pyc", ".pyo"].
> !     # But we must also collect Python extension modules - although
> !     # we cannot separate normal dlls from Python extensions.
> !     suffixes = []
> !     for triple in imp.get_suffixes():
> !         suffixes.append(triple[0])
>       for dir in m.__path__:
>           try:

Have you tested freeze after this? I'm not sure that receiving extension module files won't confuse it.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at python.net Fri Nov 14 12:09:23 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 12:09:35 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
Message-ID:

I'd like to change the version number in the CVS release-maint23 branch to be able to do correct version checks. Currently it is this:

/* Version parsed out into numeric values */
#define PY_MAJOR_VERSION	2
#define PY_MINOR_VERSION	3
#define PY_MICRO_VERSION	2
#define PY_RELEASE_LEVEL	PY_RELEASE_LEVEL_FINAL
#define PY_RELEASE_SERIAL	0

/* Version as a string */
#define PY_VERSION		"2.3.2+"

Is it ok to change it to the following:

/* Version parsed out into numeric values */
#define PY_MAJOR_VERSION	2
#define PY_MINOR_VERSION	3
#define PY_MICRO_VERSION	3
#define PY_RELEASE_LEVEL	PY_RELEASE_LEVEL_ALPHA
#define PY_RELEASE_SERIAL	0

/* Version as a string */
#define PY_VERSION		"2.3.3a0"

Thomas

From mwh at python.net Fri Nov 14 12:18:06 2003
From: mwh at python.net (Michael Hudson)
Date: Fri Nov 14 12:18:13 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
In-Reply-To: (Thomas Heller's message of "Fri, 14 Nov 2003 18:09:23 +0100")
References:
Message-ID: <2mislmrhrl.fsf@starship.python.net>

Thomas Heller writes:
> Is it ok to change it to the following:

Yes.

Cheers,
mwh  :-)

--
MARVIN: Do you want me to sit in a corner and rust, or just fall apart
        where I'm standing?
          -- The Hitch-Hikers Guide to the Galaxy, Episode 2

From fdrake at acm.org Fri Nov 14 12:18:43 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Nov 14 12:19:22 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
In-Reply-To:
References:
Message-ID: <16309.3699.852855.738740@grendel.zope.com>

Thomas Heller writes:
> I'd like to change the version number in the CVS release-maint23 branch
> to be able to do correct version checks.
...
> Is it ok to change it to the following:

Yes.

-Fred

--
Fred L.
Drake, Jr.
PythonLabs at Zope Corporation

From theller at python.net Fri Nov 14 11:59:36 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 12:27:14 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Fri, 14 Nov 2003 08:03:39 -0800")
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>
Message-ID:

Guido van Rossum writes:
> Have you tested freeze after this? I'm not sure that receiving
> extension module files won't confuse it.

From what I remember, freeze has never 'worked' for me on windows - maybe I didn't try hard enough. Apart from that, modulefinder also finds extension modules in other ways, so I would guess freeze must be able to handle them. So, I would like to leave testing freeze to people and on platforms where it actually is used. If this means that this change must be backed out again in the 2.3 branch, so be it.

Thomas

From pp64 at cornell.edu Fri Nov 14 12:29:02 2003
From: pp64 at cornell.edu (Pavel Pergamenshchik)
Date: Fri Nov 14 12:29:10 2003
Subject: [Python-Dev] Getting socket information from socket objects
Message-ID: <20031114122902.2a08b3e3.pp64@cornell.edu>

Hi. It appears that the easiest way to retrieve family/type/protocol fields from socket objects is this:

def getsockinfo(sock):
    s = `sock._sock`
    sp = s[1:-1].split(",")[1:]
    g = {}
    d = {}
    for i in sp:
        exec i.strip() in g, d
    return (d["family"], d["type"], d["protocol"])

Wouldn't it be nice to have accessors for these fields? My particular use-case is Windows-specific (IO completion port proactor), so winsock API provides this, but I'd rather avoid that crud.
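With accessors of the kind asked for here, the repr-parsing hack collapses to plain attribute lookups. A sketch, where the attribute names `family`, `type`, and `proto` are guesses at the eventual API rather than anything in 2.3 (attributes along these lines were in fact added to socket objects in later Python versions):

```python
import socket

def getsockinfo(sock):
    # Proposed replacement for the repr-parsing hack: read the
    # fields directly. Attribute names are assumptions here.
    return (sock.family, sock.type, sock.proto)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
info = getsockinfo(s)
s.close()
```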
Also, exporting getsockaddrarg in socketmodule.c CAPI would be useful, although the only use I can think of is implementing Windows' ConnectEx (which I am doing).

From skip at pobox.com Fri Nov 14 12:49:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Nov 14 12:50:06 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To:
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>
Message-ID: <16309.5562.644105.6880@montanaro.dyndns.org>

>> Have you tested freeze after this? I'm not sure that receiving
>> extension module files won't confuse it.

Thomas> From what I remember, freeze has never 'worked' for me on
Thomas> windows - maybe I didn't try hard enough.

Maybe freeze should be deprecated in 2.4. There are other third-party packages (Gordon McMillan's installer and Thomas's py2exe) which do a better job anyway. Does either one use freeze under the covers?

Skip

From theller at python.net Fri Nov 14 13:13:17 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 13:13:32 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: <16309.5562.644105.6880@montanaro.dyndns.org> (Skip Montanaro's message of "Fri, 14 Nov 2003 11:49:46 -0600")
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID:

Skip Montanaro writes:
> >> Have you tested freeze after this? I'm not sure that receiving
> >> extension module files won't confuse it.
>
> Thomas> From what I remember, freeze has never 'worked' for me on
> Thomas> windows - maybe I didn't try hard enough.
>
> Maybe freeze should be deprecated in 2.4. There are other third-party
> packages (Gordon McMillan's installer and Thomas's py2exe) which do a
> better job anyway. Does either one use freeze under the covers?

Not that I know of (although I'm not sure how installer does it under *nix).
But freeze has two advantages (from reading the sources):
- it should be able to work everywhere where a C compiler is available
- it is able to create true, single file executables.

Thomas

From guido at python.org Fri Nov 14 13:14:01 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:14:15 2003
Subject: [Python-Dev] Getting socket information from socket objects
In-Reply-To: Your message of "Fri, 14 Nov 2003 12:29:02 EST." <20031114122902.2a08b3e3.pp64@cornell.edu>
References: <20031114122902.2a08b3e3.pp64@cornell.edu>
Message-ID: <200311141814.hAEIE1d05005@12-236-54-216.client.attbi.com>

> It appears that the easiest way to retrieve family/type/protocol
> fields from socket objects is this:
> def getsockinfo(sock):
>     s = `sock._sock`
>     sp = s[1:-1].split(",")[1:]
>     g = {}
>     d = {}
>     for i in sp:
>         exec i.strip() in g, d
>     return (d["family"], d["type"], d["protocol"])
> Wouldn't it be nice to have accessors for these fields? My
> particular use-case is Windows-specific (IO completion port
> proactor), so winsock API provides this, but I'd rather avoid that
> crud.

Sounds like a good idea. Upload your patches to SF!

> Also, exporting getsockaddrarg in socketmodule.c CAPI would be
> useful, although the only use I can think of is implementing
> Windows' ConnectEx (which I am doing)

I'm unclear on what you propose here; again, a working patch on SF showing what you propose would help.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Nov 14 13:15:02 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:15:09 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 11:49:46 CST."
<16309.5562.644105.6880@montanaro.dyndns.org>
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID: <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com>

> Maybe freeze should be deprecated in 2.4.

That might be a good idea.

> There are other third-party packages (Gordon McMillan's installer
> and Thomas's py2exe) which do a better job anyway. Does either one
> use freeze under the covers?

No. (Though py2exe uses modulefinder, which is why that's in Lib rather than in Tools/freeze. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Nov 14 13:18:30 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:18:38 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 19:13:17 +0100."
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID: <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com>

> But freeze has two advantages (from reading the sources):
> - it should be able to work everywhere where a C compiler is available

Well, it also uses Make, although I suppose you could easily change it to create a script for some other build tool, as long as it's scriptable.

> - it is able to create true, single file executables.

Not on Windows unless you have a static build of Python. And not on Unix either unless you have static builds of all extension modules.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com Fri Nov 14 19:46:59 2003
From: python at rcn.com (Raymond Hettinger)
Date: Fri Nov 14 19:48:04 2003
Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate
In-Reply-To: <20031113225056.GA11305@vicky.ecs.soton.ac.uk>
Message-ID: <004601c3ab11$fa73e980$5204a044@oemcomputer>

[Armin Rigo]
> from heapq import *
> def isorted(iterable):
>     heap = list(iterable)
>     heapify(heap)
>     while heap:
>         yield heappop(heap)
>
> This generator is similar to the new list.sorted() but starts yielding
> elements after only O(n) operations (in heapify). Certainly not a
> candidate for itertools, but it could be added to heapqmodule.c.
> There are numerous cases where this kind of lazy-sorting is
> interesting, if done reasonably efficiently (unsurprisingly, this is
> known as Heap Sort).

How much of the iterator can be consumed before it becomes preferable (in terms of speed and memory) to have used iter(list.sort())?

My guess is that the break-even point for speed is around 10% depending on how much order already exists in the underlying list.

In terms of memory, I think list.sort() always beats the above implementation.

Raymond Hettinger

From tim at zope.com Fri Nov 14 23:17:45 2003
From: tim at zope.com (Tim Peters)
Date: Fri Nov 14 23:18:02 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
Message-ID:

I think I have a reasonably elegant scheme to nail this. It's best described as modifications to what cyclic gc already does. So here's a summary, with the new steps identified by [NEW]:

Cyclic gc first finds the maximal subset S of the objects in the current generation such that no object in S is reachable from outside of S. S is the (possibly empty) set of cyclic trash in the current generation.

Next partition S into 5 ([NEW] -- is 3 today) disjoint sets:

1. Objects with __del__ methods.

2. Objects not in #1 reachable from an object in #1.

3.
[NEW] Objects not in #1 or #2 with an associated weakref callback.

4. [NEW] Objects not in #1, #2 or #3 reachable from an object in #3.

5. Objects not in one of the other sets.

Then:

A. Call tp_clear on each object in set 5 (set 5 may mutate while this is going on, so that needs some care). If an object's refcount isn't 0 after calling its tp_clear, move it to the next older generation (that doesn't preclude that a later tp_clear in this step may reclaim it).

B. [NEW] Invoke the callbacks associated with the objects still in set 3. This also needs some care, as the deallocations occurring in step #A may remove objects from set 3, or even just remove the weak references to them so that the objects in set 3 are still there, but no longer have an associated callback. I expect we'd have to contrive code to make that happen, but we have to be safe against every possibility. The callbacks invoked during this step may also remove callbacks from objects in set 3 we haven't yet gotten to, or even add new callbacks to objects in sets 1 through 4.

C. [NEW] Move the objects still remaining in sets 3 and 4 to the youngest generation.

D. Move the objects still remaining in set 1 to gc.garbage.

E. Move the objects still remaining in set 2 to the next (older) generation.

That's telegraphic, and is bursting with subtleties. Here are notes on the new subtleties:

+ A key observation is that running weakref callbacks on the objects in set 3 can't have any effect on the objects in set 5, nor can the states of the objects in set 5 affect what a callback may want to do. This is so because no object in set 5 is reachable from an object in set 3: a callback can neither consult nor alter a set 5 object. So clearing set 5 first (in step A) is harmless, and should allow most cyclic trash in most programs to get collected ASAP.
+ Clearing the objects in set 5 first is desirable also because doing so may break enough links that objects in sets 1 thru 4 get deallocated naturally (meaning via the usual refcount-falls-to-0 route). Note that it's quite possible that objects in sets 1 thru 4 are reachable from objects in set 5 -- it's the other direction where reachability can't hold (by construction of the partition, not by luck).

+ By the start of B, tp_clear hasn't been called on anything reachable from sets 3 or 4, so the callbacks "see" wholly intact objects. Nothing visible to the callbacks has been torn down: __dicts__ are still fully populated, __mro__ slots are still as they were, etc. Step B doesn't do any tp_clear itself either, so the only mutations that occur are those performed by the callbacks. If a callback destroys a piece of state some other callback wanted, that's entirely on the user's head.

+ Because a weakref callback destroys itself after it's called, in non-pathological programs no object in set 3 or 4 will have a weakref callback associated with it at the end of step B. We cannot go on to call tp_clear on these objects, because the instant the first callback returns, we have no idea anymore which of these objects are still part of cyclic trash (the callbacks can resurrect any or all of them, ditto add new callbacks to any/all). Determining whether they are still trash requires doing live/dead analysis over from scratch. Simply moving them into *some* generation ensures that they'll get analyzed again on a future run of cyclic gc. Moving them into the youngest generation is done because they almost certainly are (in almost all programs, almost all of the time) still cyclic trash, and without new weakref callbacks. Putting them in the youngest generation allows them to get reclaimed on the next gc invocation.
In steady state for a sane program creating a sustained stream of cyclic trash with associated weakref callbacks, this delays their collection by one gc invocation: the reclamation throughput should equal the rate of trash creation, but there's a one-invocation reclamation latency introduced at the start. There's no new latency in invoking the callbacks.

+ Because we still won't collect cyclic trash with __del__ methods, or cyclic trash reachable from such trash, we do the partitioning in such a way that weakref callbacks on such trash don't get called at all -- we're not even going to try to reclaim them, so it may be surprising if their callbacks get invoked. OTOH, it may be desired that their callbacks get invoked despite that gc will never try to reclaim them on its own. Tough luck. The callbacks will get invoked if and when the user breaks enough cycles in gc.garbage to avoid running afoul of the __del__ restriction.

Objections? Great objections are of two kinds: (1) it won't work; and (2) it can't be sold for a bugfix release. Note that 2.3.2 is segfaulting today, so *something* has to be done for a bugfix release. I don't believe this scheme alters any defined semantics, and to the contrary makes it possible to say for the first time that objects visible to callbacks are never in mysteriously (and undefinedly so) partly-destroyed states. Objecting that the order of callback invocation isn't defined doesn't hold, because the order isn't defined in 2.3.2 either. Tempting as it may be, a scheme that refused to collect cyclic trash with associated weakref callbacks would be an incompatible change; Jim also has a use case for that (a billion lines of Zope3).
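The common case the scheme is careful to preserve -- a callback on an object in a dead cycle, with the weakref itself held from *outside* the cycle -- can be sketched as follows; the callback fires exactly once, and nothing it can reach has been torn down when it runs (this sketch shows the intended behavior, runnable on a Python where the fixes landed):

```python
import gc
import weakref

class C(object):
    pass

fired = []

def cb(ref):
    # Under the scheme described above, callbacks for set-3 objects run
    # before tp_clear touches anything reachable from them.
    fired.append(True)

a = C()
b = C()
a.partner = b
b.partner = a              # a <-> b is a cycle, dead once the names go away
wr = weakref.ref(a, cb)    # the weakref (and its callback) live outside it

del a, b
gc.collect()               # collects the cycle; the callback must run safely
assert fired == [True]
assert wr() is None
```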
From tim.one at comcast.net Sat Nov 15 03:26:52 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Nov 15 03:27:04 2003
Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate
In-Reply-To: <004601c3ab11$fa73e980$5204a044@oemcomputer>
Message-ID:

[Armin Rigo]
>> from heapq import *
>> def isorted(iterable):
>>     heap = list(iterable)
>>     heapify(heap)
>>     while heap:
>>         yield heappop(heap)
>>
>> This generator is similar to the new list.sorted() but starts
>> yielding elements after only O(n) operations (in heapify).
>> ...

[Raymond Hettinger]
> How much of the iterator can be consumed before it becomes preferable
> (in terms of speed and memory) to have used iter(list.sort())?
>
> My guess is that the break-even point for speed is around 10%
> depending on how much order already exists in the underlying list.

This depends so much on the speed of the heap implementation. When it gets into the log-time part, a high multiplicative constant due to fixed overheads makes a slow heap run like a fast heap would if the latter were working on an *exponentially* larger list.

I just tried on my laptop, under 2.3.2, with lists of a million random floats. That's a bad case for list.sort() (there's no order to exploit, and it wastes some compares trying to find order to exploit), and is an average case for a heapsort. Even if I only asked for *just* the first element of the sorted result, using sort() and peeling off the first element was about 25% faster than using heapify followed by one heappop. That says something about how dramatic the overheads are in calling Python-coded heap functions (well, it also says something about the amount of effort I put into optimizing list.sort()).

There are deeper problems the heap approach has to fight:

1. A heapsort does substantially more element compares than a mergesort, and element compares are expensive in Python, so that's hard to overcome.

2.
Heapsort has terrible spatial locality, and blowing the cache becomes even more important than comparison speed as the number of elements grows large. One of the experiments I did when writing the 2.3 sort was to compare a straight mergesort to an enhanced version of "weak-heap sort". Both of those do close to the theoretical minimum number of compares on random data. Despite that the mergesort moved more memory around, the always-sequential data access in the mergesort left it much faster than the cache-hostile weak-heap sort. A regular heapsort isn't as cache-hostile as a weak-heap sort, but it's solidly on the cache-hostile side of sorting algorithms, and does more compares too.

There's another way to get an iterative sort: do an ordinary recursive top-down mergesort, but instead of shuffling sublists in place, *generate* the merge of the subsequences (which are themselves generators, etc). That's a very elegant sort, with the remarkable property that the first element of the final result is generated after doing exactly N-1 compares, which achieves the theoretical minimum for finding the smallest element. Getting result elements after that takes O(log(N)) additional compares each. No array storage is needed beyond the original input list (which isn't changed), but there are O(N) generators hiding in the runtime stack. Alas, for that reason it's impractical for large lists, and the overheads are deadly for short lists. It does enjoy the advantage of beauty .

> In terms of memory, I think list.sort() always beats the above
> implementation.

That can't be -- the heap method only requires a fixed (independent of N) and small amount of working storage.
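The generator-based mergesort described above can be sketched as follows (an illustrative reconstruction in present-day syntax, not Tim's code; note it slices the input for clarity, so unlike the ideal version it does copy list cells):

```python
_END = object()   # sentinel marking an exhausted sub-generator

def _merge(a, b):
    # Lazily merge two sorted iterators: one compare per yielded element.
    x, y = next(a, _END), next(b, _END)
    while x is not _END and y is not _END:
        if y < x:
            yield y
            y = next(b, _END)
        else:                     # ties prefer the left side, keeping it stable
            yield x
            x = next(a, _END)
    while x is not _END:
        yield x
        x = next(a, _END)
    while y is not _END:
        yield y
        y = next(b, _END)

def lazy_mergesort(seq):
    # Top-down mergesort that *generates* the merge instead of moving
    # elements in place.  Pulling the first result makes each of the
    # N-1 internal merge nodes do exactly one compare: N-1 compares total.
    if len(seq) <= 1:
        return iter(list(seq))
    mid = len(seq) // 2
    return _merge(lazy_mergesort(seq[:mid]), lazy_mergesort(seq[mid:]))
```

The O(N) generators Tim mentions are exactly the suspended `_merge` frames; they live until their output is fully consumed.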
list.sort() may need to allocate O(N) additional temp bytes under the covers (to create a working area for doing merges; it can be expected to allocate 2*N temp bytes for a random array of len N, which is its worst case; if there's a lot of pre-existing order in the input array, it can sometimes get away without allocating any temp space). From arigo at tunes.org Sat Nov 15 06:38:17 2003 From: arigo at tunes.org (Armin Rigo) Date: Sat Nov 15 06:42:08 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? Message-ID: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Hello, Just asking because I'm not sure about this rule: is it ok if I just make a check-in without first posting a SF bug or patch report for small bugs with an obvious solution ? In this case: >>> import heapq >>> heapq.heappop(5) Segmentation fault Armin From python at rcn.com Sat Nov 15 07:26:47 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 07:27:10 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: <002001c3ab73$bd3a2900$183ac797@oemcomputer> > Just asking because I'm not sure about this rule: is it ok if I just make > a > check-in without first posting a SF bug or patch report for small bugs > with an > obvious solution ? Just fix it. Raymond From tim.one at comcast.net Sat Nov 15 07:32:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 07:32:35 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: [Armin Rigo] > Just asking because I'm not sure about this rule: is it ok if I just > make a check-in without first posting a SF bug or patch report for > small bugs with an obvious solution ? Even large bugs. The question is much more whether it's likely that the change will be controversial. 
If you're an expert in an area, and want to fix what's obviously a bug, without introducing another bug in the process, and in a way that's obviously an improvement, it's not going to be controversial, and everyone saves time and effort if you just do it. Some of those "obviously"s may be obvious only *to* an expert in the area, but that's OK too -- the non-experts in the area wouldn't follow a report or discussion anyway. > In this case: > > >>> import heapq > >>> heapq.heappop(5) > Segmentation fault It depends on what you do. If, for example, you created a new standard SegfaultError exception, and used a platform-specific memory protection gimmick to raise that instead on your box but not others, you could reasonably expect that to be a controversial change on at least two counts. Then you should bring it up for discussion before doing it. If instead you want to say that, in this context, an integer N should act the same way as range(N) would have acted, and have heappop return 0, then you'd be judged insane if you checked that in, and I'd probably revoke your checkin privileges for your own good . If you want to raise TypeError in this case, great, just do it. From tim.one at comcast.net Sat Nov 15 07:41:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 07:41:49 2003 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Message-ID: > Modified Files: > heapqmodule.c > Log Message: > Verify heappop argument is a list. 
> > Index: heapqmodule.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Modules/heapqmodule.c,v
> retrieving revision 1.1
> retrieving revision 1.2
> diff -C2 -d -r1.1 -r1.2
> *** heapqmodule.c 8 Nov 2003 10:24:38 -0000 1.1
> --- heapqmodule.c 15 Nov 2003 12:33:01 -0000 1.2
> ***************
> *** 120,123 ****
> --- 120,128 ----
> int n;
>
> + if (!PyList_Check(heap)) {
> + PyErr_SetString(PyExc_ValueError, "heap argument must be a list");
> + return NULL;
> + }

Now *that's* controversial: the complaint is about the type of the argument, so it should raise TypeError instead. Curiously, the Python version of this module raised a pretty mysterious AttributeError.

From python at rcn.com Sat Nov 15 08:00:48 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 08:01:12 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: Message-ID: <002301c3ab78$7da69440$183ac797@oemcomputer>

> [Armin Rigo]
> >> from heapq import *
> >> def isorted(iterable):
> >>     heap = list(iterable)
> >>     heapify(heap)
> >>     while heap:
> >>         yield heappop(heap)
> >>
> > In terms of memory, I think list.sort() always beats the above
> > implementation.
>
> That can't be -- the heap method only requires a fixed (independent of N)
> and small amount of working storage. list.sort() may need to allocate O(N)
> additional temp bytes under the covers (to create a working area for doing
> merges; it can be expected to allocate 2*N temp bytes for a random array of
> len N, which is its worst case; if there's a lot of pre-existing order in
> the input array, it can sometimes get away without allocating any temp
> space).

The isorted() generator shown above operates on a copy of the data while list.sort() works in-place. So, my take on it was that isorted() always used 2*N while list.sort() used 2*N only in the worst case.
Raymond

From python at rcn.com Sat Nov 15 08:32:38 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 08:33:01 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: <20031115123758.GB26321@vicky.ecs.soton.ac.uk> Message-ID: <002701c3ab7c$f068cbc0$183ac797@oemcomputer>

> Getting the 25 smallest elements:
>
> min_and_remove_repeatedly(lst, 25)          7.4
> list(itertools.islice(heapsort(lst), 25))   1.05
> list(itertools.islice(isorted(lst), 25))    1.03
> list.sorted(lst)[:25]                       6.65
>
> Getting all elements:
>
> list(heapsort(lst))   22.49
> list(isorted(lst))    26.06
> list.sorted(lst)       6.65

Can you find out at what value of N the time for the heap approach matches the time for the list.sorted() approach? I'm interested to see how close it comes to my original 10% estimate.

> While heapsort is not much faster than the Python-coded isorted using the
> C heappop, if there is interest I can submit it to SF.

Without a much larger speed-up I would recommend against it. This is doubly true for the cases where N==1 or N > len(lst)//10, which are dominated by min() or list.sorted(). Why add a function that is usually the wrong way to do it? The situation is further unbalanced against the heap approach when the problem becomes "get the 25 largest" or for cases where the record comparison costs are more expensive.
Raymond

From guido at python.org Sat Nov 15 10:43:55 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 15 10:44:03 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Your message of "Sat, 15 Nov 2003 04:33:04 PST." References: Message-ID: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com>

> + if (!PyList_Check(heap)) {
> + PyErr_SetString(PyExc_ValueError, "heap argument must be a list");
> + return NULL;
> + }
> +

As Tim suggested, this should be a TypeError.
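For the record, the eventual behavior is an ordinary exception instead of a crash. A quick check, written against the module as it later shipped (the C version raises TypeError; the old pure-Python version, as Tim notes, raised a mysterious AttributeError):

```python
import heapq

heap = []
heapq.heappush(heap, 3)
heapq.heappush(heap, 1)

try:
    heapq.heappop(5)              # not a list: must raise, never segfault
    error_kind = None
except (TypeError, AttributeError) as exc:
    error_kind = type(exc).__name__   # TypeError from the C module

smallest = heapq.heappop(heap)        # normal use is unaffected
```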
--Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Sat Nov 15 12:23:02 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sat Nov 15 11:25:10 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com> References: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com> Message-ID: <200311151223.02746.fincher.8@osu.edu> On Saturday 15 November 2003 10:43 am, Guido van Rossum wrote: > > + if (!PyList_Check(heap)) { > > + PyErr_SetString(PyExc_ValueError, "heap argument must be a list"); > > + return NULL; > > + } > > + > > As Tim suggested, this should be a TypeError. If only lists are allowed, wouldn't we be better off with a better interface than the current one? I thought the point of the current interface was that we could use containers other than lists as long as they defined pop and append methods. Jeremy From anthony at interlink.com.au Sat Nov 15 11:46:06 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 15 11:47:06 2003 Subject: [Python-Dev] Version number in the release-maint23 branch In-Reply-To: Message-ID: <200311151646.hAFGk6V2012644@localhost.localdomain> >>> Thomas Heller wrote > I'd like to change the version number in the CVS release-maint23 branch > to be able to do correct version checks. > [ switch from e.g. 2.3.2+ to 2.3.3a0 straight after release of 2.3.2 ] Should we make this official? In that case, after a major release, should the version go from, e.g. 2.4b1 -> 2.4b2 -> 2.4c1 -> 2.4 -> 2.4.1a0 ? Or should that only happen on the maint branch, and the trunk would go to 2.5a0? Consistently-hobgoblinish, Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Sat Nov 15 11:51:28 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 15 11:52:10 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? 
In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: <200311151651.hAFGpSFH012700@localhost.localdomain>

>>> Armin Rigo wrote
> Hello,
>
> Just asking because I'm not sure about this rule: is it ok if I just make a
> check-in without first posting a SF bug or patch report for small bugs with an
> obvious solution ?

One twist to this - as someone who does release management, I'd prefer that if the bug has been in a released version of Python, it has a bug # that can be referenced in the NEWS file for a release. If, as in this case, it's in stuff that's never been released (I assume the bug is in Raymond's new C-code heapq module), I don't particularly care. If others agree with this, perhaps it should go in the developer docs on the website...

Anthony -- Anthony Baxter It's never too late to have a happy childhood.

From raymond.hettinger at verizon.net Sat Nov 15 13:19:39 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 15 13:20:56 2003 Subject: [Python-Dev] set() and frozenset() Message-ID: <001401c3aba5$28a8a3c0$183ac797@oemcomputer>

The C implementation of sets is now ready for your experimentation and comment. It fulfills most of Greg's proposal at: http://www.python.org/peps/pep-0218.html

The files are at: nondist\sandbox\setobj
Build it with: python setup.py build -g install
Test it with: python_d test_set.py

The differences from sets.py are:

. The containers are now named set() and frozenset(). The lowercase convention is used to make the names consistent with other builtins like list() and dict(). User feedback said that the name ImmutableSet was unwieldy, so frozenset() was chosen as a more succinct alternative.

. There is no set.update() method because that duplicated set.union_update().

. There is no automatic conversion from the mutable type to the non-mutable type. User feedback revealed that this was never needed. Also, removing it simplified the code considerably.
The result is more straightforward and a lot less magical. David Eppstein provided code examples demonstrating that set() and frozenset() are just right for implementing common graph algorithms and NFA/DFAs.

. The __deepcopy__() method will be implemented in copy.py instead of setmodule.c. This is consistent with other builtin containers and keeps all the deepcopying knowledge in one place. Also, the code is much simpler in pure python and I wanted to avoid importing the copy module inside setobject.c.

. The __getstate__() and __setstate__() methods were replaced by __reduce__(). Pickle sizes were made much smaller by saving just the keys instead of key:True pairs.

. There is no equivalent of BaseSet. This saves adding another builtin and it is not a burden to write isinstance(s, (set, frozenset)).

The difference from PEP 218 is:

. There is no special syntax for constructing sets. Once generator expressions are implemented, special notations become superfluous. It is simple enough to write: s = set(iterable).

Though the implementation is basically done and ready for you guys to experiment with, I still have a few open items:

. Expand the unittests to include all of the applicable tests from the existing test_sets.py
. Refactor the error exits to use goto and XDECREF.
. Do one more detailed (line-by-line) review.
. Write the docs.
. Recast the extension module to be a builtin object.

Note, the original sets.py will be left unchanged so that code written for it will continue to run without modification. For those interested in speed and pickle size, it is simple enough to search-and-replace "sets.Set" with "set".
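The points above are easy to see in a short session (illustrative; this matches the builtins as they eventually shipped):

```python
s = set("abracadabra")            # mutable; duplicates collapse
f = frozenset(s)                  # immutable snapshot, hashable

seen = {f: "first"}               # a frozenset can key a dict...
try:
    hash(s)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False       # ...a mutable set cannot

s.add("z")                        # no automatic conversion: mutating s
                                  # never touches the frozenset made from it
```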
Raymond Hettinger

From python at rcn.com Sat Nov 15 14:02:01 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 14:02:29 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: <200311151223.02746.fincher.8@osu.edu> Message-ID: <001901c3abaa$f4058ba0$183ac797@oemcomputer>

> I thought the point of the current interface was that
> we could use containers other than lists as long as they defined pop and
> append methods.

It would need __len__(), __getitem__(), __setitem__(), append(), and pop(). Right now, any list or subclass of list will do. That helps the current implementation run faster. I think polymorphism is more important for the contents of the container than for the container itself. The objects inside the container need only define __le__().

Raymond

From python at rcn.com Sat Nov 15 14:48:45 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 14:49:09 2003 Subject: [Python-Dev] set() and frozenset() addenda In-Reply-To: <001401c3aba5$28a8a3c0$183ac797@oemcomputer> Message-ID: <001e01c3abb1$7b4534c0$183ac797@oemcomputer>

[My previous note]
> The differences from sets.py are:

Also, there is no _repr(sorted=True) method. That need is already met by list.sorted(s).

Raymond

From barry at python.org Sat Nov 15 15:09:08 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 15 15:09:14 2003 Subject: [Python-Dev] Version number in the release-maint23 branch In-Reply-To: <200311151646.hAFGk6V2012644@localhost.localdomain> References: <200311151646.hAFGk6V2012644@localhost.localdomain> Message-ID: <1068926947.990.99.camel@anthem>

On Sat, 2003-11-15 at 11:46, Anthony Baxter wrote:
> >>> Thomas Heller wrote
> > I'd like to change the version number in the CVS release-maint23 branch
> > to be able to do correct version checks.
> > [ switch from e.g. 2.3.2+ to 2.3.3a0 straight after release of 2.3.2 ]
>
> Should we make this official?
> In that case, after a major release, should
> the version go from, e.g. 2.4b1 -> 2.4b2 -> 2.4c1 -> 2.4 -> 2.4.1a0? Or
> should that only happen on the maint branch, and the trunk would go to
> 2.5a0?

Yes, I think so. After a release, branch to 2.x.1a0 and trunk to 2.x+1a0

-Barry

From guido at python.org Sat Nov 15 16:56:39 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 15 16:56:52 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Your message of "Sat, 15 Nov 2003 14:02:01 EST." <001901c3abaa$f4058ba0$183ac797@oemcomputer> References: <001901c3abaa$f4058ba0$183ac797@oemcomputer> Message-ID: <200311152156.hAFLud014391@12-236-54-216.client.attbi.com>

> > I thought the point of the current interface was that we could use
> > containers other than lists as long as they defined pop and append
> > methods.
>
> It would need __len__(), __getitem__(), __setitem__(), append(), and
> pop(). Right now, any list or subclass of list will do. That helps the
> current implementation run faster.
>
> I think polymorphism is more important for the contents of the container
> than for the container itself. The objects inside the container need
> only define __le__().

Well, of course. There *is* the theoretical objection that the old heapq.py would work with any mutable sequence supporting append() and pop() -- but I expect that is indeed purely a theoretical objection. When I first introduced heapq.py, I briefly considered making it a list subclass, but it didn't seem worth it (especially since the class version would likely be slower). But maybe for the C implementation this makes more sense, especially since it only allows lists or list subclasses anyway...?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Sat Nov 15 17:28:46 2003 From: martin at v.loewis.de (Martin v.
Löwis) Date: Sat Nov 15 17:29:11 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <16309.5562.644105.6880@montanaro.dyndns.org> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> Message-ID:

Skip Montanaro writes:
> Maybe freeze should be deprecated in 2.4. There are other third-party
> packages (Gordon McMillan's installer and Thomas's py2exe) which do a better
> job anyway.

I very much question that these other packages are "better", in all possible respects. In terms of usability for the developer, perhaps, but not in terms of quality of the resulting binary. So please keep freeze.

Regards, Martin

From martin at v.loewis.de Sat Nov 15 17:30:13 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Sat Nov 15 17:31:04 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com> Message-ID:

Guido van Rossum writes:
> > - it is able to create true, single file executables.
>
> Not on Windows unless you have a static build of Python. And not on
> Unix either unless you have static builds of all extension modules.

Anybody using freeze should be able to arrange that these conditions are met. It is even possible to freeze Tcl into the resulting binary.

Regards, Martin

From magnus at hetland.org Sat Nov 15 18:18:03 2003 From: magnus at hetland.org (Magnus Lie Hetland) Date: Sat Nov 15 18:18:22 2003 Subject: [Python-Dev] Re: set() and frozenset() Message-ID: <20031115231803.GA21142@idi.ntnu.no>

Great to see that these two will be in place soon!
--
Magnus Lie Hetland    "In this house we obey the laws of
http://hetland.org     thermodynamics!"  Homer Simpson

From tim.one at comcast.net Sat Nov 15 18:32:30 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 18:32:27 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: <002301c3ab78$7da69440$183ac797@oemcomputer> Message-ID:

[Armin Rigo]
>>>> from heapq import *
>>>> def isorted(iterable):
>>>>     heap = list(iterable)
>>>>     heapify(heap)
>>>>     while heap:
>>>>         yield heappop(heap)

[Raymond Hettinger]
>>> In terms of memory, I think list.sort() always beats the above
>>> implementation.

[Tim]
>> That can't be -- the heap method only requires a fixed (independent
>> of N) and small amount of working storage. list.sort() may need to
>> allocate O(N) additional temp bytes under the covers (to create a
>> working area for doing merges; it can be expected to allocate 2*N
>> temp bytes for a random array of len N, which is its worst case; if
>> there's a lot of pre-existing order in the input array, it can
>> sometimes get away without allocating any temp space).

[Raymond]
> The isorted() generator shown above operates on a copy of the data
> while list.sort() works in-place. So, my take on it was the
> isorted() always used 2*N while list.sort() used 2*N only in the
> worst case.

Ah. But that's comparing apples and donkeys: Armin's example works on any iterable, while list.sort() only works on lists. I assumed that by "list.sort()" you meant "the obvious method *based* on list.sort() also accepting any iterable", i.e.,

    def isorted(iterable):
        copy = list(iterable)
        copy.sort()
        for x in copy:
            yield x

Then it's got all the space overhead of the list copy in Armin's version, plus the additional hidden temp memory allocated by sort. Something to note: most applications that only want the "first N" or "last N" values in sorted order know N in advance, and that's highly exploitable.
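The "N known in advance" trick keeps a heap of only N items, so memory stays O(N) no matter how long the input stream is. An illustrative sketch (`n_largest` is a made-up name, not the N-best queue from the heapq test suite):

```python
import heapq
from itertools import islice

def n_largest(n, iterable):
    """Return the n largest items of any iterable, using O(n) memory."""
    it = iter(iterable)
    best = list(islice(it, n))        # the first n items seed the heap
    heapq.heapify(best)               # min-heap: best[0] is the weakest winner
    if not best:
        return best
    for item in it:
        if item > best[0]:
            heapq.heapreplace(best, item)   # evict the weakest winner
    best.sort(reverse=True)
    return best
```

Each stream element costs one compare against the current weakest winner, plus O(log N) heap work only when it displaces one -- which is why this beats sorting the whole input when N is small.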
David Eppstein and I had a long thread about that here a while back. The example of implementing an "N-best queue" in the heapq test suite is a much better use of heaps when N is known, accepting an iterable directly (without turning it into a list first), and using storage for only N items. When N is (as is typical) much smaller than the total number of elements, that method can beat the pants off list.sort() even with the Python implementation of heaps. Indeed, Guido and I used that method for production code in Zope's full-text search subsystem (find the N best matches to a search query over some 10-200K documents). David presented a method that ran even faster, provided it was coded just right, based on doing quicksort-like partitioning steps on a buffer of about 3*N values. That also uses total space proportional to N (independent of the total number of incoming elements). A heap-based N-best queue would probably beat that again now that heaps are implemented in C. OTOH, if we implemented a quicksort-like partitioning routine in C too ... (it also suffers from gobs of fiddly little integer arithmetic and simple array indexing, which screams in C). From anthony at interlink.com.au Sun Nov 16 03:07:44 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Nov 16 03:08:34 2003 Subject: [Python-Dev] sqlite into std library for 2.4? Message-ID: <200311160807.hAG87ion025129@localhost.localdomain> I'd like to suggest we include sqlite in the standard library for 2.4. It's maintained, is a full-featured SQL database with a very small footprint and very little needed in the way of dead chickens to get it up and running. Anyone else? 
Anthony From skip at manatee.mojam.com Sun Nov 16 08:01:10 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 16 08:01:18 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311161301.hAGD1AJi023783@manatee.mojam.com> Bug/Patch Summary ----------------- 580 open / 4343 total bugs (+68) 196 open / 2455 total patches (+19) New Bugs -------- Typos in the docs (Extending/Embedding + Python/C API) (2003-11-09) http://python.org/sf/838938 Document that highly recursive data cannot be pickled (2003-11-09) http://python.org/sf/839075 attempt to access sys.argv when it doesn't exist (2003-11-10) http://python.org/sf/839151 interators broken for weak dicts (2003-11-10) http://python.org/sf/839159 SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496 Bug in type's GC handling causes segfaults (2003-11-10) http://python.org/sf/839548 String formatting operator % badly documented (2003-11-10) http://python.org/sf/839585 Windows non-MS compiler doc updates (2003-11-10) http://python.org/sf/839709 MacPython installer: disk image does not mount from NFS (2003-11-11) http://python.org/sf/839865 Incorrect shared library build (2003-11-11) http://python.org/sf/840065 weakref callbacks and gc corrupt memory (2003-11-12) http://python.org/sf/840829 xmlrpclib chokes on Unicode keys (2003-11-13) http://python.org/sf/841757 -O breaks bundlebuilder --standalone (2003-11-13) http://python.org/sf/841800 PackMan database for panther misses devtools dep (2003-11-14) http://python.org/sf/842116 logging.shutdown() exception (2003-11-14) http://python.org/sf/842170 Digital Unix build fails to create ccpython.o (2003-11-14) http://python.org/sf/842171 optparser help formatting nit (2003-11-14) http://python.org/sf/842213 xmlrpclib and backward compatibility (2003-11-14) http://python.org/sf/842600 Windows mis-installs to network drive (2003-11-14) http://python.org/sf/842629 New Patches ----------- Footnote on 
bug in Mailbox with Windows text-mode files (2003-11-09) http://python.org/sf/838910 Cross building python for mingw32 (2003-11-13) http://python.org/sf/841454 Differentiation between Builtins and extension classes (2003-11-13) http://python.org/sf/841461 One more patch for --enable-shared (2003-11-13) http://python.org/sf/841807 reflect the removal of mpz (2003-11-14) http://python.org/sf/842567 NameError in the example of sets module (2003-11-15) http://python.org/sf/842994 doc fixes builtin super and string.replace (2003-11-16) http://python.org/sf/843088 Closed Bugs ----------- Closed Patches -------------- imaplib : Add support for the THREAD command (2003-08-31) http://python.org/sf/798297 invalid use of setlocale (2003-09-11) http://python.org/sf/804543 From barry at python.org Sun Nov 16 12:27:13 2003 From: barry at python.org (Barry Warsaw) Date: Sun Nov 16 12:27:25 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libfuncs.tex,1.151,1.152 In-Reply-To: References: Message-ID: <1069003633.990.106.camel@anthem> On Sun, 2003-11-16 at 11:17, rhettinger@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Doc/lib > In directory sc8-pr-cvs1:/tmp/cvs-serv13946/Doc/lib > > Modified Files: > libfuncs.tex > Log Message: > * Migrate set() and frozenset() from the sandbox. > * Install the unittests, docs, newsitem, include file, and makefile update. > * Exercise the new functions whereever sets.py was being used. > > Includes the docs for libfuncs.tex. Separate docs for the types are > forthcoming. Okay, I must have missed the discussion on these, but why are these so important that they should be in builtins? -Barry From DavidA at ActiveState.com Sun Nov 16 14:01:11 2003 From: DavidA at ActiveState.com (David Ascher) Date: Sun Nov 16 13:41:28 2003 Subject: [Python-Dev] sqlite into std library for 2.4? 
In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> References: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID: <3FB7C977.7090004@ActiveState.com> Anthony Baxter wrote: > I'd like to suggest we include sqlite in the standard library for 2.4. > > It's maintained, is a full-featured SQL database with a very small footprint > and very little needed in the way of dead chickens to get it up and running. FYI, it will be part of PHP 5, IIRC. --da From eppstein at ics.uci.edu Sun Nov 16 13:42:09 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sun Nov 16 13:42:12 2003 Subject: [Python-Dev] Re: set() and frozenset() References: <001401c3aba5$28a8a3c0$183ac797@oemcomputer> Message-ID: In article <001401c3aba5$28a8a3c0$183ac797@oemcomputer>, "Raymond Hettinger" wrote: > The C implementation of sets is now ready for your experimentation and > comment. If fulfills most of Greg's proposal at: > http://www.python.org/peps/pep-0218.html ... > The differences from sets.py are: > > . The containers are now named set() and frozenset(). The lowercase > convention is used to make the names consistent with other builtins like > list() and dict(). User feedback said that the name ImmutableSet was > unwieldy (?sp), so frozenset() was chosen as a more succinct > alternative. I for one found it difficult to remember whether it was Immutable or Immutible. > . There is no automatic conversion from the mutable type to the > non-mutable type. User feedback revealed that this was never needed. More than never needed, I would find it confusing to put a set into a dictionary or whatever and then find that some other object has been put there in its place. > Also, removing it simplified the code considerably. The result is more > straight-forward and a lot less magical. David Eppstein provided code > examples demonstrating that set() and frozenset() are just right for > implementing common graph algorithms and NFA/DFAs. 
Well, I used Set and ImmutableSet, but yes, they're very useful. Thanks to Raymond for adding the backward compatibility to Python 2.2 needed for me to try this out. FWIW, I wrote another one yesterday, using a set partition refinement technique for recognizing chordal graphs; the code is at http://www.ics.uci.edu/~eppstein/PADS/Chordal.py, with subroutines in LexBFS.py, PartitionRefinement.py, and Sequence.py. The same partition refinement technique shows up in other algorithms including DFA minimization and would be quite painful without sets. Sets seem to me to be as fundamental a data structure as lists and dictionaries, and I'm enthusiastic about this becoming built in and faster. I would have liked to see {1,2,3} type syntax for sets, but the set/frozenset issue makes that a little problematic and perhaps the new iterator expressions make it unnecessary.

--
David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science

From allison at sumeru.stanford.EDU Sun Nov 16 13:59:25 2003 From: allison at sumeru.stanford.EDU (Dennis Allison) Date: Sun Nov 16 13:59:37 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID:

I've not used it -- I just downloaded it to test -- but it looks like a very good candidate for inclusion.

-d

On Sun, 16 Nov 2003, Anthony Baxter wrote:
> > I'd like to suggest we include sqlite in the standard library for 2.4.
> >
> > It's maintained, is a full-featured SQL database with a very small footprint
> > and very little needed in the way of dead chickens to get it up and running.
>
> Anyone else?
> Anthony > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/allison%40sumeru.stanford.edu > From gh at ghaering.de Sun Nov 16 14:17:38 2003 From: gh at ghaering.de (=?ISO-8859-1?Q?Gerhard_H=E4ring?=) Date: Sun Nov 16 14:17:44 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> References: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID: <3FB7CD52.1080200@ghaering.de> Anthony Baxter wrote: > I'd like to suggest we include sqlite in the standard library for 2.4. > > It's maintained, is a full-featured SQL database with a very small footprint > and very little needed in the way of dead chickens to get it up and running. I'm the (currently only active) PySQLite maintainer, so I think I'm qualified to comment on this ;) Before we can think about including this into the Python distribution there are two things I'd need to do: - code cleanup and documentation (inline documentation is quite sparse) - writing documentation (the PySQLite documentation is quite outdated, and doesn't cover the advanced nonstandard features, like writing aggregates/functions in Python, etc.) Inclusion in the Python standard library means an API freeze. I'm not sure all of PySQLite has the best interfaces, yet. One solution could be to only document the parts where we consider the API *stable*. Last, but not least, I don't see the tremendous benefit of a simple embedded SQL database in the Python standard distribution. Sure, Windows users would have to download one thing less, but for Unix users nothing much will change, because we'd most probably still require an existing SQLite installation. And SQLite is nothing that you can expect being installed, anyway, like BSDdb is.
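[Editorial note: for readers unfamiliar with the embedded-database model under discussion, this is roughly what it looks like through the DB-API. The sketch uses the `sqlite3` module name that eventually did land in the standard library (Python 2.5); the 2003-era PySQLite module was named differently but exposed a similar DB-API 2.0 surface.]

```python
import sqlite3

# An in-memory database: no server process, no configuration --
# none of the "dead chickens" a client/server SQL database needs.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE langs (name TEXT, year INTEGER)")
cur.execute("INSERT INTO langs VALUES (?, ?)", ("Python", 1991))
con.commit()

cur.execute("SELECT name, year FROM langs")
rows = cur.fetchall()
print(rows)        # [('Python', 1991)]
con.close()
```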
So, more or less, Unix users will only save downloading PySQLite separately. -- Gerhard From tismer at tismer.com Sun Nov 16 20:02:48 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 16 20:02:56 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 Message-ID: <3FB81E38.9000505@tismer.com> Hi friends, over the weekend, I hacked quite a lot on Stackless with Python 2.2.3, in order to get rid of refcounting problems with thread pickling. It turned out that code objects created wrong refcounts when unpickling them. I debugged this down to the very end, until I was sure my stuff is doing it right. Then I added a small function that recomputes the actual total refcounts from the chained list of all objects, and it turned out to be correct (and also my pickling), but _Py_RefTotal is different. Before I invest more time into this, please let me know: Is this a known problem which is solved by moving to Python 2.3.2, or should I try to find the bug? I know this is hard to debug for anybody but me, since pickling of code objects is a Stackless only feature. The key might be here:

void _Py_NewReference(PyObject *op)
{
    _Py_RefTotal++;
    op->ob_refcnt = 1;
    op->_ob_next = refchain._ob_next;
    op->_ob_prev = &refchain;
    refchain._ob_next->_ob_prev = op;
    refchain._ob_next = op;
#ifdef COUNT_ALLOCS
    inc_count(op->ob_type);
#endif
}

It might be that at some place, this function is used when the refcount is not zero, but I don't know. This would get _Py_RefTotal and the real refcounts out of sync. Many thanks -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From greg at electricrain.com Sun Nov 16 21:05:21 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 16 21:05:27 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <3FB7CD52.1080200@ghaering.de> References: <200311160807.hAG87ion025129@localhost.localdomain> <3FB7CD52.1080200@ghaering.de> Message-ID: <20031117020521.GB3366@zot.electricrain.com> On Sun, Nov 16, 2003 at 08:17:38PM +0100, Gerhard Häring wrote: > > Inclusion in the Python standard library means an API freeze. I'm not > sure all of PySQLite has the best interfaces, yet. One solution could be > to only document the parts where we consider the API *stable*. > > Last, but not least, I don't see the tremendous benefit of a simple > embedded SQL database in the Python standard distribution. Sure, Windows > users would have to download one thing less, but for Unix users nothing > much will change, because we'd most probably still require an existing > SQLite installation. And SQLite is nothing that you can expect being > installed, anyway, like BSDdb is. So, more or less, Unix users will only > save downloading PySQLite separately. Agreed. I love SQLite (though i've not yet used it with python) but I don't think it needs to be bundled as part of the standard dist. It's an easy add-on. Perhaps it could just get a mention and a hyperlink in the python documentation (where?) as a suggested embedded SQL database. One thing that would change my mind about inclusion is if a python library similar to 'SQLObject' or 'orm' were in good enough shape to be included at the same time. Both provide an object oriented abstraction to a database preventing you from needing to write any SQL in most cases; similar to perl's Class::DBI package.
-g From jeremy at zope.com Sun Nov 16 21:47:12 2003 From: jeremy at zope.com (Jeremy Hylton) Date: Sun Nov 16 21:50:31 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <1069037232.6983.1.camel@localhost.localdomain> On Fri, 2003-11-14 at 23:17, Tim Peters wrote: > Objections? None here, but you knew that. Everyone seems to be interested in this topic, though. How hard is the implementation going to be? Jeremy From tim at zope.com Sun Nov 16 22:24:33 2003 From: tim at zope.com (Tim Peters) Date: Sun Nov 16 22:24:33 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <1069037232.6983.1.camel@localhost.localdomain> Message-ID: [Tim, sketches a Grand Scheme for making weakref callbacks from cyclic garbage wholly sane, then asks ..] >> Objections? [Jeremy Hylton] > None here, but you knew that. Great! > Everyone seems to be interested in this topic, though. Then I expect everyone to volunteer to test the patch . > How hard is the implementation going to be? I just made a patch while running my final tests, so I have a pretty solid proof that what I sketched was implementable . It's exactly the scheme I described, and the coding went smoothly because it was something that could be (and was) fully thought-out in advance. That doesn't rule out conceptual or coding errors, though. Now I'll stop typing until I know whether all the tests pass ... OK, here's the patch: http://www.python.org/sf/843455 I asked especially for Neal's (Mr. GC) and Fred's (Mr. WeakRef) reviews, but all reviews are welcome. 
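[Editorial note: for readers following the thread, the basic weakref-callback machinery at issue works like this — a minimal sketch independent of the cyclic-gc patch, relying on CPython's immediate refcount-based collection.]

```python
import weakref

class C:
    pass

events = []

def callback(ref):
    # Invoked when the referent dies while the weakref is still alive.
    # By the time the callback runs, the weakref is already dead.
    events.append(ref() is None)

c = C()
wr = weakref.ref(c, callback)
del c              # referent dies -> callback fires immediately in CPython
print(events)      # [True]
print(wr())        # None: the weakref no longer resolves
```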
From rohit.nadhani at tallysolutions.com Mon Nov 17 04:22:05 2003 From: rohit.nadhani at tallysolutions.com (RN) Date: Mon Nov 17 04:30:39 2003 Subject: [Python-Dev] Variable Scope Message-ID: I have 2 Python scripts that contain the following lines:

test.py
-------
from testmod import *
a1 = 10
modfunc()

testmod.py
----------
def modfunc():
    print a1

When I run test.py, it returns the following error:

File "testmod.py", line 2, in modfunc
    print a1
NameError: global name 'a1' is not defined

My intent is to make a1 a global variable - so that I can access its value in all functions of imported modules. What should I do? Thanks in advance, Rohit From martin at v.loewis.de Mon Nov 17 05:15:29 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Mon Nov 17 05:17:15 2003 Subject: [Python-Dev] Variable Scope In-Reply-To: References: Message-ID: "RN" writes: > My intent is to make a1 a global variable - so that I can access its value > in all functions of imported modules. What should I do? Please post the question to python-list@python.org. python-dev is for the development *of* Python, not for the development *with* Python. Regards, Martin From arigo at tunes.org Mon Nov 17 06:03:08 2003 From: arigo at tunes.org (Armin Rigo) Date: Mon Nov 17 06:07:28 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <3FB81E38.9000505@tismer.com> References: <3FB81E38.9000505@tismer.com> Message-ID: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Hello Christian, On Mon, Nov 17, 2003 at 02:02:48AM +0100, Christian Tismer wrote: > I debugged this down to the very end, until I was sure > my stuff is doing it right. Then I added a small function > that recomputes the actual total refcounts from the > chained list of all objects, and it turned out to be > correct (and also my pickling), but _Py_RefTotal is different.
I found a few places that manipulate ob_refcnt directly without worrying about keeping _Py_RefTotal or other debugging information in sync: * classobject.c:instance_dealloc(), for __del__ * stringobject.c, for interned strings * typeobject.c:slot_tp_del(), for __del__ too I bet you could also find these easily, but maybe it should be regarded as a bug list. At any rate, the __del__ tricks will indeed make some counters invalid. A bientot, Armin. From mwh at python.net Mon Nov 17 07:24:25 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 17 07:24:35 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Mon, 17 Nov 2003 11:03:08 +0000") References: <3FB81E38.9000505@tismer.com> <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: <2my8ufp4hy.fsf@starship.python.net> Armin Rigo writes: > Hello Christian, > > On Mon, Nov 17, 2003 at 02:02:48AM +0100, Christian Tismer wrote: >> I debugged this down to the very end, until I was sure >> my stuff is doing it right. Then I added a small function >> that recomputes the actual total refcounts from the >> chained list of all objects, and it turned out to be >> correct (and also my pickling), but _Py_RefTotal is different. > > I found a few places that manipulate ob_refcnt directly without > worrying about keeping _Py_RefTotal or other debugging information > in sync: Um, don't most of these places at least *try* to keep _Py_RefTotal in sync? I am aware of a few places that get this wrong, but the fixes weren't obvious to me. > * classobject.c:instance_dealloc(), for __del__ One way of getting _Py_RefTotal out of sync is resurrecting objects in __del__ methods. Another is some bizarre interaction with the trashcan machinery (don't recall what, sorry, may also be different with 2.2 vs 2.3).
> * stringobject.c, for interned strings > > * typeobject.c:slot_tp_del(), for __del__ too > > I bet you could also find these easily, but maybe it should be regarded as a > bug list. I think these are bugs. > At any rate, the __del__ tricks will indeed make some counters > invalid. Which __del__ tricks specifically? Cheers, mwh -- Strangely enough I saw just such a beast at the grocery store last night. Starbucks sells Javachip. (It's ice cream, but that shouldn't be an obstacle for the Java marketing people.) -- Jeremy Hylton, 29 Apr 1997 From Jack.Jansen at cwi.nl Mon Nov 17 10:18:59 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Mon Nov 17 10:18:57 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com> Message-ID: <5E99A2D2-1911-11D8-80BE-0030655234CE@cwi.nl> On 14 Nov 2003, at 19:15, Guido van Rossum wrote: >> Maybe freeze should be deprecated in 2.4. > > That might be a good idea. > >> There are other third-party packages (Gordon McMillan's installer >> and Thomas's py2exe) which do a better job anyway. Does either one >> use freeze under the covers? > > No. (Though py2exe uses modulefinder, which is why that's in Lib > rather than in Tools/freeze. :-) And so do the freeze tools on the Mac (BuildApplication and I think also bundlebuilder). -- Jack Jansen http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From python at rcn.com Mon Nov 17 10:31:15 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 17 10:31:43 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? 
In-Reply-To: <200311162352.hAGNqTYA002118@localhost.localdomain> Message-ID: <003201c3ad1f$d750e420$e841fea9@oemcomputer> > >>> "Raymond Hettinger" wrote > > I think that adds an unnecessary level of indirection. SF helps when it > > comes to tracking, public discussion, patch evolution, the approval > > process, etc. However, for direct fixes, I think the check-in message > > is sufficient. > > I disagree - if I hit a bug and want to see if it's fixed, often the > entry in Misc/NEWS is far too brief to be useful. Not everyone has a > CVS checkout of Python that they can check against. For big bugs, having a SF entry or detailed news entry is reasonable. But for buglets, there is a PITA factor that goes with opening an SF report, fixing the bug, referencing the SF report in the checkin, referencing the checkin in the SF report, and immediately closing the report. That PITA factor is a cost that will be paid by every active developer and, IMO, gives very little gain. Beyond cluttering the bugs list, it can become an obstacle to getting the bugs fixed at all. I am certain that adding more administrative overhead will make it less likely that someone will bother with an otherwise quick fix. That isn't just laziness, the volunteers often only have a few minutes to deal with something they happen to see. Also, volunteers don't want to feel like their time is being wasted. I, for one, would loathe having to go back through all of my checkins and create/edit/reference/close a related SF report. It would be a boring day-long project that would suck and yet add almost nothing. I'm sure there are a few who value having all those references but I'm unwilling to transfer that burden onto the tiny group of people who volunteer their time fixing little buglets everywhere. If there is someone who places value on the references and also has checkin privileges, then there is nothing stopping them from reading each checkin and establishing a new SF entry for it.
I think they would be wasting their time, but if that is *their* itch, then they are welcome to scratch it. Making me scratch their itch is another matter entirely. Raymond From tim at zope.com Mon Nov 17 11:06:57 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 11:07:26 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: Message-ID: [Tim, on ] > ... > It's exactly the scheme I described, and the coding went smoothly > because it was something that could be (and was) fully thought-out in > advance. That doesn't rule out conceptual or coding errors, though. As I noted on the patch in the wee hours, "conceptual errors" wins. I out-thought a wrong thing, but one that happened to be good enough to fix all the new test cases: it doesn't really matter which objects are reachable from the objects whose deaths trigger callbacks, what really matters is which objects are reachable from the callbacks themselves. The test cases were so incestuous (objects all pointing to each other) that those turned out to be the same sets, but that's not a necessary outcome -- although it appears to be a likely outcome. Here's one that's surprising after the patch:

"""
import weakref, gc

class C:
    def cb(self, ignore):
        print self.__dict__

c1, c2 = C(), C()
c2.me = c2
c2.c1 = c1
c2.wr = weakref.ref(c1, c2.cb)
del c1, c2
print 'about to collect'
gc.collect()
print 'collected'
"""

The callback triggers on the death of c1 then, but c1 isn't in a cycle at all (it's hanging *off* a cycle), and c2 isn't reachable from c1. But c2 is reachable from the callback. c2 is in a self-cycle via c2.me, and in another via c2.wr (which indirectly points back to c2 via the weakref's bound method object c2.cb). After the patch, c1 ends up in the set of objects with an associated weakref callback, but c2 isn't reachable from that set so tp_clear is called on c2.
That destroys c2's __dict__ before the callback can get invoked, so when c1 dies the callback sees a tp_clear'ed c2:

about to collect
{}
collected

I know it's hard for people to get excited about an empty dict . But that's not the point: the point is that if it's possible to expose an object that's been tp_clear'ed to Python code, then *anything* can happen. For example, this minor variation segfaults after the patch, right after printing "about to collect":

"""
import weakref, gc

class C(object):
    def cb(self, ignore):
        print self.__dict__

class D:
    pass

c1, c2 = D(), C()
c2.me = c2
c2.c1 = c1
c2.wr = weakref.ref(c1, c2.cb)
del c1, c2, C, D
print 'about to collect'
gc.collect()
print 'collected'
"""

That class C was reachable from c1 in the first example protected C from getting tp_clear'ed at all, which was something the patch was trying to accomplish. But by giving c1 a different class, C's tp_clear immunity went away, but C is still reachable from the callback. Boom. So what's reachable from a callback? If the callback is not *itself* part of the garbage getting collected, then it acts like an external root, and so nothing reachable from the callback is part of the garbage getting collected either. gc has no worries then. If the callback itself is part of the garbage getting collected, then the weakref holding the callback must also be part of the garbage getting collected (else the weakref holding the callback would act as an external root, preventing the callback from being part of the garbage being collected too). My thought then was that a simpler scheme could simply call tp_clear on the trash weakrefs first. Calling tp_clear on a weakref just throws away the associated callbacks (if any) unexecuted, and if they don't get run then we have no reason to care what's reachable from them anymore.
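[Editorial note: one wrinkle with discarding callbacks is that a callback object can itself be the referent of another weakref, so dropping the last reference to it can fire further callbacks. A sketch in plain Python — no cycles involved, relying on CPython's immediate refcounting, and noting that a weakref holds a *strong* reference to its callback:]

```python
import weakref

class Target:
    pass

log = []
target = Target()

def first_cb(ref):
    log.append("first_cb ran")

# wr1 holds a strong reference to first_cb (weakrefs keep their
# callbacks alive).
wr1 = weakref.ref(target, first_cb)

def second_cb(ref):
    log.append("second_cb ran: first_cb was collected")

# The callback function itself can be weakly referenced:
wr2 = weakref.ref(first_cb, second_cb)

del first_cb    # still alive: wr1's strong reference keeps it around
del wr1         # the weakref dies, dropping first_cb -> second_cb fires
print(log)      # ['second_cb ran: first_cb was collected']
```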
The fly in that ointment appears to be that a callback can itself be the target of a weakref, so that when the callback is thrown away, it can trigger calling another callback. At that point I fell asleep muttering unspeakable oaths. From tismer at tismer.com Mon Nov 17 11:18:26 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 17 11:17:48 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> References: <3FB81E38.9000505@tismer.com> <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: <3FB8F4D2.8030301@tismer.com> Armin Rigo wrote: ... > I found a few places that manipulate ob_refcnt directly without worrying about > keeping _Py_RefTotal or other debugging information in sync: > > * classobject.c:instance_dealloc(), for __del__ > > * stringobject.c, for interned strings > > * typeobject.c:slot_tp_del(), for __del__ too > > I bet you could also find these easily, but maybe it should be regarded as a > bug list. At any rate, the __del__ tricks will indeed make some counters > invalid. Many thanks, this was very helpful. I will probably fix those cases which affect my code and submit patches. I was just not sure whether this is a known problem, maybe already solved. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From tim.one at comcast.net Mon Nov 17 11:36:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 11:36:40 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: [Armin] > I found a few places that manipulate ob_refcnt directly without > worrying about keeping _Py_RefTotal or other debugging information > in sync: In which codebase? (2.3.2, 2.3 maint, 2.4, ...?) > * classobject.c:instance_dealloc(), for __del__ instance_dealloc endures outrageous convolution trying to keep _Py_RefTotal (and friends) correct, although the code is very different between 2.3.2 and 2.2.3. In 2.3.2: - If the instance isn't resurrected, then the instance goes away without fiddling _Py_RefTotal at all. That's correct. - If it is resurrected, then /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, * but _Py_NewReference bumped it again, so that's a wash. * If Py_TRACE_REFS, _Py_NewReference re-added self to the * object chain, so no more to do there either. * If COUNT_ALLOCS, the original decref bumped tp_frees, and * _Py_NewReference bumped tp_allocs: both of those need to * be undone. */ By "the original decref" it means the Py_DECREF that caused the instance's refcount to fall to 0 in the first place (thus getting us into instance_dealloc). > * stringobject.c, for interned strings Easy to believe that one's screwed up . > * typeobject.c:slot_tp_del(), for __del__ too At least in 2.3.2, that's enduring the same convolutions as instance_dealloc trying to keep this stuff right. From tim at zope.com Mon Nov 17 12:12:16 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 12:12:58 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <2my8ufp4hy.fsf@starship.python.net> Message-ID: [Michael Hudson] > ... > One way of getting _Py_RefTotal out of sync is resurrecting objects in > __del__ methods. Oops! 
That's right:

"""
from sys import gettotalrefcount as g

class C:
    def __del__(self):
        alist.append(self)

alist = []
c1, c2, c3 = C(), C(), C()
del c1, c2, c3
while 1:
    print g(), len(alist),
    del alist[:]
"""

g() goes up by 3 each time around the loop. /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, * but _Py_NewReference bumped it again, so that's a wash. Heh. If you ignore the new reference(s) that resurrected the thing, I suppose that would be true. It should (2.3.2) do _Py_DEC_REFTOTAL; to make up for the extra increment done by _Py_NewReference; likewise in slot_tp_del (BTW, the macro expands to nothing if Py_REF_DEBUG isn't defined). From mwh at python.net Mon Nov 17 12:40:27 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 17 12:40:35 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: (Tim Peters's message of "Mon, 17 Nov 2003 12:12:16 -0500") References: Message-ID: <2my8uenbas.fsf@starship.python.net> "Tim Peters" writes: > [Michael Hudson] >> ... >> One way of getting _Py_RefTotal out of sync is resurrecting objects in >> __del__ methods. > > Oops! That's right: [snip evidence] This is also why running test_descr in a loop still bumps sys.gettotalrefcount() by 3 or so each time. > /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, > * but _Py_NewReference bumped it again, so that's a wash. > > Heh. If you ignore the new reference(s) that resurrected the thing, I > suppose that would be true. It should (2.3.2) do > > _Py_DEC_REFTOTAL; > > to make up for the extra increment done by _Py_NewReference; likewise in > slot_tp_del (BTW, the macro expands to nothing if Py_REF_DEBUG isn't > defined). Is it that easy? I remember fooling a little with this, but not successfully. It's just possible that I got confused, though. (Confused by finalizer issues? How could that be?)
FWIW, my foolings were with new-style objects -- but from what you say in another post, it's unsurprising to find isomorphic problems with old-style classes (as in your example). Cheers, mwh -- Java is a WORA language! (Write Once, Run Away) -- James Vandenberg (on progstone@egroups.com) & quoted by David Rush on comp.lang.scheme From nas-python at python.ca Mon Nov 17 12:54:57 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Mon Nov 17 12:52:45 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <20031117175456.GA22498@mems-exchange.org> On Mon, Nov 17, 2003 at 11:06:57AM -0500, Tim Peters wrote: > it doesn't really matter which objects are reachable from the > objects whose deaths trigger callbacks, what really matters is > which objects are reachable from the callbacks themselves. Right, it's all about what that nasty user code can do. :-) > So what's reachable from a callback? If the callback is not *itself* part > of the garbage getting collected, then it acts like an external root, and so > nothing reachable from the callback is part of the garbage getting collected > either. gc has no worries then. Okay. > If the callback itself is part of the garbage getting collected, then the > weakref holding the callback must also be part of the garbage getting > collected (else the weakref holding the callback would act as an external > root, preventing the callback from being part of the garbage being collected > too). > > My thought then was that a simpler scheme could simply call tp_clear on the > trash weakrefs first. Calling tp_clear on a weakref just throws away the > associated callbacks (if any) unexecuted, and if they don't get run then we > have no reason to care what's reachable from them anymore. This I don't get. Don't people want the callbacks to be called? I don't see how a weakref callback is different than a __del__ method. 
While the object is not always reachable from the callback it could be (e.g. the callback could be a method). The fact that callbacks are one shot doesn't seem to help either since the callback can create a new callback. Neil From tim.one at comcast.net Mon Nov 17 13:02:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 13:02:07 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <2my8uenbas.fsf@starship.python.net> Message-ID: [Michael Hudson] > This is also why running test_descr in a loop still bumps > sys.gettotalrefcount() by 3 or so each time. Ah, so it's critical then . [Tim] >> /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, >> * but _Py_NewReference bumped it again, so that's a wash. >> >> Heh. If you ignore the new reference(s) that resurrected the thing, >> I suppose that would be true. It should (2.3.2) do >> >> _Py_DEC_REFTOTAL; >> >> to make up for the extra increment done by _Py_NewReference; >> likewise in slot_tp_del (BTW, the macro expands to nothing if >> Py_REF_DEBUG isn't defined). > Is it that easy? In 2.3.2, it should be. The code is more convoluted in 2.2.3. I don't care about 2.2.n anymore, though, so I'm not going to spend any time looking at that. > I remember fooling a little with this, but not successfully. It's just > possible that I got confused, though. (Confused by finalizer > issues? How could that be?) I hate finalizers. I'm learning to hate weakref callbacks too. > FWIW, my foolings were with new-style objects -- but from what you say > in another post, it's unsurprising to find isomorphic problems with > old-style classes (as in your example). Right, Guido did copy+paste of masses of old-style object code into the new-style object code. One or two new bugs were introduced that way that I know of, long since fixed. This one is a case of duplicating a bug, and it looks to be as shallow as they get. 
Whoever did the last rework of string interning clearly wasn't thinking about all these "special builds" at all, so that may be trickier. From tim at zope.com Mon Nov 17 14:20:30 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 14:21:44 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <20031117175456.GA22498@mems-exchange.org> Message-ID: [Tim] >> If the callback itself is part of the garbage getting collected, >> then the weakref holding the callback must also be part of the >> garbage getting collected (else the weakref holding the callback >> would act as an external root, preventing the callback from being >> part of the garbage being collected too). >> >> My thought then was that a simpler scheme could simply call tp_clear >> on the trash weakrefs first. Calling tp_clear on a weakref just >> throws away the associated callbacks (if any) unexecuted, and if >> they don't get run then we have no reason to care what's reachable >> from them anymore. [Neil Schemenauer] > This I don't get. Don't people want the callbacks to be called? The one person I know who cares about this a lot is Jim, and he was happy to have his callbacks raise mystery exceptions, just not segfaults . But if he doesn't care whether his callbacks "do something" in this context, he can't care whether they don't get run at all in this context either. When a weakref goes away, its callback (if any) goes away too, unexecuted, cyclic gc or not. If the weakref is part of cyclic trash, then clearing it up first is defensible -- that may have happened in 2.3.2 already, as the order in which gc invokes tp_clear is mostly accidental. If I can force the order in such a way as to reliably prevent disasters, that's a good tradeoff. If the user doesn't want the possibility for weakref callbacks not to get invoked, then they have to ensure that the weakref itself outlives the object whose death triggers that weakref's callback. 
They have to do that today too, with or without cyclic gc:

>>> def cb(ignore): return 1/0
...
>>> import weakref
>>> class C: pass
...
>>> c = C()
>>> wr = weakref.ref(c, cb)
>>> del wr
>>> del c
>>>

Once the weakref is cleared, the callback is history. When a weakref is part of a trash cycle, may as well clear it first. > I don't see how a weakref callback is different than a __del__ > method. While the object is not always reachable from the callback > it could be (e.g. the callback could be a method). The fact that > callbacks are one shot doesn't seem to help either since the > callback can create a new callback. It's the one-shot business that (I think) makes them easier to live with, in conjunction with that a callback vanishes if the weakref holding it goes away. A __del__ method never goes away. While a callback *can* install new callbacks, all over the place, I don't expect that real code does that. For code that doesn't, gc can make good progress. Java's flavor of __del__ method executes at most once: if an object is resurrected by its finalizer, that object's finalizer will never be run again (unless invoked explicitly by the user). That allows Java's gc to make good progress in the presence of resurrecting finalizers too: finalizers (if any) in cycles are run in an arbitrary order, and if any were run gc has to give up on finishing tearing down the objects (it can't know whether finalizers have resurrected objects until gc runs again). In the absence of resurrection, though, the next time gc runs, all the objects it ran finalizers on before are almost certainly still trash, and it can reclaim the memory without running dangerous finalizers again first. The patch I posted for weakrefs took a similar approach. Java doesn't allow adding callbacks to its elaborate weakrefs, though.
It's more like the way we treat gc.garbage: you can optionally specify a ReferenceQueue object with a Java weakref, and when the referenced object is dead the weakref is added to the queue, for user inspection (well, I guess it's a little different for Java's "phantom references", but who cares ...). So I've been moving to a scheme where we treat finalizers like Java treats weakrefs, and we treat weakref callbacks like Java treats finalizers . The Java weakref facilities would be a lot easier for gc to live with, but too late for that. Jim emphatically doesn't want to poll gc.garbage looking for weakrefs that appear in cycles. Maybe "tough luck" is the best response we can come up with to that, but cycles are getting very easy to create in Python by accident, so I don't really want to settle for that. OTOH, people can write __del__ methods that don't provoke leaks, and I suspect they could learn how to write weakrefs that don't provoke leaks too (assuming we changed Python to treat "has a weakref callback" the same as "has a __del__ method"). One way to do that was mentioned above, ensuring that a weakref outlives the object whose death triggers the weakref's callback. Or ensuring the reverse. It's only letting them die "at the same time" in a trash cycle that creates trouble. If the weakref and that object are both in the same clump of cyclic trash, it's unpredictable what happens in 2.3.2. If the weakref suffers tp_clear() first, the callback won't get invoked; if the object suffers tp_clear() first, the callback will get invoked -- but may lead to segfaults or lesser surprises. We can certainly repair that by treating objects with callbacks the same as objects with __del__ methods when they're in cyclic trash, and that's an easy change to the implementation. Then the objects with callbacks, and everything reachable from them, leak unless/until the user snaps enough cycles in gc.garbage.
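[Editorial note: the ordering rule described here — the callback fires only when the weakref outlives its referent — is easy to demonstrate outside of cycles. A sketch relying on CPython's immediate refcount-based collection:]

```python
import weakref

class C:
    pass

fired = []

# Case 1: the weakref outlives the referent -> the callback runs.
c = C()
wr = weakref.ref(c, lambda ref: fired.append("case 1"))
del c          # referent dies first; wr is still alive, so the callback fires

# Case 2: the weakref dies first -> its callback is discarded, never run.
c = C()
wr = weakref.ref(c, lambda ref: fired.append("case 2"))
del wr         # weakref dies first; the callback vanishes with it
del c          # the referent's death now triggers nothing

print(fired)   # ['case 1']
```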
I don't have a feel for how much trouble it would be to avoid running afoul of that. Jim has so far presented it as an unacceptable burden.

Another scheme is to just run all the weakref callbacks associated with trash cycles, without tp_clear'ing anything first. Then run gc again to figure out what's still trash, and repeat until no more weakref callbacks in trash cycles exist. If the weakref implementation is changed to forbid creating a new weakref callback while a weakref callback is executing, that gc-loop must eventually terminate (after the first try even in most code that does manage to put weakref callbacks in trash cycles).

Beats me ...

From mwh at python.net Mon Nov 17 15:11:24 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 17 15:12:13 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: (Tim Peters's message of "Mon, 17 Nov 2003 14:20:30 -0500")
References:
Message-ID: <2mu152n4b7.fsf@starship.python.net>

"Tim Peters" writes:

> Another scheme is to just run all the weakref callbacks associated with
> trash cycles, without tp_clear'ing anything first. Then run gc again to
> figure out what's still trash, and repeat until no more weakref callbacks in
> trash cycles exist. If the weakref implementation is changed to forbid
> creating a new weakref callback while a weakref callback is executing, that
> gc-loop must eventually terminate (after the first try even in most code
> that does manage to put weakref callbacks in trash cycles).

Maybe I'm misunderstanding, but in the presence of threads might that not create much confusion? I'm envisaging

1) object reaches refcount 0
2) weakref callback gets called
3) thread switch happens
4) new thread attempts to create a weakref callback, which fails
5) programmer goes insane

Or am I missing something?

Cheers, mwh

-- There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere.
-- Johann Hibschman, comp.lang.scheme From tim.one at comcast.net Mon Nov 17 15:29:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 15:29:12 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <2mu152n4b7.fsf@starship.python.net> Message-ID: [Tim] >> Another scheme is to just run all the weakref callbacks associated >> with trash cycles, without tp_clear'ing anything first. Then run gc >> again to figure out what's still trash, and repeat until no more >> weakref callbacks in trash cycles exist. If the weakref >> implementation is changed to forbid creating a new weakref callback >> while a weakref callback is executing, that gc-loop must eventually >> terminate (after the first try even in most code that does manage to >> put weakref callbacks in trash cycles). [Michael Hudson] > Maybe I'm misunderstanding, but in the presence of threads might that > not create much confusion? I'm envisaging > > 1) object reaches refcount 0 > 2) weakred callback gets called > 3) thread switch happens > 4) new thread attempts to create a weakref callback, which fails > 5) programmer goes insane > > Or am I missing something? Nope -- it's a downside to that scheme, probably fatal. From jim at zope.com Mon Nov 17 15:33:19 2003 From: jim at zope.com (Jim Fulton) Date: Mon Nov 17 15:34:35 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <3FB9308F.7010305@zope.com> Tim Peters wrote: > [Tim] > >>>If the callback itself is part of the garbage getting collected, >>>then the weakref holding the callback must also be part of the >>>garbage getting collected (else the weakref holding the callback >>>would act as an external root, preventing the callback from being >>>part of the garbage being collected too). >>> >>>My thought then was that a simpler scheme could simply call tp_clear >>>on the trash weakrefs first. 
>>>Calling tp_clear on a weakref just
>>>throws away the associated callbacks (if any) unexecuted, and if
>>>they don't get run then we have no reason to care what's reachable
>>>from them anymore.
>
> [Neil Schemenauer]
>
>>This I don't get. Don't people want the callbacks to be called?

As Tim pointed out, not if the weakref object dies before the object it references. I agree with Tim that if both the weakref and the object it references are in a cycle, it makes sense to remove the weakrefs first.

...

> We can certainly repair that by treating objects with callbacks the same as
> objects with __del__ methods when they're in cyclic trash, and that's an
> easy change to the implementation. Then the objects with callbacks, and
> everything reachable from them, leak unless/until the user snaps enough
> cycles in gc.garbage.

I think this would be really really bad.

> I don't have a feel for how much trouble it would be to avoid running afoul
> of that. Jim has so far presented it as an unacceptable burden.

There's a big difference between __del__ and weakref callbacks. The __del__ method is "internal" to a design. When you design a class with a __del__ method, you know you have to avoid including the class in cycles.

Now, suppose you have a design that has no __del__ methods but that does use cyclic data structures. You reason about the design, run tests, and convince yourself you don't have a leak.

Now, suppose some external code creates a weak ref to one of your objects. All of a sudden, you start leaking. You can look at your code all you want and you won't find a reason for the leak.

To protect yourself against this, you'd need a way of preventing weakrefs to your class instances.

Jim

-- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org

From nas-python at python.ca Mon Nov 17 16:46:45 2003
From: nas-python at python.ca (Neil Schemenauer)
Date: Mon Nov 17 16:44:35 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
References: <20031117175456.GA22498@mems-exchange.org>
Message-ID: <20031117214645.GA23186@mems-exchange.org>

On Mon, Nov 17, 2003 at 02:20:30PM -0500, Tim Peters wrote:
> When a weakref goes away, its callback (if any) goes away too,
> unexecuted, cyclic gc or not.

I did not know that.

> It's the one-shot business that (I think) makes them easier to
> live with, in conjunction with the fact that a callback vanishes
> if the weakref holding it goes away. A __del__ method never goes
> away. While a callback *can* install new callbacks, all over the
> place, I don't expect that real code does that. For code that
> doesn't, gc can make good progress.

That sounds pragmatic and Pythonic.

> Jim emphatically doesn't want to poll gc.garbage looking for
> weakrefs that appear in cycles. Maybe "tough luck" is the best
> response we can come up with to that, but cycles are getting very
> easy to create in Python by accident, so I don't really want to
> settle for that.

Agreed. It sucks to have to make things a lot more inconvenient just because it's theoretically possible for people to make the system behave badly.

> Another scheme is to just run all the weakref callbacks associated
> with trash cycles, without tp_clear'ing anything first. Then run
> gc again to figure out what's still trash, and repeat until no
> more weakref callbacks in trash cycles exist.

Repeatedly running the GC sounds like trouble to me. I think it would be better to move everything reachable from them into the youngest generation, finish the GC pass and then run them. I haven't been thinking about this as hard as you have though, so perhaps I'm missing some subtlety.
I have to wonder if anyone would care if __del__ methods were one-shot as well. As a user, I would rather have one-shot __del__ methods and not have to deal with gc.garbage. It would be nice if we could treat both kinds of finalizers consistently. Unfortunately I can't think of a way of noting that the __del__ method was already run. I suppose if __del__ methods continued to work the way they do, people could just use weakref callbacks to do finalization.

Neil

From fred at zope.com Mon Nov 17 16:58:01 2003
From: fred at zope.com (Fred L. Drake, Jr.)
Date: Mon Nov 17 16:58:24 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117214645.GA23186@mems-exchange.org>
References: <20031117175456.GA22498@mems-exchange.org> <20031117214645.GA23186@mems-exchange.org>
Message-ID: <16313.17513.84542.827027@grendel.zope.com>

Neil Schemenauer writes:
> I did not know that.

The callback is intended to be a notification that the referenced object has gone away for anyone who's still interested. To "lose interest", you can just throw away your reference.

> I suppose if __del__ methods continued to work the way they do,
> people could just use weakref callbacks to do finalization.

Sigh. So then everyone would wonder why the destructor registration is done through the weakref module. And constructors would assign a weakref with a callback to an attribute on self. Sounds nasty.

-Fred

-- Fred L. Drake, Jr. PythonLabs at Zope Corporation

From tim at zope.com Mon Nov 17 16:59:25 2003
From: tim at zope.com (Tim Peters)
Date: Mon Nov 17 17:00:22 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <3FB9308F.7010305@zope.com>
Message-ID:

[Jim Fulton]
> ...
> There's a big difference between __del__ and weakref callbacks.
> The __del__ method is "internal" to a design. When you design a
> class with a __del__ method, you know you have to avoid including the
> class in cycles.
> Now, suppose you have a design that has no __del__ methods but
> that does use cyclic data structures. You reason about the design,
> run tests, and convince yourself you don't have a leak.
>
> Now, suppose some external code creates a weak ref to one of your
> objects. All of a sudden, you start leaking. You can look at your
> code all you want and you won't find a reason for the leak.

I think that's an excellent argument -- thanks.

> To protect yourself against this, you'd need a way of preventing
> weakrefs to your class instances.

Not just to them, but also to anything in a cycle with one of your class instances. This may include the class itself, or instance bound method objects I got hold of as "a callable" from somewhere else, and where I had no idea that your class is involved. It becomes intractable then for both the class designer and the weakref user.

The patch I posted seemed correct for the problem it was solving. Unfortunately, that wasn't the real problem . However, instead of identifying the transitive closure of objects reachable from trash objects with a weakref callback, it could compute the transitive closure of objects reachable from (all) the callbacks associated with trash objects having a (at least one) weakref callback. Don't call tp_clear on those objects, and everything callbacks see will be wholly intact. Apart from a pile of new hair to compute that complicated set instead, the rest of the patch is probably fine.

The other plausible idea is fixing the glitch with the simpler-at-first "do tp_clear on trash weakref objects first" idea. The problem with that is that doing tp_clear on a weakref (or proxy) object ends up decref'ing the callback, and the callback may *itself* have a weak reference to it, so that decref'ing the callback triggers a different callback, and again arbitrary Python code starts running in the middle of gc.
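The closure computation described here can be sketched at the Python level with gc.get_referents; the real patch does this in C inside gcmodule, and `reachable_from` is just an illustrative name:

```python
import gc

def reachable_from(roots):
    # Transitive closure of everything reachable from `roots`,
    # following the same edges cyclic gc traverses (tp_traverse).
    seen = {}
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        stack.extend(gc.get_referents(obj))
    return list(seen.values())

inner = {"payload": [1, 2, 3]}
outer = [inner]
closure = reachable_from([outer])
assert any(o is inner for o in closure)             # the dict is reachable
assert any(o is inner["payload"] for o in closure)  # and so is the list inside it
```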
From tismer at tismer.com Mon Nov 17 17:46:47 2003
From: tismer at tismer.com (Christian Tismer)
Date: Mon Nov 17 17:46:10 2003
Subject: [Python-Dev] more on pickling
Message-ID: <3FB94FD7.1030508@tismer.com>

Hi again,

trying to pickle bound python methods, I'm now running into another problem. It seems to give a problem when asking for an attribute of a bound method:

>>> class a:
...     def x(self): pass
>>> a.x    # good so far
>>> a().x  # very good
>>> a.x.__reduce__    # naaaaah? Sounds bad
>>> a().x.__reduce__  # very bad.
>>> a.x.__reduce__ == a().x.__reduce__
1

So I have the impression these methods lose their relationship to their originating object. Is this behavior by intent, i.e. is it impossible to write a working __reduce__ method for a bound class method?

thanks again - chris

-- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From tim at zope.com Mon Nov 17 17:57:06 2003
From: tim at zope.com (Tim Peters)
Date: Mon Nov 17 17:57:27 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117214645.GA23186@mems-exchange.org>
Message-ID:

[Tim]
>> ... but cycles are getting very easy to create in Python by accident,
>> so I don't really want to settle for that [push cyclic trash with
>> weakref callbacks into gc.garbage]

[Neil Schemenauer]
> Agreed.

Good! I haven't worked with you for a year -- let's party .

> It sucks to have to make things a lot more inconvenient just
> because it's theoretically possible for people to make the
> system behave badly.
I don't know how it happened, but sometime over the last few years I've switched from thinking "well, ya, they could do that, but no real code would" to "if they can do that, they will -- and especially if they're hostile". I didn't even have to take a job at Elemental Security to enjoy this personality adjustment . >> Another scheme is to just run all the weakref callbacks associated >> with trash cycles, without tp_clear'ing anything first. Then run >> gc again to figure out what's still trash, and repeat until no >> more weakref callbacks in trash cycles exist. > Repeatedly running the GC sounds like trouble to me. Me too. > I think it would be better to move everything reachable from them > into the youngest generation, finish the GC pass and then run them. > I haven't been thinking about this as hard as you have though, so > perhaps I'm missing some subtlety. That's essentially what my SF patch does, but with a maddeningly wrong idea for "them" (in "move everything reachable from them"). I think it could be repaired by computing the objects reachable from the callbacks (instead of computing the objects reachable from the objects *with* callbacks). That gets hairier, though, and there's one more thing ... > I have to wonder if anyone would care if __del__ methods were > one-shot as well. As a user, I would rather have one-shot __del__ > methods and not have to deal with gc.garbage. Are you sure? All Java programmers I've heard talk about it say that finalizers in Java are so bloody useless they don't use them at all. Maybe that's a good thing. Part of the problem is that the order of finalization isn't defined, and a program that appears to run fine under testing can fail horribly in real life when the conditions feeding gc change a bit and provoke a different order of finalization. 
That's the primary reason I was loathe to run __del__ methods in an arbitrary order: horrid order-dependent bugs can easily escape non-exhaustive testing, and there's no feasible way for the user to provoke all N! ways of running N finalizers in a cycle even if they want to get exhaustive.

For that reason, I'm growing increasingly fond of the idea of clearing the trash weakrefs first. If no callbacks get invoked, the order they're not invoked in probably doesn't matter . The technical hangup with that one right now is that clearing a weakref decrefs the callback, which can make the callback object die, and the callback object can itself have a weakref (with a different callback) pointing to *it*. In that case, arbitrary Python code gets executed during gc, and in an arbitrary order again. There must be a hack to worm around that.

> It would be nice if we could treat both kinds of finalizers consistently.
> Unfortunately I can't think of a way of noting that the __del__ method
> was already run.

One bit in the object would be enough. Alas, that "one bit" turns out to be 4 bytes, and I've lost count of how many useful one-bit flags we've failed to add over the years for fear of losing those bytes for the first time.

> I suppose if __del__ methods continued to work the way they do,
> people could just use weakref callbacks to do finalization.

If they can ensure the weakref outlives the object, maybe. Another barrier is that the weakref callback doesn't expose the object that died: it's presumed to already be trash, and, in the absence of trash cycles, *is* already trash by the time the callback is invoked. So getting at "self" is a puzzle for a weakref callback pointing at self. A binding for self can be installed as a default argument for the callback, but then the self that appears in the function object keeps self alive for as long as the callback is alive! Then the only way for self to go away is for the whole shebang to vanish in a trash cycle.
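A quick sketch of that default-argument trap (illustrative names; CPython refcount semantics assumed): binding self into the callback pins the object, and when the weakref then dies first, the callback is discarded unexecuted:

```python
import weakref

class Obj(object):
    pass

log = []
o = Obj()
# The default argument gives the callback a *strong* reference to o.
cb = lambda ref, obj=o: log.append("died")
wr = weakref.ref(o, cb)
probe = weakref.ref(o)      # plain weakref, no callback, just to watch o

del o
assert probe() is not None  # o survives: cb's default argument pins it

del cb, wr                  # wr held the last ref to cb; dropping wr frees
                            # cb, which drops its default arg, and o dies
assert probe() is None
assert log == []            # the callback vanished before o did: never ran
```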
So finalization of an object isn't what Python's weakref callbacks were aiming at, and it's a real strain to use them for that. Python callbacks were designed to let other objects know that a given object went away; that's what weak dicts need to know, for example.

From nas-python at python.ca Mon Nov 17 18:35:02 2003
From: nas-python at python.ca (Neil Schemenauer)
Date: Mon Nov 17 18:32:51 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
References: <20031117214645.GA23186@mems-exchange.org>
Message-ID: <20031117233502.GA23672@mems-exchange.org>

On Mon, Nov 17, 2003 at 05:57:06PM -0500, Tim Peters wrote:
> That's the primary reason I was loathe to run __del__ methods in
> an arbitrary order: horrid order-dependent bugs can easily escape
> non-exhaustive testing

Very good point. I had forgotten about that issue.

> For that reason, I'm growing increasingly fond of the idea of clearing the
> trash weakrefs first. If no callbacks get invoked, the order they're not
> invoked in probably doesn't matter . The technical hangup with that
> one right now is that clearing a weakref decrefs the callback, which can
> make the callback object die, and the callback object can itself have a
> weakref (with a different callback) pointing to *it*. In that case,
> arbitrary Python code gets executed during gc, and in an arbitrary order
> again. There must be a hack to worm around that.

A hack you say? Create a list that references itself (i.e. append itself). Append all the unreachable callbacks to it and remove them from the weakrefs. Put the list in the youngest generation. The next gc should clean it up.
Neil

From greg at cosc.canterbury.ac.nz Mon Nov 17 19:53:20 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon Nov 17 19:53:36 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
Message-ID: <200311180053.hAI0rKM07940@oma.cosc.canterbury.ac.nz>

Tim Peters :

> The other plausible idea is fixing the glitch with the simpler-at-first "do
> tp_clear on trash weakref objects first" idea. The problem with that is
> that doing tp_clear on a weakref (or proxy) object ends up decref'ing the
> callback, and the callback may *itself* have a weak reference to it, so that
> decref'ing the callback triggers a different callback, and again arbitrary
> Python code starts running in the middle of gc.

If the second weakref is from inside the cycle, its callback doesn't need to be called, by the same reasoning that applies to the first one. If the second weakref is from outside the cycle, its callback can't reach anything inside the cycle by strong refs, otherwise the cycle wouldn't be garbage. So calling its callback can safely be deferred until after the cycle has been torn down.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
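A sketch of the outside-the-cycle case (names illustrative; this shows the behavior current CPython eventually settled on, where a callback held from outside a trash cycle is still invoked once the cycle is collected):

```python
import gc
import weakref

class Node(object):
    def __init__(self):
        self.peer = None

a, b = Node(), Node()
a.peer, b.peer = b, a          # a reference cycle: only cyclic gc can free it

log = []
# The weakref (and its callback) live *outside* the cycle.
wr = weakref.ref(a, lambda ref: log.append("cycle collected"))

del a, b                       # the cycle is now unreachable
gc.collect()                   # cyclic gc tears it down...
assert wr() is None
assert log == ["cycle collected"]  # ...and the external callback still ran
```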
I remember reading once about the finalization scheme used in a particular Smalltalk implementation (I think it was ParcPlace) in which an object requiring finalization registers another object to be notified after it has died. This seems to be more or less equivalent to what we have with weakref callbacks. It might be worth studying how they deal with reference cycles in their system, since the same solution may well apply to us. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From nas-python at python.ca Mon Nov 17 21:11:24 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Mon Nov 17 21:09:15 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> References: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> Message-ID: <20031118021124.GA24070@mems-exchange.org> On Tue, Nov 18, 2003 at 02:21:45PM +1300, Greg Ewing wrote: > I remember reading once about the finalization scheme used in a > particular Smalltalk implementation (I think it was ParcPlace) in > which an object requiring finalization registers another object to be > notified after it has died. I think that may be called "guardians". Neil From tim.one at comcast.net Mon Nov 17 21:45:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 21:45:20 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > How often does a finalizer really *need* access to the entire object > that triggered the finalization, and not just some part of its state? 
> > I remember reading once about the finalization scheme used in a > particular Smalltalk implementation (I think it was ParcPlace) in > which an object requiring finalization registers another object to be > notified after it has died. > > This seems to be more or less equivalent to what we have with weakref > callbacks. It might be worth studying how they deal with reference > cycles in their system, since the same solution may well apply to us. It appears that most (not all) Smalltalks pass a shallow copy of "self" to the method registered for finalization, and the original self is truly unreachable then. If you've got time for research, go for it. My experience is that you generally can't get answers to such obscure questions without studying source code, and then the precise answers depend on accidental implementation details. Takes a long time, and I don't have it. Here are "The Rules" for Dolphin Smalltalk, which has "real" finalization and weak references. Good luck : The Rules The co-ordination and initiation of Finalization and the murdering of Weak References are the responsibility of the memory manager, and are performed during a garbage collection (GC) cycle, according to the following rules (you may want to skip this advanced topic): Any objects which are directly reachable down a chain of strong references from the "roots of the world" will survive the GC, and will NOT be queued for finalization. Any objects which are NOT directly reachable by following a chain of strong references from one of the roots of the world, are candidates for finalization during a particular GC cycle. Any weakly referencing objects which contain finalization candidates identified as above, are candidates for a bereavement notification following the GC cycle, and will have their pointers to those candidates changed to pointers to the corpse object during this GC cycle, regardless of whether those objects are actually queued for finalization during this GC cycle. 
Any weakling which has suffered one or more bereavements during a GC cycle which is also a member of a class marked with the mourning special behaviour bit (termed a mourning weakling), will receive an #elementsExpired: message telling them how many of such losses the garbage collector inflicted on them. A bereavement notification candidate will only actually be queued for such a notification if it is a member of a class bearing the mourning special behaviour mark (applied by sending the class #makeMourner).

Mourning weaklings queued for bereavement notifications will receive an #elementsExpired: message before any of the objects they previously referenced has actually been finalized. This ordering is necessary in order that when objects are queued for finalization, they do not have any non-circular references, strong or weak, because a pre-condition for finalization is that an object must be about to expire.

A mourning weakling which has suffered bereavements during a GC cycle, but which would otherwise be garbage collected itself, is rescued until after it has been sent an #elementsExpired: message. If such objects still have no references after processing the #elementsExpired: message, then they will be garbage collected as normal.

A finalization candidate will only actually be queued for finalization if it bears the finalization mark (applied by sending #beFinalizable). Should a finalization candidate contain other finalizable objects, then even if those contained finalizable objects are only strongly referenced from the original finalization candidate, then they will not be finalized during the current GC cycle, but will instead survive until at least the completion of the container's #finalize (and probably until the next full GC cycle is complete, should they be circularly referenced).
This guarantees that when an object is finalized, any objects which it "owns" (directly or indirectly) will not yet have been finalized, and should therefore be in a valid state. Where a finalizable object, call it A, references another finalizable object, call it B, then B is guaranteed to be finalized before A. Indeed A cannot be finalized until B has been finalized.

Where a circular reference exists between two finalizable objects, then the order in which those objects are actually finalized is undefined (though they will not be finalized in the same cycle). An example of where such a situation might arise is where there is a finalizable parent which strongly references all its children, and those children are finalizable and have a back pointer to the parent. Although conceptually there is a parent-child relationship, there is no way for the memory manager to determine which should be finalized first (indeed it is not necessarily clear). Where this is the case, #finalize methods must be coded defensively, and not depend on ordering.

Any object in the finalization queue which is not actively being finalized will have no other references in the image.

You may be wondering why these complex rules are necessary, why not just finalize every candidate marked as requiring finalization? Well, the rules are designed to ensure that objects queued for finalization remain valid until their finalization is complete. If we simply queued every candidate for finalization, then we could not guarantee that constituent objects had not already been finalized. This would make coding #finalize methods horribly complicated and confusing.

Bereavement notifications are not sent to all weaklings by default, because the necessity of rescuing GC'able weak objects to receive the notification can potentially extend the lifetime of large groups of weak objects referenced by other weak objects (e.g. weak tree nodes) due to a "cascading rescue" effect.
Cascading rescues significantly degrade the operation of the system because they may prevent garbage weaklings from being collected for many many GC cycles.

The memory manager must ensure that an object does not receive a #finalize message until there are no strong references to it (which are not circular), and we need to take account of strong references from objects which are queued for finalization in the same garbage collection cycle. Even if an object to be finalized is only referenced from another object to be finalized in the same cycle, we must delay its finalization until the next cycle, so that parents are finalized before children, otherwise the parent may not be in a valid state during its finalize. It is not acceptable to have the order of finalization depend purely on the order in which the objects are visited during garbage collection.

Where a finalizable object is circularly referenced (perhaps indirectly), we must ensure that it can be garbage collected - so this precludes simply marking any candidates for finalization, and then only actually finalising those which are unreferenced, because this would mean that circularly referencing finalizable objects (phew!) would never be garbage collected. In fact it is possible that an indirect circular reference could exist between two finalizable objects, and where this is the case there is no general mechanism for deciding which to finalize first, since there is no notion of ownership.

This complexity is probably one of the reasons that some other Smalltalks do not support finalization of objects directly. They have only weak references and implement finalization with them: Any object which is not directly reachable through a strong pointer chain is garbage collected, and any weak references are "nilled". The weakly referencing objects which suffer bereavements are informed, and it is up to them to perform finalization actions on behalf of the objects that they have lost.
This is typically achieved by having a copy of the finalizable objects, and using them as 'executors'. This approach makes garbage collection simpler, but is inefficient and requires more complex Smalltalk classes to support it. Furthermore, it does not address the finalization ordering problem. If you want to implement such finalization in Dolphin, you do so quite easily using mourning weak objects, because the Dolphin facilities are a superset.

From tim.one at comcast.net Mon Nov 17 22:09:17 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Nov 17 22:09:07 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117233502.GA23672@mems-exchange.org>
Message-ID:

[Neil Schemenauer]
> A hack you say? Create a list that references itself (i.e. append
> itself). Append all the unreachable callbacks to it and remove them
> from the weakrefs. Put the list in the youngest generation. The
> next gc should clean it up.

Alas, we don't know we can get enough space for a list, and if we can't we're stuck. Maybe Py_FatalError would be OK then, but I'd rather not. I think I can abuse the weakref objects themselves to hold "the list", though. Heh. Now *that's* a hack .

From tismer at tismer.com Mon Nov 17 23:05:02 2003
From: tismer at tismer.com (Christian Tismer)
Date: Mon Nov 17 23:05:10 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
Message-ID: <3FB99A6E.5070000@tismer.com>

Hi again, again!

After hours of investigating why my instance method __reduce__ doesn't work, I found out the following:

instancemethod_getattro does this:

    if (PyType_HasFeature(tp, Py_TPFLAGS_HAVE_CLASS)) {
        if (tp->tp_dict == NULL) {
            if (PyType_Ready(tp) < 0)
                return NULL;
        }
        descr = _PyType_Lookup(tp, name);
    }

    f = NULL;
    if (descr != NULL) {
        f = TP_DESCR_GET(descr->ob_type);
        if (f != NULL && PyDescr_IsData(descr))
            return f(descr, obj, (PyObject *)obj->ob_type);
    }

Why, please can someone explain, why does it ask for PyDescr_IsData ???
I think this is wrong. I'm defining a __reduce__ method, and it doesn't
provide a tp_descr_set, as defined in...

int
PyDescr_IsData(PyObject *d)
{
	return d->ob_type->tp_descr_set != NULL;
}

but for what reason is this required???
This thingie is going wrong both in Py 2.2.3 and in Py 2.3.2, so I guess
there is something going wrong at a very basic level.
I'd like to fix that, but I need to understand what the intent of this
code has been. Can somebody, perhaps the author, explain why this is
this way?

thanks so much -- chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From guido at python.org Tue Nov 18 01:04:20 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 18 01:04:36 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: Your message of "Tue, 18 Nov 2003 05:05:02 +0100." <3FB99A6E.5070000@tismer.com>
References: <3FB99A6E.5070000@tismer.com>
Message-ID: <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net>

> instancemethod_getattro
>
> does this:
>
> 	if (PyType_HasFeature(tp, Py_TPFLAGS_HAVE_CLASS)) {
> 		if (tp->tp_dict == NULL) {
> 			if (PyType_Ready(tp) < 0)
> 				return NULL;
> 		}
> 		descr = _PyType_Lookup(tp, name);
> 	}
>
> 	f = NULL;
> 	if (descr != NULL) {
> 		f = TP_DESCR_GET(descr->ob_type);
> 		if (f != NULL && PyDescr_IsData(descr))
> 			return f(descr, obj, (PyObject *)obj->ob_type);
> 	}
>
> [...] why does it ask for PyDescr_IsData ???

It's the general pattern: a data descriptor on the class can override
an attribute on the instance, but a method descriptor cannot. You'll
find this in PyObject_Generic{Get,Set}Attr() too, and in
type_getattro().
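The precedence rule described above can be observed from pure Python. A minimal sketch (modern Python 3 spelling; the class names are made up for illustration): a plain function is a non-data descriptor (it defines __get__ but not __set__), so an instance attribute of the same name shadows it, while a data descriptor such as property keeps control:

```python
class WithMethod:
    def f(self):
        return "from the class"

obj = WithMethod()
obj.__dict__["f"] = lambda: "from the instance"
# non-data descriptor on the class: the instance dict wins
print(obj.f())

class WithProperty:
    @property
    def f(self):
        # property defines __get__ and __set__, so it is a data descriptor
        return "from the class"

prop_obj = WithProperty()
prop_obj.__dict__["f"] = "from the instance"
# data descriptor on the class: the instance dict entry is ignored
print(prop_obj.f)
```

The first print shows "from the instance", the second "from the class": exactly the asymmetry the PyDescr_IsData() test implements.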
This is so that if you define a method in a class, you can override it
by setting an instance variable of the same name; this was always
possible for classic classes and I don't see why it shouldn't work for
new-style classes. But it should also be possible to put a descriptor
on the class that takes complete control.

The case you quote is about delegating bound method attributes to
function attributes, but the same reasoning applies generally, I would
think: unless the descriptor is a data descriptor, the function
attribute should have precedence, IOW a function attribute should be
able to override a method on a bound instance. Here's an example of the
difference:

class C:
    def f(s): pass
    f.__repr__ = lambda: "42"

print C().f.__repr__()

This prints "42". If you comment out the PyDescr_IsData() call, it
will print "<bound method C.f of <__main__.C instance at ...>>".

I'm not entirely clear what goes wrong in your case.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Tue Nov 18 01:08:58 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Tue Nov 18 01:09:12 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: <3FB94FD7.1030508@tismer.com>
References: <3FB94FD7.1030508@tismer.com>
Message-ID: 

Christian Tismer writes:

>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?

I don't think it is impossible; see also python.org/sf/558238

However, I would make pickling of bound methods "built-in", i.e. by
pickle explicitly recognizing bound methods, or using copy_reg, as
Konrad suggests.

If you really want to use __reduce__, you probably have to make sure
it isn't delegated to the function object.
Regards,
Martin

From tim.one at comcast.net Tue Nov 18 01:28:16 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Nov 18 01:28:08 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: 
Message-ID: 

There's a new version of the patch at:

    http://www.python.org/sf/843455

trying to force the "do tp_clear() on trash weakrefs first" idea to work.
All the tests we've discussed here survive it fine (including the ones that
broke the first patch, and there are corresponding unittests for all these
cases in the new patch), but there are several combinations of extreme
endcase complication that haven't yet been tested (except in my head).

I haven't yet been able to convince myself that the following does or does
not have a slow memory leak after the patch (this can be hard to tell on
Win9x!  the system allocator is so strange):

"""
import gc, weakref

def boom(ignore):
    print 'boom'

while 1:
    class C(object):
        def callback(self, ignore):
            self.k

    class D(C):
        pass

    class E(object):
        def __del__(self):
            print 'del',

    c1, c2 = C(), D()
    c1.wr = weakref.ref(c2, c1.callback)
    c2.wr = weakref.ref(c1, c2.callback)
    c1.c = c2
    c2.c = c1
    C.objs = [c1, c2]
    C.wr = weakref.ref(D, boom)
    D.wr = weakref.ref(E, boom)
    C.E = E()
    print '.',
    assert gc.garbage == []
"""

Try that under 2.3.2 instead, and it will eventually segfault -- but not as
soon as you expect!  It typically goes thru about 8 rounds of gc on my box
before it blows up -- it may be a memory corruption bug there.

From tim at zope.com Tue Nov 18 10:18:48 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 18 10:19:13 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: 
Message-ID: 

[Tim, on ...]

> I haven't yet been able to convince myself that the following does or
> does not have a slow memory leak after the patch ...

It doesn't -- when I got up today, it was still chugging along, and was
using less memory than when I went to sleep.
If it weren't for the fact that I was running it on a Win98SE box, we could
conclude that cyclic trash is now collected faster than it's generated
<wink>.

From tim at zope.com Tue Nov 18 11:39:36 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 18 11:41:28 2003
Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown
In-Reply-To: 
Message-ID: 

[Barry Warsaw, from last week]
> When Python's shutting down, will there /be/ another GC invocation?

This doesn't appear to be an issue in the current version of the patch.
Nothing is systemically delayed until "the next" GC invocation anymore.
Weakref callbacks triggered *by* a weakref callback going away are
excruciatingly suppressed until near the end of a gc run under the patch,
but they're allowed to trigger before gc returns.  That may create more
cyclic trash, which won't be discovered before the next gc invocation, but
that would have been true even if the callbacks-on-callbacks hadn't been
temporarily suppressed (i.e., it was already that way).

From raymond.hettinger at verizon.net Tue Nov 18 16:17:09 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Tue Nov 18 16:17:39 2003
Subject: [Python-Dev] Removing operator.isMappingType
Message-ID: <000201c3ae19$53de5140$a4b82c81@oemcomputer>

My previous posting on this didn't get resolved. The issue is that the
function doesn't work:

>>> map(operator.isMappingType, ['', u'', (), [], {}])
[True, True, True, True, True]

If someone thinks this should not be removed, please speak up.

Raymond Hettinger

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20031118/24fb088f/attachment.html

From raymond.hettinger at verizon.net Tue Nov 18 16:50:17 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Tue Nov 18 16:50:52 2003
Subject: [Python-Dev] __reversed__ protocol
Message-ID: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>

At one point, PEP 322 had proposed checking to see if an object defined
__reversed__ and, if not available, then proceeding normally using
__getitem__ and __len__. While the idea had supporters, it got taken
out because Guido worried that it would be abused by being applied to
general iterables like generators and objects returned by itertools.

So, an improved version of the idea is to check for __reversed__ but
only use it when the object also defines __len__. That precludes the
abuses but leaves the protocol open for the normal use cases. The
simple patch is listed below.

Guido doesn't have time for this now and asked me to present it to you
guys. What do you guys think?

Raymond

diff -c -r1.10 enumobject.c
*** enumobject.c	7 Nov 2003 15:38:08 -0000	1.10
--- enumobject.c	18 Nov 2003 21:39:51 -0000
***************
*** 174,181 ****
  	if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq))
  		return NULL;
  
! 	/* Special case optimization for xrange and lists */
! 	if (PyRange_Check(seq) || PyList_Check(seq))
  		return PyObject_CallMethod(seq, "__reversed__", NULL);
  
  	if (!PySequence_Check(seq)) {
--- 174,181 ----
  	if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq))
  		return NULL;
  
! 	if (PyObject_HasAttrString(seq, "__reversed__") &&
! 	    PyObject_HasAttrString(seq, "__len__"))
  		return PyObject_CallMethod(seq, "__reversed__", NULL);
  
  	if (!PySequence_Check(seq)) {

From bac at OCF.Berkeley.EDU Tue Nov 18 17:07:27 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Tue Nov 18 17:07:25 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
Message-ID: <3FBA981F.1040908@ocf.berkeley.edu>

Raymond Hettinger wrote:

> At one point, PEP 322 had proposed checking to see if an object defined
> __reversed__ and if not available, then proceeding normally using
> __getitem__ and __len__.  While the idea had supporters, it got taken
> out because Guido worried that it would be abused by being applied to
> general iterables like generators and objects returned by itertools.
> 
> So, an improved version of the idea is to check for __reversed__ but
> only use it when the object also defines __len__.  That precludes the
> abuses but leaves the protocol open for the normal use cases.  The
> simple patch is listed below.
> 
> Guido doesn't have time for this now and asked me to present it to you
> guys.  What do you guys think?
> 

With 'reversed' now a built-in, it seems reasonable to have some magic
method support for it. Then again, it does add one more thing to have to
be aware of that is not necessarily needed.

As for the solution in terms of the problem, I think it is a great way to
handle it. It should cause people to think more about supporting
__reversed__ than if they had not had to define __len__ as well.

So, with 'reversed' in the language, I am +0 on adding this, with a
slight leaning toward +1 if I come across a personal need for 'reversed'
itself.

-Brett

From fincher.8 at osu.edu Tue Nov 18 18:10:38 2003
From: fincher.8 at osu.edu (Jeremy Fincher)
Date: Tue Nov 18 17:12:51 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
Message-ID: <200311181810.38561.fincher.8@osu.edu>

On Tuesday 18 November 2003 04:50 pm, Raymond Hettinger wrote:
> Guido doesn't have time for this now and asked me to present it to you
> guys.
What do you guys think? I think it's a great idea. Jeremy From guido at python.org Tue Nov 18 18:06:05 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 18:07:55 2003 Subject: [Python-Dev] __reversed__ protocol In-Reply-To: Your message of "Tue, 18 Nov 2003 16:50:17 EST." <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer> References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer> Message-ID: <200311182306.hAIN65t13220@c-24-5-183-134.client.comcast.net> > diff -c -r1.10 enumobject.c > *** enumobject.c 7 Nov 2003 15:38:08 -0000 1.10 > --- enumobject.c 18 Nov 2003 21:39:51 -0000 > *************** > *** 174,181 **** > if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq)) > return NULL; > > ! /* Special case optimization for xrange and lists */ > ! if (PyRange_Check(seq) || PyList_Check(seq)) > return PyObject_CallMethod(seq, "__reversed__", NULL); > > if (!PySequence_Check(seq)) { > --- 174,181 ---- > if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq)) > return NULL; > > ! if (PyObject_HasAttrString(seq, "__reversed__") && > ! PyObject_HasAttrString(seq, "__len__")) > return PyObject_CallMethod(seq, "__reversed__", NULL); > > if (!PySequence_Check(seq)) { Note that the two HasAttrString calls can be quite a bit more expensive than the PyRange_Check and PyList_Check calls... --Guido van Rossum (home page: http://www.python.org/~guido/) From wade at treyarch.com Tue Nov 18 19:12:13 2003 From: wade at treyarch.com (Wade Brainerd) Date: Tue Nov 18 19:12:22 2003 Subject: [Python-Dev] generator/microthread syntax Message-ID: <3FBAB55D.5070807@treyarch.com> Hello, I'm working on a game engine using Python as the scripting language and have a question about generators. I'm using what I guess are called 'microthreads' as my basic script building block, and I'd like to know if there is some kind of syntax that could make them clearer, either something in Python already or something that could be added. Here's an example script that illustrates the problem. 
from jthe import *

def darlene_ai(self):
    while True:
        for x in wait_until_near(player.po.w,self.po.w): yield None

        begin_cutscene(self)

        for x in wait_face_each_other(player.po,self.po): yield None

        if not player.inventory.has_key("papers"):
            for x in say("Hi, I'm Darlene! I found these papers,\ndid you lose them?"): yield None
        else:
            for x in say("Hey, I'm new to this town, wanna go out sometime?"): yield None

        end_cutscene(self)

        if not player.inventory.has_key("papers"):
            spawn(give_item("papers"))

        for x in wait(2.5): yield None

Now in our in-house script language the above code would look very
similar, only without the

    for x in <generator>: yield None

constructs. Instead, subroutines with a wait_ prefix execute yield
statements which are automatically propagated up the call stack all the
way to the thread manager.

Is there anything to be done about this in Python? I can see it
implemented three ways:

1. A new declaration for the caller function. yield statements propagate
up the call stack automatically until the first non-microthread function
is found.

    microthread darlene_ai(self):
        ...

2. A special kind of exception. The wait_ function throws an exception
containing the current execution context, which is caught by the thread
manager and then later resumed. Generators would not be used at all.

3. A new yield-like keyword, which assumes that the argument is a
generator and whose definition is to return the result of
argument.next() until it catches a StopIteration exception, at which
point it continues. This is just shorthand for the for loop, and would
look something like:

    def darlene_ai(self):
        while True:
            wait until_near(player.po.w,self.po.w)

Anyway, thanks for your time, and for the amazing language and modules.
-Wade From greg at cosc.canterbury.ac.nz Tue Nov 18 19:27:59 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue Nov 18 19:28:05 2003 Subject: [Python-Dev] generator/microthread syntax In-Reply-To: <3FBAB55D.5070807@treyarch.com> Message-ID: <200311190027.hAJ0Rxb17547@oma.cosc.canterbury.ac.nz> Wade Brainerd : > Instead, subroutines with a wait_ prefix execute yield > statements which are automatically propogated up the call stack all the > way to the thread manager. > > Is there anything to be done about this in Python? Python generators aren't really designed for use as general-purpose coroutines, and trying to use them as such is messy. You might like to investigate Stackless Python, which has real microthreads that *are* designed for the sort of thing you're doing. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Tue Nov 18 19:42:55 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 18 19:43:10 2003 Subject: [Python-Dev] generator/microthread syntax In-Reply-To: <3FBAB55D.5070807@treyarch.com> Message-ID: <5.1.1.6.0.20031118192603.032e9cb0@telecommunity.com> At 04:12 PM 11/18/03 -0800, Wade Brainerd wrote: >Hello, I'm working on a game engine using Python as the scripting language >and have a question about generators. >I'm using what I guess are called 'microthreads' as my basic script >building block, and I'd like to know if there is some kind of syntax that >could make them clearer, either something in Python already or something >that could be added. > >Here's an example script that illustrates the problem. 
>
>from jthe import *
>
>def darlene_ai(self):
>    while True:
>        for x in wait_until_near(player.po.w,self.po.w): yield None
>
>        begin_cutscene(self)
>
>        for x in wait_face_each_other(player.po,self.po): yield None
>
>        if not player.inventory.has_key("papers"):
>            for x in say("Hi, I'm Darlene! I found these papers,\ndid you
>            lose them?"): yield None
>        else:
>            for x in say("Hey, I'm new to this town, wanna go out
>            sometime?"): yield None
>
>        end_cutscene(self)
>
>        if not player.inventory.has_key("papers"):
>            spawn(give_item("papers"))
>
>        for x in wait(2.5): yield None
>
>Now in our in-house script language the above code would look very
>similar, only without the
>
>for x in <generator>: yield None
>
>constructs. Instead, subroutines with a wait_ prefix execute yield
>statements which are automatically propagated up the call stack all the
>way to the thread manager.
>Is there anything to be done about this in Python? I can see it
>implemented three ways:

Since you don't seem to be using the values yielded, how about doing
this instead:

    while True:
        yield wait_until_near(...)
        begin_cutscene(self)
        yield wait_face_each_other(player.po,self.po)
        ...

All you need to do is change your microthread scheduler so that when a
microthread yields a generator-iterator, you push the current
microthread onto a stack, and replace it with the yielded generator.
Whenever a generator raises StopIteration, you pop the stack it's
associated with and resume that generator.

This will produce the desired behavior without any language changes.
Your scheduler might look like:

    class Scheduler:

        def __init__(self):
            self.threads = []

        def spawn(self,thread):
            stack = [thread]
            threads.append(stack)

        def __iter__(self):
            while True:
                for thread in self.threads:
                    current = thread[-1]
                    try:
                        step = current.next()
                    except StopIteration:
                        # Current generator is finished, remove it
                        # and give the next thread a chance
                        thread.pop()
                        if not thread:
                            self.threads.remove(thread)
                        yield None
                        continue
                    try:
                        # Is the yielded result iterable?
                        new = iter(step)
                    except TypeError:
                        # No, skip it
                        yield None
                        continue
                    # Yes, push it on the thread's call stack
                    thread.append(new)

So, to use this, you would do, e.g:

    scheduler = Scheduler()
    runOnce = iter(scheduler).next

    scheduler.spawn( whatever.darlene_ai() )

    while True:
        runOnce()
        # do between-quanta activities

All this is untested, so use at your own risk.

From pje at telecommunity.com Tue Nov 18 19:46:51 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Nov 18 19:46:57 2003
Subject: [Python-Dev] generator/microthread syntax
In-Reply-To: <5.1.1.6.0.20031118192603.032e9cb0@telecommunity.com>
References: <3FBAB55D.5070807@treyarch.com>
Message-ID: <5.1.1.6.0.20031118194459.032ea3e0@telecommunity.com>

At 07:42 PM 11/18/03 -0500, Phillip J. Eby wrote:
>     def spawn(self,thread):
>         stack = [thread]
>         threads.append(stack)

Oops. That should've been 'self.threads.append(stack)'. Told you it was
untested. :)

There's one other bug, too. The 'while True' loop in the __iter__ method
really should be 'while self.threads', or else it'll go into an infinite
loop when all microthreads have terminated.
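For what it's worth, folding in the two corrections above, a runnable version of this trampoline pattern in modern Python might look like the following sketch. It is not Phillip's exact code: next(gen) replaces the old gen.next() spelling, a simple __next__ check stands in for the iter() probe, and the wait/task helpers are made up for the demo:

```python
class Scheduler:
    def __init__(self):
        self.threads = []

    def spawn(self, thread):
        # each microthread is a stack of generators; the top one is running
        self.threads.append([thread])

    def __iter__(self):
        while self.threads:                  # the corrected loop condition
            for thread in list(self.threads):
                current = thread[-1]
                try:
                    step = next(current)
                except StopIteration:
                    # generator finished: pop back to its caller, if any
                    thread.pop()
                    if not thread:
                        self.threads.remove(thread)
                    yield None
                    continue
                if hasattr(step, "__next__"):
                    # a generator was yielded: "call" it by pushing it
                    thread.append(step)
                else:
                    yield None

def wait(n):
    # a hypothetical subroutine that burns n scheduler quanta
    for _ in range(n):
        yield None

def task(log):
    log.append("start")
    yield wait(3)        # the scheduler runs wait() to completion first
    log.append("done")

log = []
scheduler = Scheduler()
scheduler.spawn(task(log))
for _ in scheduler:      # drive the scheduler until all threads finish
    pass
print(log)  # -> ['start', 'done']
```

The point of the trick is visible in the output: task() resumes only after the generator it yielded has been exhausted, which is exactly the "propagate yields up the call stack" behavior Wade asked for, with no language changes.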
From tismer at tismer.com Tue Nov 18 20:27:00 2003 From: tismer at tismer.com (Christian Tismer) Date: Tue Nov 18 20:27:06 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> Message-ID: <3FBAC6E4.2020202@tismer.com> Guido van Rossum wrote: ... > Here's an example of the difference: > > class C: > def f(s): pass > f.__repr__ = lambda: "42" > print C().f.__repr__() > > This prints "42". If you comment out the PyDescr_IsData() call, it > will print ">". > > I'm not entirely clear what goes wrong in your case. Well, in my case, I try to pickle a bound method, so I expect that C().f.__reduce__ gives me a reasonable object: A method of an instance of C that is able to do an __reduce__, that is, I need the bound f and try to get its __reduce__ in a bound way. If that's not the way to do it, which is it? thanks - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Tue Nov 18 20:33:04 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 20:33:12 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Wed, 19 Nov 2003 02:27:00 +0100." 
<3FBAC6E4.2020202@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> Message-ID: <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> > > Here's an example of the difference: > > > > class C: > > def f(s): pass > > f.__repr__ = lambda: "42" > > print C().f.__repr__() > > > > This prints "42". If you comment out the PyDescr_IsData() call, it > > will print ">". > > > > I'm not entirely clear what goes wrong in your case. > > Well, in my case, I try to pickle a bound method, so Um, my brain just did a double-take. Standard Python doesn't let you do that, so you must be changing some internals. Which parts of Python are you trying to change and which parts are you trying to keep unchanged? If you were using a different metaclass you could just create a different implementation of instancemethod that does what you want, so apparently you're not going that route. (With new-style classes, instancemethod isn't that special any more -- it's just a currying construct with some extra baggage.) > I expect that C().f.__reduce__ gives me a reasonable > object: A method of an instance of C that is able to > do an __reduce__, that is, I need the bound f and try > to get its __reduce__ in a bound way. Try again. I don't think that C().f.__reduce__ should be a method of an instance of C. You want it to be a method of a bound method object, right? > If that's not the way to do it, which is it? I think what I suggested above -- forget about the existing instancemethod implementation. But I really don't understand the context in which you are doing this well enough to give you advice, and in any context that I understand the whole construct doesn't make sense. 
:-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tismer at tismer.com Tue Nov 18 20:34:17 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 20:34:21 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBAC899.8090206@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238

will look into this.

> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

I tried to avoid messing with pickle, since I think it should get a
complete, nonrecursive rewrite, ASAP. Not by me, btw. Or maybe... :-)

> If you really want to use __reduce__, you probably have to make sure
> it isn't delegated to the function object.

I'm quite tempted to special-case __reduce__ since this is very, very
simple. And I have already spent way too much time on pickling, because
I believe this is a Python feature, not a Stackless one.

If you have a nice and quick solution, please let me know. I'm not so
very keen on finding the best way possible. The fact is that I
implemented pickling, and now I hear people complaining about its
imperfectness. Gosh, I was so happy that it works at all.

So if there is anything I would like to get rid of (and to move it into
core Python), then it is pickling!

cheers - chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From python at rcn.com Tue Nov 18 20:44:13 2003
From: python at rcn.com (Raymond Hettinger)
Date: Tue Nov 18 20:46:39 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <200311182306.hAIN65t13220@c-24-5-183-134.client.comcast.net>
Message-ID: <001301c3ae3e$a41bda40$a4b82c81@oemcomputer>

> Note that the two HasAttrString calls can be quite a bit more
> expensive than the PyRange_Check and PyList_Check calls...

Right! So we need to keep those:

	if (PyRange_Check(seq) || PyList_Check(seq) ||
	    PyObject_HasAttrString(seq, "__reversed__") &&
	    PyObject_HasAttrString(seq, "__len__"))
		return PyObject_CallMethod(seq, "__reversed__", NULL);

Raymond

From tismer at tismer.com Tue Nov 18 20:50:07 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 20:50:39 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net>
References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net>
Message-ID: <3FBACC4F.7090404@tismer.com>

Hi Guido,

...

> Um, my brain just did a double-take. Standard Python doesn't let you
> do that, so you must be changing some internals. Which parts of
> Python are you trying to change and which parts are you trying to keep
> unchanged? If you were using a different metaclass you could just
> create a different implementation of instancemethod that does what you
> want, so apparently you're not going that route. (With new-style
(With new-style > classes, instancemethod isn't that special any more -- it's just a > currying construct with some extra baggage.) No no no, I'm not fiddling around with any internals, here. I just want to use the machinary as it is, and to be able to pickle almost everything. So, if somebody did a v=C().x, I have that variable around. In order to pickle it, I ask for its __reduce__, or in other words, I don't ask for it, I try to supply it, so the pickling engine can find it. My expectation is that C().x.__reduce__ gives me the bound __reduce__ method of the bound x method of a C instance. ... > Try again. I don't think that C().f.__reduce__ should be a method of > an instance of C. You want it to be a method of a bound method > object, right? No, __reduce__ is a method of f, which is bound to an instance of C. Calling it will give me what I need to pickle the bound f method. This is all what I want. I think this is just natural. >>If that's not the way to do it, which is it? > > > I think what I suggested above -- forget about the existing > instancemethod implementation. But I really don't understand the > context in which you are doing this well enough to give you advice, > and in any context that I understand the whole construct doesn't make > sense. :-( Once again. What I try to achieve is complete thread pickling. That means, I need to supply pickling methods to all objects which don't have builtin support in cPickle or which don't provide __reduce__ already. I have done this for some 10 or more types, successfully. Bound PyCFunction objects are nice and don't give me a problem. Bound PyFunction objects do give me a problem, since they don't want to give me what they are bound to. My options are: - Do an ugly patch that special cases for __reduce__, which I did just now, in order to seet hings working. - get the master's voice about how to do this generally right, and do it generally right. 
I would of course prefer the latter, but I also try to save as much
time as I can while supporting my clients, since Stackless is almost no
longer sponsored, and I have money problems.

thanks so much -- chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From tismer at tismer.com Tue Nov 18 21:02:24 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 21:02:29 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBACF30.7000201@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238
> 
> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

Eh, I see nothing at all from Konrad? Where did he post his messages?

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/

From tismer at tismer.com Tue Nov 18 21:09:46 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 21:09:49 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBAD0EA.9080604@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238
> 
> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

Oh, I see. My strategy was to avoid copy_reg entirely, and to build
everything out of C constructs from the beginning. Maybe this was not
so efficient.

I agree (and have checked) that Konrad's solution works. Maybe I should
go that way. On the other hand, I don't agree that it should be
impossible with the __reduce__ protocol. There is possibly some
construct missing which would allow one to ask the object machinery the
right question.

ciao - chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From eppstein at ics.uci.edu Tue Nov 18 22:23:59 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 18 22:24:10 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) Message-ID: In 2.2 I was able to call object.__setattr__(cls,attr,value) where cls is a new-style type (first argument of a classmethod), and attr and value are the name and value of a class attribute I want to create programmatically. I just upgraded to 2.3 but now when I try it I get >>> class foo(object):pass ... >>> object.__setattr__(foo,'foo',None) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __setattr__ to type object Instead I apparently have to call >>> type(foo).__setattr__(foo,'foo',None) Anyway, my question: no harm done here because this was in undeployed code and I've found a workaround, but shouldn't this have at least been mentioned in "What's New in Python 2.3"? Or maybe this is one of the some-other-change-with-far-reaching-consequences things that was mentioned and I just don't see the connection? -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From jeremy at alum.mit.edu Tue Nov 18 23:07:22 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Nov 18 23:09:53 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: References: Message-ID: <1069214841.6983.59.camel@localhost.localdomain> On Tue, 2003-11-18 at 22:23, David Eppstein wrote: > In 2.2 I was able to call object.__setattr__(cls,attr,value) > where cls is a new-style type (first argument of a classmethod), > and attr and value are the name and value of a class attribute I want to > create programmatically. I just upgraded to 2.3 but now when I try it I > get > > >>> class foo(object):pass > ... > >>> object.__setattr__(foo,'foo',None) > Traceback (most recent call last): > File "<stdin>", line 1, in ?
> TypeError: can't apply this __setattr__ to type object > > Instead I apparently have to call > >>> type(foo).__setattr__(foo,'foo',None) > > > Anyway, my question: no harm done here because this was in undeployed > code and I've found a workaround, but shouldn't this have at least been > mentioned in "What's New in Python 2.3"? Or maybe this is one of the > some-other-change-with-far-reaching-consequences things that was > mentioned and I just don't see the connection? The change was reported on python-dev, but apparently got left out of the NEWS file. Here are the details: http://mail.python.org/pipermail/python-dev/2003-April/034605.html I don't know that it does much good to change NEWS after the fact, but I don't think there's anything more that can be done. Jeremy From eppstein at ics.uci.edu Tue Nov 18 23:27:26 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 18 23:27:34 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) References: <1069214841.6983.59.camel@localhost.localdomain> Message-ID: In article <1069214841.6983.59.camel@localhost.localdomain>, Jeremy Hylton wrote: > The change was reported on python-dev, but apparently got left out of > the NEWS file. Here are the details: > http://mail.python.org/pipermail/python-dev/2003-April/034605.html Thanks! Now that you mention it, I vaguely remember something of that discussion. But the messages there seem to be mostly or entirely about preventing __setattr__ on built-in types (justifiably called "evil" in the thread) while the code I needed this for was to do it on my own types. Was there some other discussion about preventing object.__setattr__ on non-builtins or was this just an unintended consequence? Not that it matters much now, it's done... Of course, all of this has led me to realize that my code was unnecessarily obscure: I should have just used setattr(cls,...) -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From guido at python.org Tue Nov 18 23:50:33 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 23:50:41 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: Your message of "Tue, 18 Nov 2003 19:23:59 PST." References: Message-ID: <200311190450.hAJ4oXs13602@c-24-5-183-134.client.comcast.net> > In 2.2 I was able to call object.__setattr__(cls,attr,value) > where cls is a new-style type (first argument of a classmethod), > and attr and value are the name and value of a class attribute I want to > create programmatically. I just upgraded to 2.3 but now when I try it I > get > > >>> class foo(object):pass > ... > >>> object.__setattr__(foo,'foo',None) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: can't apply this __setattr__ to type object > > Instead I apparently have to call > >>> type(foo).__setattr__(foo,'foo',None) > > > Anyway, my question: no harm done here because this was in undeployed > code and I've found a workaround, but shouldn't this have at least been > mentioned in "What's New in Python 2.3"? Or maybe this is one of the > some-other-change-with-far-reaching-consequences things that was > mentioned and I just don't see the connection? I think this was a side effect of closing a hole that allowed using object.__setattr__ to set attributes on built-in classes. A quick look didn't reveal anything in NEWS, but the 2.3 NEWS file is truly huge, so it may be there. :-( Andrew Kuchling's "What's New" doesn't claim completeness... I think this was fixed in a later version of 2.2 too BTW. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 18 23:57:51 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 23:57:59 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: Your message of "Tue, 18 Nov 2003 20:27:26 PST."
References: <1069214841.6983.59.camel@localhost.localdomain> Message-ID: <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> > In article <1069214841.6983.59.camel@localhost.localdomain>, > Jeremy Hylton wrote: > > > The change was reported on python-dev, but apparently got left out of > > the NEWS file. Here are the details: > > http://mail.python.org/pipermail/python-dev/2003-April/034605.html [Good sleuthing, Jeremy!] > Thanks! Now that you mention it, I vaguely remember something of that > discussion. But the messages there seem to be mostly or entirely about > preventing __setattr__ on built-in types (justifiably called "evil" in > the thread) while the code I needed this for was to do it on my own > types. Was there some other discussion about preventing > object.__setattr__ on non-builtins or was this just an unintended > consequence? Not that it matters much now, it's done... Blame it on Carlo Verre. :-) The fix requires that whenever a built-in type derived from object overrides __setattr__, you cannot call object.__setattr__ directly, but must use the more derived built-in type's __setattr__. This is reasonable IMO, and is now enforced in 2.2.x as well. > Of course, all of this has led me to realize that my code was > unnecessarily obscure: I should have just used setattr(cls,...) 
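The restriction Guido describes still holds in today's CPython, and both workarounds from the thread are easy to verify (a minimal sketch, not the original 2.3 session):

```python
class Foo(object):
    pass

# object.__setattr__ refuses to act on a class, because type overrides
# __setattr__ with its own, more derived implementation (the check that
# closed the "Carlo Verre" hole).
try:
    object.__setattr__(Foo, "attr", 42)
    blocked = False
except TypeError:
    blocked = True

# Both workarounds from the thread work:
type(Foo).__setattr__(Foo, "attr", 42)
setattr(Foo, "other", 1)  # the plain spelling David settled on
```

The same call on an *instance* of Foo is fine; only classes (instances of type) trip the check.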
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Wed Nov 19 00:02:39 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Nov 19 00:06:46 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> References: <1069214841.6983.59.camel@localhost.localdomain> <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> Message-ID: <1069218159.6983.83.camel@localhost.localdomain> On Tue, 2003-11-18 at 23:57, Guido van Rossum wrote: > > In article <1069214841.6983.59.camel@localhost.localdomain>, > > Jeremy Hylton wrote: > > > > > The change was reported on python-dev, but apparently got left out of > > > the NEWS file. Here are the details: > > > http://mail.python.org/pipermail/python-dev/2003-April/034605.html > > [Good sleuthing, Jeremy!] Tricks of the master sleuth revealed: http://www.google.com/search?q=object.__setattr__ Jeremy From guido at python.org Wed Nov 19 00:07:05 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 19 00:07:14 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Wed, 19 Nov 2003 02:50:07 +0100." <3FBACC4F.7090404@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> Message-ID: <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> > Hi Guido, > > ... > > Um, my brain just did a double-take. Standard Python doesn't let you > > do that, so you must be changing some internals. Which parts of > > Python are you trying to change and which parts are you trying to keep > > unchanged? 
If you were using a different metaclass you could just > create a different implementation of instancemethod that does what you > want, so apparently you're not going that route. (With new-style > classes, instancemethod isn't that special any more -- it's just a > currying construct with some extra baggage.) > > No no no, I'm not fiddling around with any internals, here. > I just want to use the machinery as it is, and to be able to > pickle almost everything. > > So, if somebody did a v=C().x, I have that variable around. > In order to pickle it, I ask for its __reduce__, or in other > words, I don't ask for it, I try to supply it, so the pickling > engine can find it. But how, I wonder, are you providing it? You can't subclass instancemethod -- how do you manage to add a __reduce__ method to it without fiddling with any internals? > My expectation is that C().x.__reduce__ gives me the bound > __reduce__ method of the bound x method of a C instance. Yes, unfortunately you get the __reduce__ method of the unbound function instead. I think Martin is right: copy_reg may be your last hope. (Or subclassing pickle to special-case instancemethod.) The pickling machinery wasn't intended to pickle bound methods or functions etc., and doesn't particularly go out of its way to allow you to add that functionality. > ... > > > Try again. I don't think that C().f.__reduce__ should be a method of > > an instance of C. You want it to be a method of a bound method > > object, right? > > No, __reduce__ is a method of f, which is bound to an instance > of C. Calling it will give me what I need to pickle the bound > f method. This is all I want. I think this is just natural. And it would be except for the delegation of method attributes to function attributes. It is a similar aliasing problem as you see when you try to access the __getattr__ implementation for classes as C.__getattr__ -- you get the __getattr__ for C instances instead.
So you have to use type(C).__getattr__ instead. That would work for __reduce__ too I think: new.instancemethod.__reduce__(C().f). > >>If that's not the way to do it, which is it? > > > > > > I think what I suggested above -- forget about the existing > > instancemethod implementation. But I really don't understand the > > context in which you are doing this well enough to give you advice, > > and in any context that I understand the whole construct doesn't make > > sense. :-( > > Once again. > What I try to achieve is complete thread pickling. > That means, I need to supply pickling methods to > all objects which don't have builtin support in > cPickle or which don't provide __reduce__ already. > I have done this for some 10 or more types, successfully. > Bound PyCFunction objects are nice and don't give me a problem. > Bound PyFunction objects do give me a problem, since they > don't want to give me what they are bound to. OK, so you *are* messing with internals after all (== changing C code), right? Or else how do you accomplish this? > My options are: > - Do an ugly patch that special cases for __reduce__, which I did > just now, in order to see things working. > - get the master's voice about how to do this generally right, > and do it generally right. > > I would of course prefer the latter, but I also try to save > as much time as I can while supporting my clients, since > Stackless is almost no longer sponsored, and I have money problems. I have a real job too, that's why I have little time to help you. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 19 02:13:33 2003 From: martin at v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 19 02:16:05 2003 Subject: [Python-Dev] more on pickling In-Reply-To: <3FBAC899.8090206@tismer.com> References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: Christian Tismer writes: > If you have a nice and quick solution, please let me know. Install something in copy_reg. Nice and quick. Regards, Martin From tommy at ilm.com Wed Nov 19 20:19:53 2003 From: tommy at ilm.com (Tommy Burnette) Date: Wed Nov 19 20:20:03 2003 Subject: [Python-Dev] airspeed of an unladen swallow Message-ID: <16316.5817.601214.578299@evoke.lucasdigital.com> in case this hasn't been seen on the regular python list yet.... http://www.style.org/unladenswallow From tismer at tismer.com Wed Nov 19 22:18:46 2003 From: tismer at tismer.com (Christian Tismer) Date: Wed Nov 19 22:18:52 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> Message-ID: <3FBC3296.1090004@tismer.com> Hi again, Guido, >>No no no, I'm not fiddling around with any internals, here. >>I just want to use the machinery as it is, and to be able to >>pickle almost everything. Sorry, this was a lie. Sure I'm fiddling internally, but simply by installing some __reduce__ methods, hoping that they work. This worked most of the time, but I'm having problems with bound methods. > But how, I wonder, are you providing it? You can't subclass > instancemethod -- how do you manage to add a __reduce__ method to it > without fiddling with any internals? I added __reduce__ to the PyMethod type and tried to figure out why it didn't take it.
>>My expectation is that C().x.__reduce__ gives me the bound >>__reduce__ method of the bound x method of a C instance. > > > Yes, unfortunately you get the __reduce__ method of the unbound > function instead. > > I think Martin is right: copy_reg may be your last hope. (Or > subclassing pickle to special-case instancemethod.) Well, I see your point, but please let me explain mine, again: If there is a class C which has a method x, then C().x is a perfectly fine expression, yielding a bound method. If I now like to pickle this expression, I would use the __reduce__ protocol and ask C().x for its __reduce__ property. Now, please see that __reduce__ has no parameters, i.e. it has no other chance to do the right thing(TM) but by relying on being bound to the right thing. So, doesn't it make sense to have __reduce__ always be returned as a method of some bound anything? In other words, shouldn't things that are only useful as bound things, always be bound? > The pickling machinery wasn't intended to pickle bound methods or > functions etc., and doesn't particularly go out of its way to allow > you to add that functionality. The pickling machinery gives me a __reduce__ interface, and I'm expecting that this is able to pickle everything. ... > And it would be except for the delegation of method attributes to > function attributes. It is a similar aliasing problem as you see when > you try to access the __getattr__ implementation for classes as > C.__getattr__ -- you get the __getattr__ for C instances instead. So > you have to use type(C).__getattr__ instead. That would work for > __reduce__ too I think: new.instancemethod.__reduce__(C().f). I agree! But I can't do this in this context, using __reduce__ only. In other words, I'd have to add stuff to copyreg.py, which I tried to circumvent. ... > OK, so you *are* messing with internals after all (== changing C > code), right? Or else how do you accomplish this?
Yessir, I'm augmenting all things-to-be-pickled with __reduce__ methods. And this time is the first time that it doesn't work. ... > I have a real job too, that's why I have little time to help you. :-( I agree (and I didn't ask *you* in the first place), but still I'd like to ask the general question: Is this really the right way to handle bound objects? Is the is_data criterion correct? If I am asking for an attribute that makes *only* sense if it is bound, like in the parameter-less __reduce__ case, wouldn't it be the correct behavior to give me that bound object? I have the strong impression that there is some difference in methods which isn't dealt with, correctly, at the moment. If a method wants to be bound to something, it should get bound to something. Especially, if this method is useless without being bound. Please, swallow this idea a little bit, before rejecting it. I think that "is_data" is too rough and doesn't fit the requirements, all the time. sincerely -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at tismer.com Wed Nov 19 22:19:55 2003 From: tismer at tismer.com (Christian Tismer) Date: Wed Nov 19 22:19:58 2003 Subject: [Python-Dev] more on pickling In-Reply-To: References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: <3FBC32DB.2010607@tismer.com> Martin v. Löwis wrote: > Christian Tismer writes: > > >>If you have a nice and quick solution, please let me know. > > > Install something in copy_reg. Nice and quick. Gack! probably my only chance, without starting a major flame war. But I know it *is* wrong.
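For readers following along, Martin's copy_reg suggestion amounts to roughly the following sketch (written with the modern Python 3 `copyreg` spelling; the reducer name `_pickle_method` is invented here for illustration and is not from the thread):

```python
import copyreg
import pickle
import types

def _pickle_method(method):
    # Reduce a bound method to (getattr, (instance, name)): unpickling
    # calls getattr(instance, name), which re-binds the method.
    return getattr, (method.__self__, method.__func__.__name__)

# Register the reducer for all bound methods; pickle consults this
# dispatch table before falling back to the __reduce__ protocol.
copyreg.pickle(types.MethodType, _pickle_method)

class C:
    def x(self):
        return "bound"

m = pickle.loads(pickle.dumps(C().x))
```

Complete thread pickling needs far more than this, of course; the sketch only shows where copy_reg hooks in.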
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From pje at telecommunity.com Wed Nov 19 23:23:42 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Nov 19 23:22:12 2003 Subject: [Python-Dev] more on pickling In-Reply-To: <3FBC32DB.2010607@tismer.com> References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: <5.1.0.14.0.20031119232053.025e78d0@mail.telecommunity.com> At 04:19 AM 11/20/03 +0100, Christian Tismer wrote: >Martin v. Löwis wrote: > >>Christian Tismer writes: >> >>>If you have a nice and quick solution, please let me know. >> >>Install something in copy_reg. Nice and quick. > >Gack! probably my only chance, without starting a major flame war. >But I know it *is* wrong. Not according to the documentation: """The copy_reg module provides support for the pickle and cPickle modules.... It provides configuration information about object constructors which are not classes.""" Hmm. Maybe that last bit should actually say "object types that do not support __reduce__ or other pickling protocols", now that everything's a class. Other than that, it seems dead on to what you're trying to do. From guido at python.org Thu Nov 20 01:18:45 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 20 01:18:58 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Thu, 20 Nov 2003 04:18:46 +0100."
<3FBC3296.1090004@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> Message-ID: <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> Summary: Christian is right after all. instancemethod_getattro should always prefer bound method attributes over function attributes. > >>No no no, I'm not fiddling around with any internals, here. > >>I just want to use the machinery as it is, and to be able to > >>pickle almost everything. > > Sorry, this was a lie. Sigh. OK, you're forgiven. > Sure I'm fiddling internally, but simply by > installing some __reduce__ methods, hoping that > they work. OK, so you *could* just make the change you want, but you are asking why it isn't like that in the first place. Good idea... > This worked most of the time, but I'm having problems > with bound methods. We've established that without a doubt, yes. :-) > > But how, I wonder, are you providing it? You can't subclass > > instancemethod -- how do you manage to add a __reduce__ method to it > > without fiddling with any internals? > > I added __reduce__ to the PyMethod type and tried to figure out > why it didn't take it. OK. Stating that upfront would have helped... > >>My expectation is that C().x.__reduce__ gives me the bound > >>__reduce__ method of the bound x method of a C instance. > > > > > > Yes, unfortunately you get the __reduce__ method of the unbound > > function instead. > > > > I think Martin is right: copy_reg may be your last hope. (Or > > subclassing pickle to special-case instancemethod.) > > Well, I see your point, but please let me explain mine, again: > If there is a class C which has a method x, then C().x is > a perfectly fine expression, yielding a bound method. Of course.
> If I now like to pickle this expression, I would use the > __reduce__ protocol and ask C().x for its __reduce__ property. Which unfortunately gets the __reduce__ property of the underlying *function* object (also named x) used to implement the method. This function can be accessed as C.__dict__['x']. (Not as C.x, that returns an unbound method object, which is the same kind of object as a bound method object but without an instance. :-) > Now, please see that __reduce__ has no parameters, i.e. it has > no other chance to do the right thing(TM) but by relying > on being bound to the right thing. > So, doesn't it make sense to have __reduce__ always be returned > as a method of some bound anything? > > In other words, shouldn't things that are only useful as bound > things, always be bound? This question doesn't address the real issue, which is the attribute delegation to the underlying function object. What *should* happen when the same attribute name exists on the function and on the bound method? In 2.1, when function attributes were first introduced, this was easy: a few attributes were special for the bound method (im_func, im_self, im_class) and for these the bound method attribute wins (if you set an attribute with one of those names on the function, you can't access it through the bound method). The *intention* was for the 2.2 version to have the same behavior: only im_func, im_self and im_class would be handled by the bound method, other attributes would be handled by the function object. This is what the IsData test is attempting to do -- the im_* attributes are represented by data descriptors now. The __class__ attribute is also a data descriptor, so that C().x.__class__ gives us <type 'instancemethod'> rather than <type 'function'>. But for anything else, including the various methods that all objects inherit from 'object' unless they override them, the choice was made to let the function attribute win.
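The delegation being described is easy to watch in action (shown here with present-day Python 3 spellings, where im_func became __func__; plain attribute reads on a bound method still fall through to the underlying function):

```python
class C:
    def x(self):
        pass

# Set an ordinary attribute on the underlying function object:
C.__dict__["x"].marker = "set on the function"

m = C().x
# Reads of ordinary attributes are delegated to the function:
found = m.marker
# ...while the special attributes belong to the method itself:
same_func = m.__func__ is C.__dict__["x"]
# Writes through the bound method are refused (methods carry no __dict__):
try:
    m.marker = "set on the method"
    refused = False
except AttributeError:
    refused = True
```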
But when we look at the attributes where both function and bound method provide a value, it seems that the bound method's offering is always more useful! You've already established this for __reduce__; the same is true for __call__ and __str__, and there I stopped. (Actually, I also looked at __setattr__, where delegation to the function also seems a mistake: C().x.foo = 42 is refused, but C().x.__setattr__('foo', 42) sets the attribute on the function, because this returns the (bound) method __setattr__ on functions.) > > The pickling machinery wasn't intended to pickle bound methods or > > functions etc., and doesn't particularly go out of its way to allow > > you to add that functionality. > > The pickling machinery gives me a __reduce__ interface, and I'm > expecting that this is able to pickle everything. I don't think you'd have a chance of pickling classes if you only relied on __reduce__. Fortunately there are other mechanisms. :-) (I wonder if the pickling code shouldn't try to call x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of these problems would have occurred... :-) > ... > > > And it would be except for the delegation of method attributes to > > function attributes. It is a similar aliasing problem as you see when > > you try to access the __getattr__ implementation for classes as > > C.__getattr__ -- you get the __getattr__ for C instances instead. So > > you have to use type(C).__getattr__ instead. That would work for > > __reduce__ too I think: new.instancemethod.__reduce__(C().f). > > I agree! > But I can't do this in this context, using __reduce__ only. > In other words, I'd have to add stuff to copyreg.py, which > I tried to circumvent. Or you could change the pickling system. Your choice of what to change and what not to change seems a bit arbitrary. :-) > ... > > > OK, so you *are* messing with internals after all (== changing C > > code), right? Or else how do you accomplish this?
> > Yessir, I'm augmenting all things-to-be-pickled with __reduce__ > methods. And this time is the first time that it doesn't work. But not necessarily the last time. :-) > ... > > > I have a real job too, that's why I have little time to help you. :-( > > I agree (and I didn't ask *you* in the first place), but still > I'd like to ask the general question: > Is this really the right way to handle bound objects? > Is the is_data criterion correct? > If I am asking for an attribute that makes *only* sense if it is > bound, like in the parameter-less __reduce__ case, wouldn't > it be the correct behavior to give me that bound object? > > I have the strong impression that there is some difference > in methods which isn't dealt with, correctly, at the moment. > If a method wants to be bound to something, it should > get bound to something. > Especially, if this method is useless without being bound. It's not that it isn't being bound. It's that the *wrong* attribute is being bound (the function's __reduce__ method, bound to the function object, is returned!). > Please, swallow this idea a little bit, before rejecting > it. I think that "is_data" is too rough and doesn't fit > the requirements, all the time. I agree. The bound method's attributes should always win, since bound methods only have a small, fixed number of attributes, and they are all special for bound methods. This *is* a change in functionality, even though there appear to be no unit tests for it, so I'm reluctant to fix it in 2.3. But I think in 2.4 it should definitely change.
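Guido's preferred behavior is what modern CPython implements: attributes found on the method type itself win over the function's, and as a consequence bound methods now pickle out of the box, which was Christian's whole goal (a quick Python 3 check, not part of the original thread):

```python
import pickle

class C:
    def f(self):
        return 42

m = C().f
# The method type's own __reduce__ (not the function's) is found first,
# so a bound method round-trips through pickle directly:
m2 = pickle.loads(pickle.dumps(m))
```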
--Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen at cwi.nl Thu Nov 20 06:52:11 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Nov 20 06:51:51 2003 Subject: [Python-Dev] Ripping out Macintosh support Message-ID: As you may have noticed if you follow the checkins mailing list I've enthusiastically started ripping out 90% of the work I did on Python the last 10 years (and quite a bit of really old code by Guido too:-): everything related to support for pre-Mac OS X macintoshes. Over the last year I've asked various times whether anyone was willing to even consider doing support for MacOS9 for 2.4, and I got absolutely no replies, not even the usual "I'd love to have it but I can't help":-). So out it goes! I'm trying to be careful that I don't break anything, and I make sure the selftests pass every time, but there's always the chance that I do get something wrong. So if things suddenly break inexplicably you're all free to blame me, initially, until I can point out that I have nothing whatsoever to do with the breakage:-) -- Jack Jansen http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From mwh at python.net Thu Nov 20 07:06:50 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 20 07:06:55 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: (doerwalter@users.sourceforge.net's message of "Wed, 19 Nov 2003 05:35:51 -0800") References: Message-ID: <2mn0arkzvp.fsf@starship.python.net> doerwalter@users.sourceforge.net writes: > Update of /cvsroot/python/python/dist/src/Lib > In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib > > Modified Files: > codeop.py > Log Message: > Fix typos. Uh, no. > This module provides two interfaces, broadly similar to the builtin ^^^^^^^^^^^^^^ > ! function compile(), that take progam text, a filename and a 'mode' ^^^^ perhaps this should be which... Cheers, mwh -- 6. 
The code definitely is not portable - it will produce incorrect results if run from the surface of Mars. -- James Bonfield, http://www.ioccc.org/2000/rince.hint From walter at livinglogic.de Thu Nov 20 08:40:00 2003 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Nov 20 08:40:14 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: <2mn0arkzvp.fsf@starship.python.net> References: <2mn0arkzvp.fsf@starship.python.net> Message-ID: <3FBCC430.4020709@livinglogic.de> Michael Hudson wrote: > doerwalter@users.sourceforge.net writes: > > >>Update of /cvsroot/python/python/dist/src/Lib >>In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib >> >>Modified Files: >> codeop.py >>Log Message: >>Fix typos. > > Uh, no. > >> This module provides two interfaces, broadly similar to the builtin > ^^^^^^^^^^^^^^ > >>! function compile(), that take progam text, a filename and a 'mode' > ^^^^ > perhaps this should be which... This depends on whether "take program text..." refers to compile() or to "two interfaces". OK, I've fixed the fix. Bye, Walter Dörwald From mwh at python.net Thu Nov 20 08:53:02 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 20 08:55:48 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: <3FBCC430.4020709@livinglogic.de> References: <2mn0arkzvp.fsf@starship.python.net> <3FBCC430.4020709@livinglogic.de> Message-ID: Walter Dörwald writes: > Michael Hudson wrote: > > > doerwalter@users.sourceforge.net writes: > > > >>Update of /cvsroot/python/python/dist/src/Lib > >>In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib > >> > >>Modified Files: > >> codeop.py Log Message: > >>Fix typos. > > Uh, no. > > > >> This module provides two interfaces, broadly similar to the builtin > > ^^^^^^^^^^^^^^ > > > >>! function compile(), that take progam text, a filename and a 'mode' > > ^^^^ > > perhaps this should be which...
> > This depends on whether "take program text..." refers to compile() or > to "two interfaces". Hmm, yes. Hadn't thought of reading it that way... > OK, I've fixed the fix. Thank you! Cheers, mwh -- Q: What are 1000 lawyers at the bottom of the ocean? A: A good start. (A lawyer told me this joke.) -- Michael Ströder, comp.lang.python From skip at pobox.com Thu Nov 20 10:04:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Nov 20 10:04:26 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: Message-ID: <16316.55271.205085.815371@montanaro.dyndns.org> Jack> Over the last year I've asked various times whether anyone was Jack> willing to even consider doing support for MacOS9 for 2.4, and I Jack> got absolutely no replies, not even the usual "I'd love to have it Jack> but I can't help":-). So out it goes! This is maybe too late to ask, but did you create something like a last-pre-macosx branch before making your changes? That would allow someone to easily come back later and do the work. Someone asked on c.l.py about running Python on OS6 (yes, Six) a few days ago and Python is maintained by interested individuals on other legacy platforms like OS/2 and the Amiga, maybe not at the latest and greatest release, but they're still there. There's probably someone on the planet who'd be willing to putter around with Python on MacOS9. That person just hasn't been found yet. Skip From martin at v.loewis.de Thu Nov 20 14:43:10 2003 From: martin at v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: Thu Nov 20 14:43:56 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: <16316.55271.205085.815371@montanaro.dyndns.org> References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > Someone asked on c.l.py about running Python on OS6 (yes, Six) a few days > ago and Python is maintained by interested individuals on other legacy > platforms like OS/2 and the Amiga, maybe not at the latest and greatest > release, but they're still there. There's probably someone on the planet > who'd be willing to putter around with Python on MacOS9. That person just > hasn't been found yet. I think they could easily start with Python 2.3, though. Regards, Martin From greg at cosc.canterbury.ac.nz Thu Nov 20 17:32:38 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Nov 20 17:32:44 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: Message-ID: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> Jack Jansen : > As you may have noticed if you follow the checkins mailing list I've >enthusiastically started ripping out 90% of the work I did on Python >the last 10 years What are you ripping out, exactly? I hope you're not getting rid of Carbon too soon, because I'm in the midst of doing a Mac version of my Python GUI using it! Mind you, the main reason I chose to use Carbon in the first place was so that there was some chance the same version would work on both 9 and X. But if there's never going to be a Python for MacOS 9 at all, ever again, maybe I should just give up now and re-do it all using PyObjC or something? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at comcast.net Thu Nov 20 18:30:15 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 20 18:30:25 2003 Subject: [Python-Dev] Time for 2.3.3? Message-ID: Over the last week, I checked in fixes for two distinct broad causes of segfaults in code using weakrefs with callbacks. The bugs have been there since weakrefs were introduced, but for whatever reason nobody bumped into them (knowingly) until Jim Fulton and Thomas Heller happened to provoke both, independently, within a day of each other. It was especially easy under Thomas's scenario *not* to get a segfault in a release build, but to suffer random memory corruption instead (if the double-deallocation provoked pymalloc into handing out the same chunk of memory to two distinct objects alive at the same time -- and that is, alas, a likely outcome). I suspect these bugs hid for so long because it's taken Pythoneers a long time to discover why weakrefs can be so cool, and start to build serious apps on top of them. Casual programmers aren't likely to use weakrefs at all, but once you've built a cache based on weakrefs in a large app, weakrefs become critical to your code and your design. So I think either of these fixes is enough to justify a bugfix release, and having two of them makes a compelling case. What say we get 2.3.3 in motion? I did the weakref checkins already on the trunk and on release23-maint; Thomas Heller confirmed that his problems went away on release23-maint, and Jim Fulton confirmed that his Zope3 segfaults went away on the released 2.3.2 + a patch identical in all functional respects to what got checked in (the new test_weakref test cases, and some code comments, were different). If we get 2.3.3c1 out in early December, we could release 2.3.3 final before the end of the year, and start 2004 with a 100% bug-free codebase . 
From anthony at interlink.com.au Thu Nov 20 19:19:19 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Nov 20 19:19:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311210019.hAL0JJjH011663@localhost.localdomain> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the 15th of December, and a release around the 22nd? -- Anthony Baxter It's never too late to have a happy childhood. From tismer at tismer.com Thu Nov 20 21:45:25 2003 From: tismer at tismer.com (Christian Tismer) Date: Thu Nov 20 21:45:30 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> Message-ID: <3FBD7C45.3020607@tismer.com> Guido van Rossum wrote: > Summary: Christian is right after all. instancemethod_getattro should > always prefer bound method attributes over function attributes. Guido, I'm very happy with your decision, which is most probably a wise decision (without any relation to me). The point is, that I didn't know what's right or wrong, so basically I was asking for advice on a thing I felt unhappy with. So I asked you to re-think if the behavior is really what you intended, or if you just stopped, early. Thanks a lot! That's the summary and all about it, you can skip the rest if you like. ----------------------------------------------------------------------- ... >>Sure I'm fiddling internally, but simply by >>installing some __reduce__ methods, hoping that >>they work.
> > > OK, so you *could* just make the change you want, but you are asking > why it isn't like that in the first place. Good idea... I actually hacked a special case for __reduce__, to see whether it works at all, but then asked, of course. Most of my pickling stuff might be of general interest, and changing semantics is by no means what I ever would like to do without following the main path. ... >>I added __reduce__ to the PyMethod type and tried to figure out >>why it didn't take it. > > OK. Stating that upfront would have helped... Sorry about that. I worked too long on these issues already and had the perception that everybody knows that I'm patching __reduce__ into many objects like a bozo :-) ... >>In other words, shouldn't things that are only useful as bound >>things, always be bound? > > This question doesn't address the real issue, which is the attribute > delegation to the underlying function object. Correct, I misspelled things. Of course there is binding, but the chain back to the instance is lost. ... > The *intention* was for the 2.2 version to have the same behavior: > only im_func, im_self and im_class would be handled by the bound > method, other attributes would be handled by the function object. Ooh, I begin to understand! > This is what the IsData test is attempting to do -- the im_* > attributes are represented by data descriptors now. The __class__ > attribute is also a data descriptor, so that C().x.__class__ gives us > rather than . IsData is a test for having a write method, too, so we have the side effect here that im_* works like I expect, since they happen to be writable? Well, I didn't look into 2.3 for this, but in 2.2 I get >>> a().x.__class__=42 Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: __class__ must be set to new-style class, not 'int' object [9511 refs] >>> which says for sure that this is a writable property, while >>> a().x.im_class=42 Traceback (most recent call last): File "<stdin>", line 1, in ?
TypeError: readonly attribute [9511 refs] >>> seems to be handled differently. I only thought of IsData in terms of accessing the getter/setter wrappers. > But for anything else, including the various methods that all objects > inherit from 'object' unless they override them, the choice was made > to let the function attribute win. That's most probably right to do, since most defaults from object are probably just surrogates. > But when we look at the attributes where both function and bound > method provide a value, it seems that the bound method's offering is > always more useful! You've already established this for __reduce__; > the same is true for __call__ and __str__, and there I stopped. > (Actually, I also looked at __setattr__, where delegation to the > function also seems a mistake: C().x.foo = 42 is refused, but > C().x.__setattr__('foo', 42) sets the attribute on the function, > because this returns the (bound) method __setattr__ on functions.) Your examples are much better than mine. >>The pickling machinery gives me an __reduce__ interface, and I'm >>expecting that this is able to pickle everything. > > I don't think you'd have a chance of pickle classes if you only relied > on __reduce__. Fortunately there are other mechanisms. :-) I don't need to pickle classes, this works fine in most cases, and behavior can be modified by users. They can use copy_reg, and that's one of my reasons to avoid copy_reg. I want to have the basics built in, without having to import a Python module. > (I wonder if the pickling code shouldn't try to call > x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of > these problems would have occurred... :-) That sounds reasonable. Explicit would have been better than implicit (by hoping for the expected bound chain). __reduce__ as a class method would allow to explicitly spell that I want to reduce the instance x of class C. 
x.__class__.__reduce__(x) While, in contrast x.__class__.__reduce__(x.thing) would spell that I want to reduce the "thing" property of the x instance of C. While x.__class__.__reduce__(C.thing) # would be the same as C.__reduce__(C.thing) which would reduce the class method "thing" of C, or the class property of C, or whatsoever of class C. I could envision a small extension to the __reduce__ protocol, by providing an optional parameter, which would open these new ways, and all pickling questions could be solved, probably. This is so, since we can find out whether __reduce__ is a class method or not. If it is just an instance method (implicitly bound), it behaves as today. If it is a class method, it takes a parameter, and then it can find out whether to pickle a class, instance, class property or an instance property. Well, I hope. The above was said while being in bed with 39° Celsius, so don't put my words on the assay-balance. [trying to use __reduce__, only] > Or you could change the pickling system. Your choice of what to > change and what not to change seems a bit arbitrary. :-) Not really. I found __reduce__ very elegant. It gave me the chance to have almost all patches in a single file, since I didn't need to patch most of the implementation files. Just adding something to the type objects was sufficient, and this keeps my workload smaller when migrating to the next Python. Until now, I only had to change traceback.c and iterator.c, since these don't export enough of their structures to patch things from outside. If at some point somebody might decide that some of this support code makes sense for the main distribution, things should of course move to where they belong. Adding to copy_reg, well, I don't like to modify Python modules from C so much, and even less I like to add extra Python files to Stackless, if I can do without it.
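[Editorial note: for context, the copy_reg registration route being discussed looks roughly like this in modern spelling (the module is copyreg in Python 3; Tasklet here is a hypothetical stand-in, not Stackless code):]

```python
import copyreg
import pickle

class Tasklet:
    """Hypothetical stand-in for a type pickle wouldn't handle well."""
    def __init__(self, name):
        self.name = name

def reduce_tasklet(t):
    # The (callable, args) pair a __reduce__ method would return.
    return (Tasklet, (t.name,))

# Register the reduce function from Python code -- the extra Python-side
# registration Christian preferred to keep out of Stackless, versus
# installing __reduce__ directly on the type from C.
copyreg.pickle(Tasklet, reduce_tasklet)

clone = pickle.loads(pickle.dumps(Tasklet("worker")))
print(clone.name)  # worker
```

pickle consults the copyreg dispatch table before falling back to the type's own __reduce_ex__, so the registration takes effect without touching the class.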
Changing the pickling engine: Well, I'm hesitant, since it has been developed so much more between 2.2 and 2.3, and I didn't get my head into that machinery, now. What I want to do at some time is to change cPickle to use a non-recursive implementation. (Ironically, the Python pickle engine *is* non-recursive, if it is run under Stackless). So, if I would hack at cPickle at all, I would probably do the big big change, and that would be too much to get done in reasonable time. That's why I decided to stay small and just chime a few __reduce__ thingies in, for the time being. Maybe this was not the best way, I don't know. >>>OK, so you *are* messing with internals after all (== changing C >>>code), right? Or else how do you accomplish this? >> >>Yessir, I'm augmenting all things-to-be-pickled with __reduce__ >>methods. And this time is the first time that it doesn't work. > > > But not necessarily the last time. :-) Right. probably, I will get into trouble with pickling unbound class methods. Maybe I would just ignore this. Bound class methods do appear in my Tasklet system and need to get pickled. Unbound methods are much easier to avoid and probably not worth the effort. (Yes, tomorrow I will be told that it *is* :-) ... > I agree. The bound method's attributes should always win, since bound > methods only have a small, fixed number of attributes, and they are > all special for bound methods. > > This *is* a change in functionality, even though there appear to be no > unit tests for it, so I'm reluctant to fix it in 2.3. But I think in > 2.4 it should definitely change. That means, for Py 2.2 and 2.3, my current special case for __reduce__ is exactly the way to go, since it doesn't change any semantics but for __reduce__, and in 2.4 I just drop these three lines? Perfect! sincerely - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! 
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tim.one at comcast.net Fri Nov 21 00:43:08 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 00:43:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311210019.hAL0JJjH011663@localhost.localdomain> Message-ID: [Anthony Baxter] > I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the > 15th of December, and a release around the 22nd? That's good enough for me. I'd rather push the RC up a week earlier, though, to give more time for user testing. Many people take large blocks of time off around Christmas, and have major extra demands on their time the week before too (planning and shopping and endless bickering with family -- Christmas is great ). What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 maint, and that's a showstopper. I'd like to "do something" about the 2.3 changes to Python finalization that have provoked new problems, but don't have time. If nothing else, I'd at least like to comment out the second call to gc in Py_Finalize -- with hindsight, that wasn't ready for prime time, and the # of things that can go wrong when trying to execute Python code after modules (particularly sys) have been torn down appears boundless. The only bad thing I've seen come out of the first call to gc in Py_Finalize is nonsense errors complaining that Python hasn't been initialized (when a __del__ or weakref callback triggered then tries to import a new module). What else does 2.3.3 need? From anthony at interlink.com.au Fri Nov 21 00:51:23 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Nov 21 00:59:14 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: Message-ID: <200311210551.hAL5pOVd015765@localhost.localdomain> >>> "Tim Peters" wrote > What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 maint, and > that's a showstopper. I thought I'd fixed that. I have a bunch of compatibility fixes that I'd like to work on. I'm also considering switching to the newer version of autoconf. -- Anthony Baxter It's never too late to have a happy childhood. From tim.one at comcast.net Fri Nov 21 01:41:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 01:41:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311210551.hAL5pOVd015765@localhost.localdomain> Message-ID: >> What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 >> maint, and that's a showstopper. [Anthony Baxter] > I thought I'd fixed that. I don't know. How did they fail? This is how they fail for me today (Windows): C:\Code\23\PCbuild>python ../lib/test/test_re.py test_anyall (__main__.ReTests) ... ok test_basic_re_sub (__main__.ReTests) ... ok test_bigcharset (__main__.ReTests) ... ok test_bug_113254 (__main__.ReTests) ... ok test_bug_114660 (__main__.ReTests) ... ok test_bug_117612 (__main__.ReTests) ... ok test_bug_418626 (__main__.ReTests) ... ERROR test_bug_448951 (__main__.ReTests) ... ok test_bug_449000 (__main__.ReTests) ... ok test_bug_449964 (__main__.ReTests) ... ok test_bug_462270 (__main__.ReTests) ... ok test_bug_527371 (__main__.ReTests) ... ok test_bug_545855 (__main__.ReTests) ... ok test_bug_612074 (__main__.ReTests) ... ok test_bug_725106 (__main__.ReTests) ... ok test_bug_725149 (__main__.ReTests) ... ok test_bug_764548 (__main__.ReTests) ... ok test_category (__main__.ReTests) ... ok test_constants (__main__.ReTests) ... ok test_expand (__main__.ReTests) ... ok test_finditer (__main__.ReTests) ... ok test_flags (__main__.ReTests) ... ok test_getattr (__main__.ReTests) ... ok test_getlower (__main__.ReTests) ... ok test_groupdict (__main__.ReTests) ... 
ok test_ignore_case (__main__.ReTests) ... ok test_non_consuming (__main__.ReTests) ... ok test_not_literal (__main__.ReTests) ... ok test_pickling (__main__.ReTests) ... ok test_qualified_re_split (__main__.ReTests) ... ok test_qualified_re_sub (__main__.ReTests) ... ok test_re_escape (__main__.ReTests) ... ok test_re_findall (__main__.ReTests) ... ok test_re_groupref (__main__.ReTests) ... ok test_re_groupref_exists (__main__.ReTests) ... ok test_re_match (__main__.ReTests) ... ok test_re_split (__main__.ReTests) ... ok test_re_subn (__main__.ReTests) ... ok test_repeat_minmax (__main__.ReTests) ... ok test_scanner (__main__.ReTests) ... ok test_search_coverage (__main__.ReTests) ... ok test_search_star_plus (__main__.ReTests) ... ok test_special_escapes (__main__.ReTests) ... ok test_sre_character_literals (__main__.ReTests) ... ok test_stack_overflow (__main__.ReTests) ... ERROR test_symbolic_refs (__main__.ReTests) ... ok ====================================================================== ERROR: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 410, in test_bug_418626 self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) File "C:\CODE\23\lib\sre.py", line 137, in search return _compile(pattern, flags).search(string) RuntimeError: maximum recursion limit exceeded ====================================================================== ERROR: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 420, in test_stack_overflow self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') File "C:\CODE\23\lib\sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded ---------------------------------------------------------------------- Ran 
46 tests in 0.550s FAILED (errors=2) > I have a bunch of compatibility fixes that I'd like to work on. I'm > also considering switching to the newer version of autoconf. A newer & buggier version, or a newer & better version ? From theller at python.net Fri Nov 21 04:59:11 2003 From: theller at python.net (Thomas Heller) Date: Fri Nov 21 04:59:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: (Tim Peters's message of "Fri, 21 Nov 2003 00:43:08 -0500") References: Message-ID: > [Anthony Baxter] >> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the >> 15th of December, and a release around the 22nd? > [Tim] > That's good enough for me. I'd rather push the RC up a week earlier, > though, to give more time for user testing. Many people take large blocks > of time off around Christmas, and have major extra demands on their time the > week before too (planning and shopping and endless bickering with family -- > Christmas is great ). I'm among those people having extra demands on the time before Christmas (well, I've got wife and children), so I would prefer to do all this one week earlier: build the RC around the 8th, and the release around the 15th of december. Thomas From mwh at python.net Fri Nov 21 07:20:05 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 21 07:20:10 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: (Tim Peters's message of "Fri, 21 Nov 2003 00:43:08 -0500") References: Message-ID: <2mk75toqve.fsf@starship.python.net> "Tim Peters" writes: > [Anthony Baxter] >> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the >> 15th of December, and a release around the 22nd? > > That's good enough for me. I'm not expecting to be around much in between those dates, but could do some work for the RC with those dates. > What else does 2.3.3 need? There are a bunch of build problems which my brain, sadly but not surprisingly, has thoroughly paged out. We should give the new autoconf a go, at least. 
Cheers, mwh -- "Well, the old ones go Mmmmmbbbbzzzzttteeeeeep as they start up and the new ones go whupwhupwhupwhooopwhooooopwhooooooommmmmmmmmm." -- Graham Reed explains subway engines on asr From skip at pobox.com Fri Nov 21 08:55:40 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 21 08:55:52 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <16318.6492.580944.89131@montanaro.dyndns.org> Tim> What say we get 2.3.3 in motion? As long as a primary motivator for a 2.3.3 release seems to be weakref-related, perhaps someone who's familiar enough with their usage could beef up the docs enough to get rid of this comment at the top of the module doc: XXX -- need to say more here! I was motivated to take a look at the weakref docs for the first time after Tim mentioned: Casual programmers aren't likely to use weakrefs at all, but once you've built a cache based on weakrefs in a large app, weakrefs become critical to your code and your design. Skip From tim.one at comcast.net Fri Nov 21 11:16:19 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 11:16:26 2003 Subject: [Python-Dev] test_re failures, Windows, 2.3 maint Message-ID: I sent test_re output from Windows on 2.3 maint yesterday. Two tests fail with "maximum recursion limit exceeded". Why do we expect them not to fail? 32-bit Windows may be unique in using this check: #if defined(USE_STACKCHECK) if (level % 10 == 0 && PyOS_CheckStack()) return SRE_ERROR_RECURSION_LIMIT; #endif PyOS_CheckStack() there isn't guessing, it's using Windows-specific facilities to check directly whether the C stack is about to overflow. In test_bug_418626, that check triggers twice, once at level = 15090 and again at level 15210. In test_stack_overflow, it triggers once at level 15210. The test comments appear to believe that sre shouldn't be recursing at all in these tests, but 15K+ levels is hard to sell as no recursion . 
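[Editorial note: the two failing tests Tim analyzes boil down to the calls below. On the modern non-recursive engine they succeed, so this sketch reproduces the inputs rather than the 2.3-era RuntimeError:]

```python
import re

# test_bug_418626: a lazy repeated group over a 20002-character string.
# Under 2.3's recursive SRE this recursed ~15K levels deep before the
# guard fired; the non-recursive engine just matches.
m = re.search('(a|b)*?c', 10000 * 'ab' + 'cd')
print(m.end(0))  # 20001

# test_stack_overflow: one capturing group repeated 50000 times.
m = re.match('(x)*', 50000 * 'x')
print(m.group(1))  # x
```

The expected values (20001 and 'x') are the same ones the test suite asserts.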
From niemeyer at conectiva.com Fri Nov 21 11:22:53 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 11:23:21 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <200311210551.hAL5pOVd015765@localhost.localdomain> Message-ID: <20031121162253.GA23299@burma.localdomain> > ====================================================================== > ERROR: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 410, in test_bug_418626 > self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) > File "C:\CODE\23\lib\sre.py", line 137, in search > return _compile(pattern, flags).search(string) > RuntimeError: maximum recursion limit exceeded > > ====================================================================== > ERROR: test_stack_overflow (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 420, in test_stack_overflow > self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') > File "C:\CODE\23\lib\sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded > > ---------------------------------------------------------------------- > Ran 46 tests in 0.550s > > FAILED (errors=2) It looks like someone has backported the changes done in test_re.py. These tests were expected to fail with the SRE from 2.3. -- Gustavo Niemeyer http://niemeyer.net From tim.one at comcast.net Fri Nov 21 11:48:27 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 11:48:34 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: <20031121162253.GA23299@burma.localdomain> Message-ID: >> ====================================================================== >> ERROR: test_bug_418626 (__main__.ReTests) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "../lib/test/test_re.py", line 410, in test_bug_418626 >> self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), >> 20001) File "C:\CODE\23\lib\sre.py", line 137, in search >> return _compile(pattern, flags).search(string) >> RuntimeError: maximum recursion limit exceeded >> >> ====================================================================== >> ERROR: test_stack_overflow (__main__.ReTests) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "../lib/test/test_re.py", line 420, in test_stack_overflow >> self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') >> File "C:\CODE\23\lib\sre.py", line 132, in match >> return _compile(pattern, flags).match(string) >> RuntimeError: maximum recursion limit exceeded >> >> ---------------------------------------------------------------------- >> Ran 46 tests in 0.550s >> >> FAILED (errors=2) [Gustavo Niemeyer] > It looks like someone have backported the changes done in test_re.py. > These tests were expected to fail with the SRE from 2.3. The tests are never expected to fail, so I think you mean that test_re in 2.3 should expect (and suppress) the RuntimeError in these cases. It looks like Anthony changed this most recently: test_re.py Revision 1.45.6.1 Tue Nov 4 14:11:01 2003 UTC (2 weeks, 3 days ago) by anthonybaxter Branch: release23-maint Changes since 1.45: +9 -7 lines get tests working again. partial backport of 1.46 - I fixed the recursive tests that used to fail, but left test_re_groupref_exists disabled, as it fails on the release23-maint branch. Maybe something else needs to be backported? 
We've got more than one problem here, then, because Barry reports that test_re on release23-maint, as it exists today, does *not* fail on a RedHat 9 build. So if Anthony reverted that change, test_re would pass again on Windows, but would start to fail on RH9. From niemeyer at conectiva.com Fri Nov 21 11:54:28 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 11:54:46 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <20031121162253.GA23299@burma.localdomain> Message-ID: <20031121165428.GA27853@burma.localdomain> > > It looks like someone have backported the changes done in test_re.py. > > These tests were expected to fail with the SRE from 2.3. > > The tests are never expected to fail, so I think you mean that test_re in > 2.3 should expect (and suppress) the RuntimeError in these cases. Yes, that's what I meant. Sorry for not being clear. > It looks like Anthony changed this most recently: > > test_re.py > Revision 1.45.6.1 > Tue Nov 4 14:11:01 2003 UTC (2 weeks, 3 days ago) by anthonybaxter > Branch: release23-maint > Changes since 1.45: +9 -7 lines > > get tests working again. partial backport of 1.46 - I fixed the > recursive tests that used to fail, but left test_re_groupref_exists > disabled, as it fails on the release23-maint branch. Maybe something > else needs to be backported? Yes, he seems to believe that the new SRE scheme was introduced in 2.3, but these tests should still expect RuntimeError in 2.3. > We've got more than one problem here, then, because Barry reports that > test_re on release23-maint, as it exists today, does *not* fail on a > RedHat 9 build. So if Anthony reverted that change, test_re would > pass again on Windows, but would start to fail on RH9. That's strange indeed. Either other changes were introduced in 2.3 which changed the number of recursions, which I don't believe to be the case, or the fixed recursion limit was raised on that platform.
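[Editorial note: a backport-friendly test in the spirit Gustavo describes would tolerate the old engine's RuntimeError instead of counting it as a failure. A sketch, not the actual test_re.py code:]

```python
import re
import unittest

class RecursionLimitTolerantTests(unittest.TestCase):
    def test_long_repeat(self):
        # On 2.3's recursive engine this raised RuntimeError at a
        # platform-dependent depth; on later engines it succeeds.
        # Either outcome is acceptable for a maintenance-branch test.
        try:
            result = re.match('(x)*', 50000 * 'x').group(1)
        except RuntimeError:
            return  # the old engine hit its fixed recursion limit
        self.assertEqual(result, 'x')
```

Run it with `python -m unittest` in the usual way; the point is only the try/except shape around the engine call.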
-- Gustavo Niemeyer http://niemeyer.net From mwh at python.net Fri Nov 21 12:09:41 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 21 12:10:41 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <20031121165428.GA27853@burma.localdomain> (Gustavo Niemeyer's message of "Fri, 21 Nov 2003 14:54:28 -0200") References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> Message-ID: <2m7k1todgq.fsf@starship.python.net> Gustavo Niemeyer writes: > Yes, he seems to belive that the new SRE scheme was introduced in 2.3, > but these tests should still expect RuntimeError in 2.3. I was under the impression (and slightly alarmed) that the recursion removal gimmicks had been backported from the trunk to the release23-maint branch. Was that not the case? (If so, phew). If that *wasn't* the case, then why were the tests failing for Anthony before he made that checkin? Cheers, mwh -- Or here's an even simpler indicator of how much C++ sucks: Print out the C++ Public Review Document. Have someone hold it about three feet above your head and then drop it. Thus you will be enlightened. -- Thant Tessman From barry at python.org Fri Nov 21 12:22:22 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 12:23:01 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <2m7k1todgq.fsf@starship.python.net> References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> <2m7k1todgq.fsf@starship.python.net> Message-ID: <1069435342.2383.69.camel@anthem> FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py core dumps for me for instance. I'm doing a fresh build --with-pydebug and will try to get more information. -Barry From barry at python.org Fri Nov 21 13:09:12 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:09:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069438152.2383.85.camel@anthem> Never mind. 
A fresh debug build, test -u all yields no problems with 2.3cvs on RH7.3 either. -Barry From tim.one at comcast.net Fri Nov 21 13:09:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 13:09:42 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069435342.2383.69.camel@anthem> Message-ID: [Barry] > FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py > core dumps for me for instance. For Guido too. > I'm doing a fresh build --with-pydebug and will try to get more > information. It's one of two things: USE_RECURSION_LIMIT isn't #define'd or USE_RECURSION_LIMIT is #define'd, but to a value too large for that box There's a maze of #ifdef'ery near the start of _sre.c setting USE_RECURSION_LIMIT differently for different platforms. Windows doesn't use USE_RECURSION_LIMIT -- it uses a different gimmick based on being able to test for C stack overflow directly on Windows. test_re.py *should*, at this time, fail in exactly the same ways I reported it failing on Windows. From tim.one at comcast.net Fri Nov 21 13:13:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 13:13:53 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069438152.2383.85.camel@anthem> Message-ID: [Barry Warsaw] > Never mind. A fresh debug build, test -u all yields no problems with > 2.3cvs on RH7.3 either. test_re.py isn't supposed to pass on 2.3 maint today. If it passed, it's broken, and will start to fail as soon as the breakage is repaired. Find out what USE_RECURSION_LIMIT is set to on that box. From barry at python.org Fri Nov 21 13:15:58 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:16:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069438558.2383.91.camel@anthem> On Fri, 2003-11-21 at 13:09, Tim Peters wrote: > test_re.py *should*, at this time, fail in exactly the same ways I reported > it failing on Windows. Then Something Else is going on. 
As reported in another message, it doesn't fail for me on either RH9 or RH7.3, and a fresh debug build on RH7.3 also doesn't crash for me either. -Barry From barry at python.org Fri Nov 21 13:43:52 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:44:03 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069440231.2383.95.camel@anthem> On Fri, 2003-11-21 at 13:13, Tim Peters wrote: > [Barry Warsaw] > > Never mind. A fresh debug build, test -u all yields no problems with > > 2.3cvs on RH7.3 either. > > test_re.py isn't supposed to pass on 2.3 maint today. If it passed, it's > broken, and will start to fail as soon as the breakage is repaired. Find > out what USE_RECURSION_LIMIT is set to on that box. Is it possible that USE_RECURSION_LIMIT isn't defined for my RH builds?! I added the attached little bit of (seemingly useful) code to _sre.c, recompiled and then... % ./python Python 2.3.3a0 (#4, Nov 21 2003, 13:39:39) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import _sre [24546 refs] >>> _sre.RECURSION_LIMIT [24546 refs] >>> [24546 refs] [7129 refs] Very odd. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: sre-patch.txt Type: text/x-patch Size: 682 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20031121/69c981e0/sre-patch.bin From tim.one at comcast.net Fri Nov 21 14:22:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 14:22:24 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069440231.2383.95.camel@anthem> Message-ID: [Barry Warsaw] > Is it possible that USE_RECURSION_LIMIT isn't defined for my RH > builds?! I can't see how: it's set by a giant maze of #ifdef's, which are almost as reliable as a giant maze of CVS branches . 
Because the #ifdef's nest 4 deep at one point, and the bodies aren't indented, it's damned hard to figure out what they're doing by eyeball. But I *think* this part: """ #else #define USE_RECURSION_LIMIT 10000 #endif #endif #endif """ which gives all the appearance of defining a default value (if nothing else triggers), is actually nested *inside* an #elif defined(__FreeBSD__) block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting defined on anything other than FreeBSD and Win64. > I added the attached little bit of (seemingly useful) code > to _sre.c, recompiled and then... > > % ./python > Python 2.3.3a0 (#4, Nov 21 2003, 13:39:39) > [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import _sre > [24546 refs] > >>> _sre.RECURSION_LIMIT > [24546 refs] > >>> > [24546 refs] > [7129 refs] > > Very odd. OTOH, if you believe what it says, that leads directly to the cause . From barry at python.org Fri Nov 21 14:36:06 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 14:37:31 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069443361.2383.118.camel@anthem> On Fri, 2003-11-21 at 14:22, Tim Peters wrote: > block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is > in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the > intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting > defined on anything other than FreeBSD and Win64. Yep, you're right. If I hack _sre.c with the patch below, I think I get something closer to what we expect to see. % ./python Python 2.3.3a0 (#5, Nov 21 2003, 14:26:25) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import _sre [24583 refs] >>> _sre.RECURSION_LIMIT 10000 [24585 refs] >>> [24585 refs] [7130 refs] ... ====================================================================== ERROR: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 410, in test_bug_418626 self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) File "/home/barry/projects/python23/Lib/sre.py", line 137, in search return _compile(pattern, flags).search(string) RuntimeError: maximum recursion limit exceeded ====================================================================== ERROR: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 420, in test_stack_overflow self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') File "/home/barry/projects/python23/Lib/sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded I'll leave it to someone else to check in the proper fix. (But does anybody else like exposing RECURSION_LIMIT in the _sre module?) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: sre-patch2.txt Type: text/x-patch Size: 825 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20031121/86e2a559/sre-patch2.bin From niemeyer at conectiva.com Fri Nov 21 14:50:58 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 14:51:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <1069440231.2383.95.camel@anthem> Message-ID: <20031121195057.GA24270@burma.localdomain> [...] 
> which gives all the appearance of defining a default value (if nothing else > triggers), is actually nested *inside* an > > #elif defined(__FreeBSD__) > > block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is > in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the > intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting > defined on anything other than FreeBSD and Win64. It looks to be this patch's fault: ------------- From: loewis@users.sourceforge.net To: python-checkins@python.org Cc: Bcc: Subject: [Python-checkins] python/dist/src/Modules _sre.c,2.99,2.99.8.1 Reply-To: python-dev@python.org Update of /cvsroot/python/python/dist/src/Modules In directory sc8-pr-cvs1:/tmp/cvs-serv28127 Modified Files: Tag: release23-maint _sre.c Log Message: Patch #813391: Reduce limits for amd64 and sparc64. Index: _sre.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v retrieving revision 2.99 retrieving revision 2.99.8.1 diff -C2 -d -r2.99 -r2.99.8.1 *** _sre.c 26 Jun 2003 14:41:08 -0000 2.99 --- _sre.c 20 Oct 2003 20:59:45 -0000 2.99.8.1 *************** *** 72,78 **** /* FIXME: maybe the limit should be 40000 / sizeof(void*) ? */ #define USE_RECURSION_LIMIT 7500 - #else ! #if defined(__GNUC__) && defined(WITH_THREAD) && defined(__FreeBSD__) /* the pthreads library on FreeBSD has a fixed 1MB stack size for the * initial (or "primary") thread, which is insufficient for the default --- 72,83 ---- /* FIXME: maybe the limit should be 40000 / sizeof(void*) ? */ #define USE_RECURSION_LIMIT 7500 ! #elif defined(__FreeBSD__) ! /* FreeBSD/amd64 and /sparc64 require even smaller limits */ ! #if defined(__amd64__) ! #define USE_RECURSION_LIMIT 6000 ! #elif defined(__sparc64__) ! #define USE_RECURSION_LIMIT 3000 ! 
#elif defined(__GNUC__) && defined(WITH_THREAD) /* the pthreads library on FreeBSD has a fixed 1MB stack size for the * initial (or "primary") thread, which is insufficient for the default _______________________________________________ Python-checkins mailing list Python-checkins@python.org http://mail.python.org/mailman/listinfo/python-checkins -- Gustavo Niemeyer http://niemeyer.net From tim.one at comcast.net Fri Nov 21 14:56:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 14:56:50 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069443361.2383.118.camel@anthem> Message-ID: [Barry] > Yep, you're right. If I hack _sre.c with the patch below, I think I > get something closer to what we expect to see. > > % ./python > Python 2.3.3a0 (#5, Nov 21 2003, 14:26:25) > [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import _sre > [24583 refs] > >>> _sre.RECURSION_LIMIT > 10000 > [24585 refs] > > ... 
> > ====================================================================== > ERROR: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 410, in test_bug_418626 > self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), > 20001) File "/home/barry/projects/python23/Lib/sre.py", line 137, > in search return _compile(pattern, flags).search(string) > RuntimeError: maximum recursion limit exceeded > > ====================================================================== > ERROR: test_stack_overflow (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 420, in test_stack_overflow > self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') > File "/home/barry/projects/python23/Lib/sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded Yup, that's how they fail on Windows today, and is how they're *expected* to fail everywhere today. > I'll leave it to someone else to check in the proper fix. I expect Anthony has the best shot at understanding why he did what he did before, so has the best shot at undoing it too without creating more new problems. > (But does anybody else like exposing RECURSION_LIMIT in the > _sre module?) For 2.3 maint it would be a new feature, so probably not. For 2.4, I believe all this code has become a mass of decoys (that is, it's still there, but is no longer used; I don't know why it hasn't been deleted) -- Gustavo reworked sre to stop using C-level recursion. BTW, Gustavo, we get a big pile of compiler warnings on the trunk (2.4 development) in _sre.c now, on Windows, and apparently under some-but-not-all gcc flavors. How about cleaning those up? 
See: http://mail.python.org/pipermail/python-dev/2003-October/039059.html From niemeyer at conectiva.com Fri Nov 21 15:00:56 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 15:01:46 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <1069443361.2383.118.camel@anthem> Message-ID: <20031121200056.GA24592@burma.localdomain> > For 2.4, I believe all this code has become a mass of decoys (that is, > it's still there, but is no longer used; I don't know why it hasn't > been deleted) -- Gustavo reworked sre to stop using C-level recursion. > > BTW, Gustavo, we get a big pile of compiler warnings on the trunk (2.4 > development) in _sre.c now, on Windows, and apparently under > some-but-not-all gcc flavors. How about cleaning those up? See: > > http://mail.python.org/pipermail/python-dev/2003-October/039059.html Thanks for pointing this out to me. I'll clean up these issues (the code and the warnings) ASAP. -- Gustavo Niemeyer http://niemeyer.net From barry at python.org Fri Nov 21 15:17:36 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 15:17:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069445855.2383.122.camel@anthem> On Fri, 2003-11-21 at 14:56, Tim Peters wrote: > For 2.3 maint it would be a new feature, so probably not. > > For 2.4, I believe all this code has become a mass of decoys (that is, it's > still there, but is no longer used; I don't know why it hasn't been > deleted) -- Gustavo reworked sre to stop using C-level recursion. Works for me. or-did-ly y'rs, -Barry From tim.one at comcast.net Fri Nov 21 17:24:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 17:24:06 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: <16318.6492.580944.89131@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > As long as a primary motivator for a 2.3.3 release seems to be > weakref-related, perhaps someone who's familiar enough with their > usage could beef up the docs enough to get rid of this comment at the > top of the module doc: > > XXX -- need to say more here! I checked in more words (on the trunk and on 2.3 maint). Feel free to add even more . From anthony at interlink.com.au Fri Nov 21 21:23:07 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Nov 21 21:23:33 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311220223.hAM2N8E7007850@localhost.localdomain> >>> "Tim Peters" wrote > I expect Anthony has the best shot at understanding why he did what he did > before, so has the best shot at undoing it too without creating more new > problems. Sorry - I (and a bunch of other folks, Alex included if I recall correctly) was seeing a bunch of test failures in test_re - I ported the "fixed" tests from the trunk, in the assumption that the relevant change had been made to the branch. I'll undo it, once someone's fixed _sre in the branch to be broken again Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim.one at comcast.net Fri Nov 21 22:58:02 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 22:58:13 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311220223.hAM2N8E7007850@localhost.localdomain> Message-ID: [Anthony Baxter] > Sorry - I (and a bunch of other folks, Alex included if I recall > correctly) was seeing a bunch of test failures in test_re - I ported > the "fixed" tests from the trunk, in the assumption that the relevant > change had been made to the branch. I'll undo it, once someone's > fixed _sre in the branch to be broken again I checked in all the changes I thought were necessary. But as the checkin comment says, This needs fresh testing on all non-Win32 platforms ... 
Running the standard test_re.py is an adequate test. So start testing, or (my recommendation) upgrade to Win32 . From jeremy at alum.mit.edu Fri Nov 21 23:46:29 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri Nov 21 23:49:17 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069476389.22019.0.camel@localhost.localdomain> On Fri, 2003-11-21 at 22:58, Tim Peters wrote: > I checked in all the changes I thought were necessary. But as the checkin > comment says, > > This needs fresh testing on all non-Win32 platforms ... > Running the standard test_re.py is an adequate test. > > So start testing, or (my recommendation) upgrade to Win32 . Did a cvs update about 30 minutes ago. make test reports no errors. Running again with "-u all -r" to see what happens. Jeremy From jeremy at alum.mit.edu Sat Nov 22 00:10:05 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sat Nov 22 00:12:52 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069476389.22019.0.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> Message-ID: <1069477805.22019.2.camel@localhost.localdomain> On Fri, 2003-11-21 at 23:46, Jeremy Hylton wrote: > On Fri, 2003-11-21 at 22:58, Tim Peters wrote: > > I checked in all the changes I thought were necessary. But as the checkin > > comment says, > > > > This needs fresh testing on all non-Win32 platforms ... > > Running the standard test_re.py is an adequate test. > > > > So start testing, or (my recommendation) upgrade to Win32 . > > Did a cvs update about 30 minutes ago. make test reports no errors. > Running again with "-u all -r" to see what happens. Also looks good. This was with a RH9 system. Jeremy From skip at pobox.com Sat Nov 22 00:13:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Nov 22 00:13:25 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: References: <200311220223.hAM2N8E7007850@localhost.localdomain> Message-ID: <16318.61547.384955.115515@montanaro.dyndns.org> Tim> So ... upgrade to Win32 . I'll consider that after you've been in charge of software development at Microsoft for a couple years. Skip From skip at pobox.com Sat Nov 22 01:23:51 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Nov 22 01:23:59 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069476389.22019.0.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> Message-ID: <16319.247.594634.98507@montanaro.dyndns.org> >> This needs fresh testing on all non-Win32 platforms ... >> Running the standard test_re.py is an adequate test. >> >> So start testing, or (my recommendation) upgrade to Win32 . Jeremy> Did a cvs update about 30 minutes ago. make test reports no Jeremy> errors. Running again with "-u all -r" to see what happens. "regrtest.py -u all -r" worked for me on Mac OS X. Skip From raymond.hettinger at verizon.net Sat Nov 22 01:47:12 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 22 01:47:44 2003 Subject: [Python-Dev] copy() and deepcopy() Message-ID: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> I would like to confirm my understanding of copying and its implications. A shallow copy builds only a new outer shell and leaves the inner references unchanged. If the outer object is immutable, then a copy might as well be the original object. So, in the copy module, the copy function for tuples should just return the original object (the function looks like it does more but actually does return itself). And, since a frozenset is immutable, its copy function should also just return self. The point of a deepcopy is to replace each sub-component (at every nesting level) that could possibly change. 
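[Editor's note: the shallow-copy claims above are easy to check interactively. The sketch below is an illustration against current CPython, not code from the thread; the copy module's dispatch code has been reshuffled since 2.3, but its behavior for immutable containers is the same.]

```python
import copy

# Shallow-copying an immutable container might as well return the original,
# and the copy module does exactly that.
t = (1, 2, 3)
assert copy.copy(t) is t

fs = frozenset([1, 2, 3])
assert copy.copy(fs) is fs

# A mutable set, by contrast, really is duplicated: equal, but distinct.
s = {1, 2, 3}
s2 = copy.copy(s)
assert s2 == s and s2 is not s
```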
Since sets can only contain hashable objects which in turn can only contain hashable objects, I surmise that a shallowcopy of a set would also suffice as its deepcopy. IOW: For frozensets, shallowcopy == deepcopy == self For sets, shallowcopy == deepcopy == set(list(self)) # done with PyDict_Copy() Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031122/3de21c58/attachment.html From anthony at interlink.com.au Sat Nov 22 02:39:35 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 22 02:40:00 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <2mk75toqve.fsf@starship.python.net> Message-ID: <200311220739.hAM7dZ7n016749@localhost.localdomain> >>> Michael Hudson wrote > We should give the new autoconf a go, at least. I would strongly prefer to do this sooner than later, so I was thinking of doing the upgrade sometime this week. Does anyone have/know any reasons to not upgrade to the newer autoconf? It should fix a bunch of build annoyances (and I can get rid of aclocal.m4) Anthony -- Anthony Baxter It's never too late to have a happy childhood. From martin at v.loewis.de Sat Nov 22 06:48:48 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 22 06:49:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <20031121195057.GA24270@burma.localdomain> References: <1069440231.2383.95.camel@anthem> <20031121195057.GA24270@burma.localdomain> Message-ID: Gustavo Niemeyer writes: > It looks to be this patch's fault: [...] > Patch #813391: Reduce limits for amd64 and sparc64. Sorry for causing so much confusion, and thanks to Tim for fixing it. Regards, Martin From barry at python.org Sat Nov 22 07:59:56 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 22 08:00:12 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: <1069477805.22019.2.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> <1069477805.22019.2.camel@localhost.localdomain> Message-ID: <1069505993.2383.172.camel@anthem> On Sat, 2003-11-22 at 00:10, Jeremy Hylton wrote: > > Did a cvs update about 30 minutes ago. make test reports no errors. > > Running again with "-u all -r" to see what happens. > > Also looks good. This was with a RH9 system. Unfortunately, no so for me: test_mimetypes test test_mimetypes failed -- Traceback (most recent call last): File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", line 52, in test_guess_all_types eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) File "/home/barry/projects/python23/Lib/unittest.py", line 302, in failUnlessEqual raise self.failureException, \ AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] But we've seen these before, right? Doesn't some test interfere with globals in a way that screws mimetypes occasionally? -Barry From aahz at pythoncraft.com Sat Nov 22 09:21:19 2003 From: aahz at pythoncraft.com (Aahz) Date: Sat Nov 22 10:31:20 2003 Subject: [Python-Dev] copy() and deepcopy() In-Reply-To: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> References: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> Message-ID: <20031122142119.GA23946@panix.com> On Sat, Nov 22, 2003, Raymond Hettinger wrote: > > The point of a deepcopy is to replace each sub-component (at every > nesting level) that could possibly change. Since sets can only contain > hashable objects which in turn can only contain hashable objects, I > surmise that a shallowcopy of a set would also suffice as its deepcopy. Thing is, it *is* possible to have a mutable and hashable object. The hashable part needs to be immutable, but not the rest. 
Consider dicts in the generic sense: the key needs to be immutable, but the value need not, and it certainly can be useful to combine key/value into a single object. Now, I'm still not sure that your analysis is wrong, but I wanted to be very, very clear that hashability is not the same thing as immutability. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. From akma43 at umkc.edu Sat Nov 22 13:22:54 2003 From: akma43 at umkc.edu (Avneet Mathur) Date: Sat Nov 22 13:20:59 2003 Subject: [Python-Dev] Help Message-ID: <000601c3b125$a6e90060$5502a8c0@zeratec> Hi group, I have been given a problem and as I am novice in Python, I am asking for the help of you experts. I am supposed to read in a file, search for in the opened file an expression like this from a list of similar expressions and print out Hello world. {(inp:han) } Thus the expression in <> after (inp:han) has to be printed out. Please help! Is there any way to output this to the browser! Thanks a lot in advance. Avneet Mathur -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031122/e3ecfcb2/attachment.html From andymac at bullseye.apana.org.au Fri Nov 21 16:39:34 2003 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sat Nov 22 13:27:20 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069435342.2383.69.camel@anthem> References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> <2m7k1todgq.fsf@starship.python.net> <1069435342.2383.69.camel@anthem> Message-ID: <20031122081535.W77270@bullseye.apana.org.au> On Fri, 21 Nov 2003, Barry Warsaw wrote: > FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py > core dumps for me for instance. 
I'm doing a fresh build --with-pydebug > and will try to get more information. sre in 2.3x is compiler sensitive - the stack frame size becomes critical in how many sre recursions are supported, and a core dump is certain if the sre recursion limit is more than the available stack space allows. Threads support may be mixed in with this, as the size of the stack for the primary or initial thread is what gets exercised by test_re. On FreeBSD 4.x the stack size for this thread is fixed at 1MB (pthreads implementation limitation, not OS limit). gcc versions < 3.0 don't cause problems with the default sre recursion limit of 10000, but later versions do. So I'd suggest trying a lower sre recursion limit to see whether this helps. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From tim.one at comcast.net Sat Nov 22 14:31:14 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 22 14:32:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069505993.2383.172.camel@anthem> Message-ID: >>> Did a cvs update about 30 minutes ago. make test reports no errors. >>> Running again with "-u all -r" to see what happens. >> Also looks good. This was with a RH9 system. [Barry Warsaw] > Unfortunately, no so for me: > > test_mimetypes > test test_mimetypes failed -- Traceback (most recent call last): > File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", > line 52, in test_guess_all_types > eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) > File "/home/barry/projects/python23/Lib/unittest.py", line 302, > in failUnlessEqual > raise self.failureException, \ > AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > > But we've seen these before, right? Doesn't some test interfere with > globals in a way that screws mimetypes occasionally? 
googling on test_guess_all_types nails it: http://mail.python.org/pipermail/python-dev/2003-September/038264.html Jeff Epler reported there, in a reply to you about the same thing in 2.3.1, that test_urllib2 interferes with test_mimetypes (when run in that order), and included a patch claimed to fix it. Of course, since he didn't put the patch on SF, it just got lost. From martin at v.loewis.de Sat Nov 22 16:38:04 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 22 16:38:31 2003 Subject: [Python-Dev] Help In-Reply-To: <000601c3b125$a6e90060$5502a8c0@zeratec> References: <000601c3b125$a6e90060$5502a8c0@zeratec> Message-ID: "Avneet Mathur" writes: > I have been given a problem and as I am novice in Python, I am asking > for the help of you experts. Dear Avneet Mathur, Please post your question to python-list@python.org, or any other Python "users" lists. python-dev is for the development of Python. Regards, Martin From guido at python.org Sat Nov 22 17:48:22 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 22 17:46:55 2003 Subject: [Python-Dev] copy() and deepcopy() In-Reply-To: Your message of "Sat, 22 Nov 2003 01:47:12 EST." <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> References: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> Message-ID: <200311222248.hAMMmMm02546@c-24-5-183-134.client.comcast.net> > I would like to confirm my understanding of copying and its > implications. > > A shallow copy builds only a new outer shell and leaves the inner > references unchanged. If the outer object is immutable, then a copy > might as well be the original object. So, in the copy module, the copy > function for tuples should just return the original object (the function > looks like it does more but actually does return itself). And, since a > frozenset is immutable, its copy function should also just return self. Right. 
(I have no idea why _copy_tuple(x) doesn't return x; it feels like superstition or copy-paste from _copy_list().) > The point of a deepcopy is to replace each sub-component (at every > nesting level) that could possibly change. Since sets can only contain > hashable objects which in turn can only contain hashable objects, I > surmise that a shallowcopy of a set would also suffice as its deepcopy. No. Look at what _deepcopy_tuple() does. There could be an object that implements __hash__ but has some instance variable that could be mutated but isn't part of the hash. > IOW: > For frozensets, shallowcopy == deepcopy == self > For sets, shallowcopy == deepcopy == set(list(self)) # done with > PyDict_Copy() No. For frozensets, shallow copy should return self; for sets, shallow copy should return set(self). In both cases, deepcopy() should do something like _deepcopy_list() and _deepcopy_tuple(), respectively. That is, deepcopying a set is pretty straightforward, but must store self in the memo first, so that (circular!) references to self are correctly deepcopied. Deepcopying a frozenset will be a little harder, because there can still be circular references! _deepcopy_tuple() shows how to do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From bac at OCF.Berkeley.EDU Sat Nov 22 17:48:58 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 22 17:49:10 2003 Subject: [Python-Dev] How Python is Developed essay (final rough draft) Message-ID: <3FBFE7DA.2030601@ocf.berkeley.edu> OK, since I want to have this thing finished and online (plus I need this finished for submitting to PyCon) I am making this the final rough draft. This means this is last call on corrections and changes before it hopefully makes its public debut (up to pydotorg and whether anyone objects to me putting it up on python.org/dev/ ). Respond with any comment, corrections, etc. And the sooner the better since I am hoping to get it up some time next week. 
---------------------------- Guido, Some Guys, and a Mailing List: How Python is Developed ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ by Brett Cannon (brett at python.org) Introduction ============ Software does not make itself. Code does not spontaneously come from the ether of the universe. Python_ is no exception to this rule. Since Python made its public debut back in 1991, many people beyond the BDFL (Benevolent Dictator For Life, `Guido van Rossum`_) have contributed time and energy to making Python what it is today: a powerful, simple programming language available to all. But it has not been a random process of people doing whatever they wanted to Python. Over the years, a process for developing Python has emerged within the group that heads Python's growth and maintenance: `python-dev`_. This document is an attempt to write this process down, in hopes of lowering any barriers that might prevent people from contributing to the development of Python. .. _Python: http://www.python.org/ .. _Guido van Rossum: http://www.python.org/~guido/ .. _python-dev: http://mail.python.org/mailman/listinfo/python-dev Tools Used ========== To facilitate the development of Python, certain tools are used. Beyond the obvious ones such as a text editor and email client, two tools are very pervasive in the development process. SourceForge_ is used by python-dev to keep track of feature requests, reported bugs, and contributed patches. A detailed explanation of how to use SourceForge is given later in `General SourceForge Guidelines`_. CVS_ is a networked file versioning system that stores all of the files that make up Python; the repository is currently hosted on SourceForge. It gives the developers a single repository for the files and keeps track of any and all changes to every file. The basic commands and uses can be found in the `dev FAQ`_, along with a multitude of tutorials spread across the web. ..
_SourceForge: http://sourceforge.net/projects/python/ .. _CVS: http://www.cvshome.org/ .. _dev FAQ: http://www.python.org/dev/devfaq.html Communicating ============= Python development is not just programming. It requires a great deal of communication between people. This communication is not just between the members of python-dev; communication within the greater Python community also helps with development. Several mailing lists and newsgroups are used to help organize all of these discussions. In terms of Python development, the primary location for communication is the `python-dev`_ mailing list. This is where the members of python-dev hash out ideas and iron out issues. It is an open list; anyone can subscribe to the mailing list. While the discussion can get quite technical, it is not at all out of reach for even a novice and thus should not discourage anyone from joining the list. Please realize, though, that this list is meant for the discussion of the development of Python; all other questions should be directed somewhere else, such as `python-list`_. Along with this, a level of etiquette is expected to be maintained. A lack of manners will not be tolerated. When the greater Python community is involved in a discussion, it always ends up on `python-list`_. This mailing list is a gateway to the newsgroup `comp.lang.python`_. This is also a good place to go when you have a question about Python that does not pertain to the actual development of the language. Using CVS_ allows the development team to know who made a change to a file and when they made their change. But unless one wants to continuously update their local checkout of the repository, the best way to stay on top of changes to the repository is to subscribe to `Python-checkins`_. This list sends out an email for each and every change to a file in Python. This list can generate a large amount of traffic since even changing a typo in some text will trigger an email to be sent out.
But if you wish to be kept abreast of all changes to Python then this is a good way to do so.

The Patches_ mailing list sends out an email for all changes to patch items on SourceForge_. This list, just like Python-checkins, can generate a large amount of email traffic. It is generally useful to people who wish to help out with the development of Python by knowing about all newly submitted patches as well as any new developments on preexisting ones.

`Python-bugs-list`_ functions much like the Patches mailing list except that it is for bug items on SourceForge. If you find yourself wanting to help close and remove bugs in Python, this is the right list to subscribe to if you can handle the volume of email.

.. _python-list: http://mail.python.org/mailman/listinfo/python-list
.. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python
.. _Python-checkins: http://mail.python.org/mailman/listinfo/python-checkins
.. _Patches: http://mail.python.org/mailman/listinfo/patches
.. _Python-bugs-list: http://mail.python.org/mailman/listinfo/python-bugs-list

The Actual Development
======================

Developing Python is not all just conversations about neat new language features (although those neat conversations do come up and there is a process to them). Developing Python also involves maintaining it by eliminating discovered bugs, adding and changing features, and various other jobs that are not necessarily glamorous but are just as important to the language as anything else.

General SourceForge Guidelines
------------------------------

Since a good amount of Python development involves using SourceForge_, it is important to follow some guidelines when handling a tracker item (bug, patch, etc.). Probably one of the most important things you can do is make sure to set the various options in a new tracker item properly. The submitter should make sure that the Data Type, Category, and Group are all set to reasonable values.
The remaining values (Assigned To, Status, and Resolution) should in general be left to Python developers to set. The exception to this rule is when you want to retract a patch; then "close" the patch by setting Status to "closed" and Resolution to whatever is appropriate.

Do a cursory check to make sure whatever you are submitting has not previously been submitted by someone else. Duplication just uses up valuable time. And **please** do not post feature requests, bug reports, or patches to the python-dev mailing list. If you do, you will be instructed to create an appropriate SourceForge tracker item. When in doubt as to whether you should bring something to python-dev's attention, you can always ask on `comp.lang.python`_; Python developers actively participate there and will move the conversation over if it is deemed reasonable.

Feature Requests
----------------

`Feature requests`_ are for features that you wish Python had but that you have no plans to actually implement by writing a patch. On occasion people do go through the feature requests (also called RFEs on SourceForge) to see if there is anything there that they think should be implemented, and actually do the implementation. But in general do not expect something put here to be implemented without some participation on your part.

The best way to get something implemented is to campaign for it in the greater Python community. `comp.lang.python`_ is the best place to accomplish this. Post to the newsgroup with your idea and see if you can either get support or convince someone to implement it. It might even end up being added to `PEP 42`_ so that the idea does not get lost in the noise as time passes.

.. _feature requests: http://sourceforge.net/tracker/?group_id=5470&atid=355470
.. _PEP 42: http://www.python.org/peps/pep-0042.html

Bug Reports
-----------

Think you found a bug? Then submit a `bug report`_ on SourceForge.
Make sure you clearly specify what version of Python you are using, what OS, and under what conditions the bug was triggered. The more information you can give, the faster the bug can be fixed, since time will not be wasted requesting more information from you.

.. _bug report: http://sourceforge.net/tracker/?group_id=5470&atid=105470

Patches
-------

Create a patch_ tracker item on SourceForge for any code you think should be applied to the Python CVS tree. For practically any change to Python's functionality the documentation and testing suite will need to be changed as well. Including those changes with your patch from the start speeds things up considerably.

Please make sure your patch is against the CVS repository. If you don't know how to use it (the basics are covered in the `dev FAQ`_), then make sure you specify what version of Python you made your patch against.

In terms of coding standards, `PEP 8`_ covers Python code while `PEP 7`_ covers C code. Always try to maximize your code reuse; it makes maintenance much easier.

For C code make sure to limit yourself to ANSI C as much as possible. If you must use non-ANSI C code then see if what you need is checked for by looking in pyconfig.h. You can also look in Include/pyport.h for more helpful C code. If what you need is still not there but is generally available, then add a check in configure.in for it (don't forget to run autoreconf to make the changes take effect). And if that *still* doesn't fit your needs then code up a solution yourself. The reason for all of this is to limit the dependence on external code that might not be available for all OSs that Python runs on.

Be aware of intellectual property when handling patches. Any code with no copyright will fall under the copyright of the `Python Software Foundation`_. If you have no qualms with that, wonderful; this is the best solution for Python.
But if you feel the need to include a copyright then make sure that it is compatible with the copyright used on Python (i.e., BSD-style). The best solution, though, is to sign the copyright over to the Python Software Foundation.

.. _patch: http://sourceforge.net/tracker/?group_id=5470&atid=305470
.. _dev FAQ: http://www.python.org/dev/devfaq.html
.. _PEP 7: http://www.python.org/peps/pep-0007.html
.. _PEP 8: http://www.python.org/peps/pep-0008.html
.. _Python Software Foundation: http://www.python.org/psf/

Changing the Language
=====================

You understand how to file a patch. You think you have a great idea on how Python should change. You are ready to write code for your change. Great, but you need to realize that certain things must be done for a change to be accepted. Changes fall into two categories: changes to the standard library (referred to as the "stdlib") and changes to the language proper.

Changes to the stdlib
---------------------

Changes to the stdlib can consist of adding functionality or changing existing functionality. Adding minor functionality (such as a new function or method) requires convincing a member of python-dev that the extra code required to implement the feature is worth it. A big addition such as a module tends to require more support than just a single member of python-dev. As always, getting community support for your addition is a good idea. With all additions, make sure to write up documentation for your new functionality. Also make sure that proper tests are added to the testing suite.

If you want to add a module, be prepared to be called upon for any bug fixes or feature requests for that module. Getting a module added to the stdlib makes you its maintainer by default.
If you cannot take on that level of responsibility and commitment, and cannot get someone else to take it on for you, then your battle will be very difficult. When code has no specific maintainer, python-dev takes responsibility for it, and thus your code must be useful to them or else they will reject the module. There is also the possibility of having to write a PEP_ (read about PEPs in `Changing the Language Proper`_).

Changing existing functionality can be difficult to do if it breaks backwards-compatibility. If your code will break existing code, you must provide a legitimate reason why making the code act in a non-compatible way is better than the status quo. This requires python-dev as a whole to agree to the change.

Changing the Language Proper
----------------------------

Changing Python the language is taken **very** seriously. Python is often heralded for its simplicity and cleanliness. Any additions to the language must continue this tradition and view. Thus any changes must go through a long process.

First, you must write a PEP_ (Python Enhancement Proposal). This is basically just a document that explains what you want, why you want it, what could be bad about the change, and how you plan on implementing the change. It is best to get feedback on PEPs from `comp.lang.python`_ and python-dev. Once you feel the document is ready, you can request a PEP number and have it added to the official list of PEPs in `PEP 0`_.

Once you have a PEP, you must then convince python-dev and the BDFL that your change is worth it. Expect to be bombarded with questions and counter-arguments. It can drag on for over a month, easy. If you are not up for that level of discussion then do not bother with trying to get your change in. If you manage to convince a majority of python-dev and the BDFL (or most of python-dev; that can lead to the BDFL changing his mind) then your change can be applied.
As with all new code, make sure you also have appropriate documentation patches along with tests for the new functionality.

.. _PEP: http://www.python.org/peps/pep-0001.html
.. _PEP 0: http://www.python.org/peps/pep-0000.html

Helping Out
===========

Many people say they wish they could help out with the development of Python but feel they are not up to writing code. There are plenty of things one can do, though, that do not require you to write code. Regardless of your coding abilities, there is something for everyone to help with.

For feature requests, adding a comment with your opinion of the request is helpful. State whether or not you would like to see the feature. You can also volunteer to write the code to implement the feature if you feel up to it.

For bugs, stating whether or not you can reproduce the bug yourself can be extremely helpful. If you can write a fix for the bug, that is very helpful as well; start a patch item and reference it in a comment in the bug item.

For patches, apply the patch and run the testing suite. You can do a code review on the patch to make sure that it is good, clean code. If the patch adds a new feature, comment on whether you think it is worth adding. If it changes functionality then comment on whether you think it might break code; if it does, say whether you think it is worth the cost of breaking existing code. Help add to the patch if it is missing documentation patches or needed regression tests.

A special mention about adding a file to a tracker item: only official developers and the creator of the tracker item can add a file. This means that if you want to add a file and you are neither of the types of people just mentioned, you have to do an extra step or two. One thing you can do is post the file you want added somewhere else online and reference the URL in a comment. You can also create a new patch item if you feel the change is thorough enough and cross-reference between both patches in the comments.
Be wary of this last option, though, since it might come off as if you think their code is bad and yours is better, which can offend some people. The best solution of all is to work with the original poster if they are receptive to help. But if they do not respond or are not friendly then do go ahead and follow one of the other two suggestions.

For language changes, make your voice heard. Comment on any PEPs on `comp.lang.python`_ so that the general opinion of the community can be assessed.

If there is nothing specific you find you want to work on but you still feel like contributing nonetheless, there are several things you can do. The documentation can always use fleshing out. Adding more tests to the testing suite is always useful. Contribute to discussions on python-dev, `comp.lang.python`_, or one of the `SIGs`_ (Special Interest Groups). Just helping out in the community by spreading the word about Python or helping someone with a question is helpful.

If you really want to get knee-deep in all of this, join python-dev. Once you have been actively participating for a while and are generally known on python-dev, you can request checkin rights on the CVS tree. It is a great way to learn how to work in a large, distributed group, along with how to write great code.

And if all else fails, give money; the `Python Software Foundation`_ is a non-profit organization that accepts donations that are tax-deductible in the United States. The funds are used for various things, from lawyers for handling the intellectual property of Python to funding PyCon_. But the PSF could do a lot more if it had the funds. One goal is to have enough money to fund having Guido work on Python full-time for a full year; this would bring about Python 3. Every dollar does help, so please contribute if you can.

.. _SIGs: http://www.python.org/sigs/
..
_PyCon: http://www.python.org/pycon/

Conclusion
==========

If you get any message from this document, it should be that *anyone* can help with the development of Python. All help is greatly appreciated and keeps the language the wonderful piece of software that it is.

From bac at OCF.Berkeley.EDU Sat Nov 22 17:56:07 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Sat Nov 22 17:56:17 2003
Subject: [Python-Dev] Thesis ideas list
Message-ID: <3FBFE987.2050203@ocf.berkeley.edu>

As requested, here is an annotated list of the ideas that I received for my master's thesis. I tried to do a decent job of referencing where info is and summarizing the emails I received. I also did **very** rough commenting on each of them in trying to understand them and to see if I thought they would make a good thesis for me.

If you have something to contribute to the list, please do so. Don't bother with spelling and grammatical fixes, though, since this is just for my personal use and for anyone interested in the ideas; it will not see the light of day on python.org or anything unless someone else decides to put the effort into that.

I am planning to go in and talk with my thesis advisor after Thanksgiving break (next week) to try to narrow down the list so I can have a thesis topic chosen by Jan 1.

----------------------------------

=====
Misc.
=====

Annotations
-----------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Similar to attributes as done in .NET. Michael's ``func()[]`` syntax might pull off what Martin wants. For an overview of attributes in C#, see http://www.ondotnet.com/pub/a/dotnet/excerpt/prog_csharp_ch18/index.html?page=1 . They appear to be a way to connect data with code. Using reflection you can find out what attributes are attached to an object. You can control what types of objects an attribute can be bound to along with specifying arguments.
It seems like the compiler has some built-in attributes that it uses to process the code when available. So an attribute not only attaches info to an object; it is also used by the language to modify the code. It seems like the func()[] syntax along with Python's dynamic attribute creation covers this, just without the built-in syntax.

Could be interesting to come up with a variant on descriptors for assigning to something like __metadict__. If it is a data descriptor it is just attached as info to the object. If it is a non-data descriptor, though, it gets the code object passed to it. The only perk of this is a way to have info attached in a more abstracted way than just sticking the info into __dict__ and making sure you don't overwrite the value (in other words __metadict__ would not come into play for name resolution). Basically just a standard place to store metadata.

Martin's suggestion was having an attribute that would automatically create an XML-RPC interface for a code object. That might be doable as a metaclass, but that could get complicated and messy. If you could do something like::

    def meth() [xml-rpcbuilder]:
        pass

and have 'meth' automatically get an ``annotation(meth, 'xml-rpc')`` ability that returns a wrapper implementing an XML-RPC interface, that might be cool. You could do this now with a function that takes something, creates a wrapper, and then stores it on the object so that it does not have to be recreated every time. But that becomes an issue of overwriting values in the object. Having a place for metadata would lessen that problem somewhat. All in all it *might* be a good thing, but with Python's dynamicism I don't see a use-case for it at this moment.

Work on PyPy
------------
from `Holger `__

Vague statement that one could work on PyPy. OSCON 2003 paper at http://codespeak.net/pypy/index.cgi?doc/oscon2003-paper.html .
PyPy's EU funding proposal has interesting parts at http://codespeak.net/pypy/index.cgi?doc/funding/B1.0 and http://codespeak.net/pypy/index.cgi?doc/funding/B6.0 .

Multiple dispatch
-----------------
from `Neil `__

Look at Dylan_ and Goo_ for inspiration. An explanation of how Dylan does multiple dispatch can be seen at http://www.gwydiondylan.org/gdref/tutorial/multiple-dispatch.html .

Multiple dispatch is basically a mechanism of registering a group of methods under one generic method that then calls the registered methods based on whether the parameter lists can match the arguments being passed. If there is more than one match then they are ordered in terms of how "specific" they are; a method whose parameter accepts a superclass is less specific than one requiring the actual class. Methods can then call a special method that will call the next method in the calculated order.

The issue with this in terms of Python is how to handle comparing the arguments given to a method when a parameter list is just so damn vague. If you have the parameter lists ``def A(arg1, arg2, *args)`` and ``def B(*args)``, which one is more specific? The other issue is that since Python has no parameter type-checking beyond argument counts, you can't base whether a method is more specific on the types of the arguments. In order for this to be built into the language one would have to add type-checking first. Otherwise one would need to have all of this be external to the language. It should be doable in terms of Python code now. Building it into the language might be nice, but without type checking I don't know how useful it would be.

.. _Dylan: http://www.gwydiondylan.org/drm/drm_1.htm
.. _Goo: http://www.ai.mit.edu/~jrb/goo/manual/goomanual.html

Static analysis of Python C code
--------------------------------
from Neal (private email)

Look at the work done by `Dawson Engler`_. Could check for missing DECREF/INCREFs, null pointer dereferences, threading issues, etc.
Appears the research was developing a system to check that basic rules were met for code (returned values were checked, disabled interrupts get re-enabled, etc.).

.. _Dawson Engler: http://www.stanford.edu/~engler/

======
Memory
======

Mark-and-sweep GC
-----------------
from `Neil `__

Only really worth it in terms of code complexity (does C code become easier? How hard is it to move existing extension modules over?) and to measure the performance difference.

Chicken GC
----------
from `Neil `__ with more ideas from `Samuele `__ and `Phillip Eby `__

Chicken_ has its GC covered in a paper entitled "`Cheney on the M.T.A.`_". Seems to be the one Neil likes the most. Interestingly, Chicken (which is a Scheme-to-C compiler) does all memory allocation on the stack.

.. _Chicken: http://www.call-with-current-continuation.org/chicken.html
.. _Cheney on the M.T.A.: http://citeseer.nj.nec.com/baker94cons.html

Boehm-Demers-Weiser collector
-----------------------------
from `Jeremy `__

The collector can be found at http://www.hpl.hp.com/personal/Hans_Boehm/gc/index.html . It is a generic mark-and-sweep collector that has been designed to be portable and easy to use.

Analyze memory usage
--------------------
from `Jeremy `__

Apparently `some guys`_ claim that a high-performance, general memory allocator works better than a bunch of custom allocators (Python has a bunch of the latter).

.. _some guys: http://citeseer.nj.nec.com/berger01reconsidering.html

=========
Threading
=========

Provide free threading efficiently
----------------------------------
from `Martin `__

`In the free threading model, a client app may call any object method ... from any thread at any time. The object must serialize access to all of its methods to whatever extent it requires to keep incoming calls from conflicting, providing the maximum performance and flexibility. `__. In other words, you shouldn't have to do any locking to do a method call.
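The contract in that quote (the object, not the caller, does the serialization) can be sketched in present-day Python with an internal lock. This is just a hypothetical illustration of the programming model; the `Counter` class is made up and says nothing about how an interpreter would implement free threading::

```python
import threading

class Counter:
    # A sketch of the "free threading" contract: callers may invoke
    # methods from any thread at any time with no external locking;
    # the object serializes access to its own state internally.
    # (Hypothetical example, not CPython internals.)

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def bump(self):
        # The object does its own locking; the caller never sees it.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value

# Any number of threads may call bump() concurrently without
# coordinating with each other:
c = Counter()
threads = [threading.Thread(target=lambda: [c.bump() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value())  # 4000
```

The research question is how to provide this guarantee for every object efficiently, without each method call paying the cost of a lock as this sketch does.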
MP threading
------------
from `Dennis Allison `__

Try to eliminate the serialization of Python code execution caused by the GIL. Look at research by `Maurice Herlihy`_ and `Kourosh Gharachorloo`_.

.. _Maurice Herlihy: http://www.cs.brown.edu/people/mph/home.html
.. _Kourosh Gharachorloo: http://research.compaq.com/wrl/people/kourosh/bio.html

=========
Compiling
=========

Python to C
-----------
from `Fernando Perez `__

`Pat Miller`_ presented a paper on this for scientific work at SciPy 2003. Can look to Squeak_ for inspiration.

.. _Pat Miller: http://www.llnl.gov/CASC/people/pmiller/
.. _Squeak: http://www.squeak.org/

Finish AST branch
-----------------
from `Neil `__

No research left, but could lead to macros_.

Macros
------
from `Jeremy `__

Once access to an AST is available, macros are doable. Lisp's macros work so well because of quasiquotation_. In order for this to work in Python, though, you need some other way to handle it; either through the AST as in Maya_ or the CST as in JSE_. Something else to look at is Polyglot_ (what Jeremy wishes the compiler package had become).

.. _quasiquotation: http://citeseer.nj.nec.com/bawden99quasiquotation.html
.. _Maya: http://citeseer.nj.nec.com/baker02maya.html
.. _JSE: http://citeseer.nj.nec.com/context/1821961/0
.. _Polyglot: http://www.cs.cornell.edu/Projects/polyglot/

Refactoring code editor for AST
-------------------------------
from `Neil `__

Integrating XML and SQL into the language
-----------------------------------------
from `Jeremy `__

Seems to be to make XML and SQL first-class citizens in Python. Based on the work of `Erik Meijer`_. Paper at http://www.research.microsoft.com/~emeijer/Papers/XML2003/xml2003.html with his main research page at http://research.microsoft.com/~emeijer/ .

.. _Erik Meijer: http://blogs.gotdotnet.com/emeijer/

Optional type checking
----------------------
from me, but with support from Guido (private email)

Guido thinks it is "one tough problem".
He suggested looking at the `types-sig archive`_ for ideas. Guido would love to have someone sanctioned to tackle this problem. Might be much easier to do if limited to only parameter lists. Doing that minimal amount would allow for a better multiple dispatch implementation. It would also allow for a rudimentary form of polymorphism based on parameter signatures.

.. _types-sig archive: http://www.python.org/pipermail/types-sig/

Type inferencing
----------------
from `Martin `__

Either run-time or compile-time. "Overlap with the specializing compilers".

Register-based VM
-----------------
from Neal (private email)

Should get a nice performance improvement. Look at Skip and Neil's rattler VM. Would be a step towards hooking Python into GCC for assembly code generation.

Lower-level bytecode
--------------------
from Neal (private email)

Supposedly Java's bytecode is fairly low-level. Would make the transition to a register-based VM easier. Also would make compiling to machine code or JIT compilation simpler. An IBM developerWorks article on Java bytecode is available at http://www-106.ibm.com/developerworks/ibm/library/it-haggar_bytecode/ . Could look at assembly languages (RISC and CISC) and other VMs for ideas on bytecodes.

=========
Execution
=========

Portable floating point
-----------------------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Come up with code on a per-platform basis to make up for problems in that platform's FPU implementation. Compare to how Python just provides the CPU's implementation while Java guarantees a specific semantic behavior by providing the needed code to make it the same on all platforms. Martin suggested looking at Java's strictfp mode (which was added after Java 1.0). See http://developer.java.sun.com/developer/JDCTechTips/2001/tt0410.html#using on its usage.
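To get a feel for what such guaranteed semantics would have to pin down: the raw IEEE 754 bit patterns of doubles can already be compared byte-for-byte via the struct module. This is a sketch for probing platform behavior, not part of Martin's proposal, and it assumes an IEEE 754 platform::

```python
import struct

def double_bits(x):
    # Return the 64-bit IEEE 754 representation of a Python float
    # as a big-endian hex string, so results can be compared
    # byte-for-byte across platforms.
    return struct.pack('>d', x).hex()

# On any IEEE 754 platform the *stored* bit patterns are identical:
print(double_bits(1.0))   # 3ff0000000000000
print(double_bits(0.1))   # 3fb999999999999a
```

Representations agree everywhere; it is *computed* results (e.g. transcendental functions, x86 extended-precision intermediates) that can differ between platforms, which is exactly the gap a strictfp-style mode would close.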
Save interpreter state to disk
------------------------------
from `Martin `__

Similar to Smalltalk's images. Would be nice since it would provide a fail-safe mechanism for long-running processes. Could also help with debugging by being able to pass around the state of a program just before an error occurs.

Deterministic Finalization
--------------------------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Having objects implicitly destroyed at certain points. An example is threaded code (in Python)::

    def bump_counter(self):
        self.mutex.acquire()
        try:
            self.counter = self.counter+1
            more_actions()
        finally:
            self.mutex.release()

In C++, you do::

    void bump_counter() {
        MutexAcquisition acquire(this);
        this->counter += 1;
        more_actions();
    }

which is nice since you don't have to explicitly release the lock.

Optimize global namespace access
--------------------------------
from `Neil `__ and `Jeremy `__

Look at `PEP 267`_ and Jeremy's `Faster Namespace`_ slides from the 10th Python conference. Neil pointed out that "If we can disallow inter-module shadowing of names the job becomes easier" (e.g., making ``import Foo; Foo.len = 42`` illegal).

.. _PEP 267: http://www.python.org/peps/pep-0267.html
.. _Faster Namespace: http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html

Restricted execution
--------------------
from Andrew Bennett (private email)

See the python-dev archives and Summaries for more painful details.

Tail Recursion
--------------
from Me (my brain)

Have proper tail recursion in Python. Would require identifying where a direct function call is returned (could keep it simple and just do it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have to deal with exception catching, since that requires the frame to stay alive to handle the exception. But getting it to work well could help with memory and performance.
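A toy illustration of the kind of call that would benefit (hypothetical code; ``countdown`` is made up for the example)::

```python
import sys

def countdown(n):
    # The recursive call's result is returned directly -- a call
    # immediately followed by a return -- so the caller's frame is
    # dead weight that proper tail recursion could discard.
    if n == 0:
        return "done"
    return countdown(n - 1)

# Without tail-call elimination every call keeps a frame alive, so a
# deep but perfectly tail-recursive call blows the recursion limit:
try:
    countdown(sys.getrecursionlimit() * 2)
except RuntimeError:
    print("recursion limit hit")
```

With tail-call elimination this would run in constant stack space, like a loop.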
Don't know if it has been done for a language that had exception handling.

From python at rcn.com Sat Nov 22 18:02:38 2003
From: python at rcn.com (Raymond Hettinger)
Date: Sat Nov 22 18:03:10 2003
Subject: [Python-Dev] copy() and deepcopy()
In-Reply-To: <200311222248.hAMMmMm02546@c-24-5-183-134.client.comcast.net>
Message-ID: <005201c3b14c$b9c608a0$6523a044@oemcomputer>

[Aahz]
> Thing is, it *is* possible to have a mutable and hashable object. The
> hashable part needs to be immutable, but not the rest. Consider dicts in
> the generic sense: the key needs to be immutable, but the value need not,
> and it certainly can be useful to combine key/value into a single object.
> Now, I'm still not sure that your analysis is wrong, but I wanted to be
> very, very clear that hashability is not the same thing as immutability.

[Guido]
> For frozensets, shallow copy should return self; for sets, shallow
> copy should return set(self).
>
> In both cases, deepcopy() should do something like _deepcopy_list()
> and _deepcopy_tuple(), respectively. That is, deepcopying a set is
> pretty straightforward, but must store self in the memo first, so that
> (circular!) references to self are correctly deepcopied. Deepcopying
> a frozenset will be a little harder, because there can still be
> circular references! _deepcopy_tuple() shows how to do it.

Thanks guys. It's all clear now.

The good news is that nothing special has to be done to implement deepcopying. The copy.deepcopy() function is already smart enough to do the right thing when the type provides a __reduce__() method for pickling.

Raymond

From mfb at lotusland.dyndns.org Sat Nov 22 18:27:24 2003
From: mfb at lotusland.dyndns.org (Matthew F.
Barnes)
Date: Sat Nov 22 18:27:30 2003
Subject: [Python-Dev] Extending struct.unpack to produce nested tuples
Message-ID: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org>

I posted this to c.l.py the other day but didn't get any replies, so I thought I might see how it fares on python-dev. It's just an idea I had earlier this week. I'll attempt a patch if the response is positive.

---

I was wondering if there would be any interest in extending the struct.unpack format notation to be able to express groups of data with parentheses.

For example:

>>> data = struct.pack('iiii', 1, 2, 3, 4)
>>> struct.unpack('i(ii)i', data)   # Note the parentheses
(1, (2, 3), 4)

Use Case: I have a program written in C that contains a bunch of aggregate data structures (arrays of structs, structs containing arrays, etc.) and I'm transmitting these structures over a socket connection to a Python program that then unpacks the data using the struct module. The problem is that I have to unpack the incoming data as a flat sequence of data elements and then repartition the sequence into nested sequences to better reflect how the data is structured in the C program. It would be more convenient to express these groupings as I'm unpacking the raw data.

I'm sure there are plenty of other use cases for such a feature.

Matthew Barnes

From guido at python.org Sat Nov 22 18:38:49 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 22 18:37:26 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: Your message of "Fri, 21 Nov 2003 03:45:25 +0100."
<3FBD7C45.3020607@tismer.com>
References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com>
Message-ID: <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net>

> Guido van Rossum wrote:
> > Summary: Christian is right after all. instancemethod_getattro should
> > always prefer bound method attributes over function attributes.

[Christian]
> Guido, I'm very happy with your decision, which is most
> probably a wise decision (without any relation to me).
>
> The point is, that I didn't know what's right or wrong,
> so basically I was asking for advice on a thing I felt
> unhappy with. So I asked you to re-think whether the behavior
> is really what you intended, or if you just stopped early.
>
> Thanks a lot!
>
> That's the summary and all about it, you can skip the rest if you like.

Note to python-dev folks: I will make the change in 2.4. I won't backport to 2.3 unless someone can really make a case for it; it *does* change behavior.

[...]

> > The *intention* was for the 2.2 version to have the same behavior:
> > only im_func, im_self and im_class would be handled by the bound
> > method, other attributes would be handled by the function object.
>
> Ooh, I begin to understand!
>
> > This is what the IsData test is attempting to do -- the im_*
> > attributes are represented by data descriptors now. The __class__
> > attribute is also a data descriptor, so that C().x.__class__ gives us
> > rather than .
>
> IsData is a test for having a write method, too, so we have
> the side effect here that im_* works like I expect, since
> they happen to be writable?
> Well, I didn't look into 2.3 for this, but in 2.2 I get > > >>> a().x.__class__=42 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: __class__ must be set to new-style class, not 'int' object > [9511 refs] > >>> > > which says for sure that this is a writable property, while > > >>> a().x.im_class=42 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: readonly attribute > [9511 refs] > >>> > > seems to be handled differently. > > I only thought of IsData in terms of accessing the > getter/setter wrappers. It's all rather complicated. IsData only checks for the presence of a tp_descr_set method in the type struct. im_* happen to be implemented by a generic approach for defining data attributes which uses a descriptor type that has a tp_descr_set method, but its implementation looks for a "READONLY" flag. This is intentional -- in fact, having a tp_descr_set (or __set__) method that raises an error is the right way to create a read-only data attribute (at least for classes whose instances have a __dict__). [...] > I don't need to pickle classes, this works fine in most cases, > and behavior can be modified by users. Right. When you are pickling classes, you're really pickling code, not data, and that's usually not what pickling is used for. (Except in Zope 3, which can store code in the database and hence must pickle classes. But it's a lot of work, as Jeremy can testify. :-) > > (I wonder if the pickling code shouldn't try to call > > x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of > > these problems would have occurred... :-) > > That sounds reasonable. Explicit would have been better than > implicit (by hoping for the expected bound chain). Especially since *internally* most new-style classes do this for all of the built-in operations (operations for which there is a function pointer slot in the type struct or one of its extensions).
This is different from old-style classes: a classic *instance* can overload (nearly) any special method by having an instance attribute, e.g. __add__; but this is not supported for new-style instances. > __reduce__ as a class method would allow to explicitly spell > that I want to reduce the instance x of class C. > > x.__class__.__reduce__(x) > > While, in contrast > > x.__class__.__reduce__(x.thing) > > would spell that I want to reduce the "thing" property of the > x instance of C. > > While > > x.__class__.__reduce__(C.thing) # would be the same as > C.__reduce__(C.thing) > > which would reduce the class method "thing" of C, or the class > property of C, or whatsoever of class C. You've lost me here. How does x.__class__.__reduce__ (i.e., C.__reduce__) tell the difference between x and x.thing and C.thing??? > I could envision a small extension to the __reduce__ protocol, > by providing an optional parameter, which would open these > new ways, and all pickling questions could be solved, probably. > This is so, since we can find out whether __reduce__ is a class > method or not. > If it is just an instance method (implicitly bound), it behaves as > today. > If it is a class method, it takes a parameter, and then it can find > out whether to pickle a class, instance, class property or an instance > property. > > Well, I hope. The above was said while being in bed with 39° Celsius, > so don't put my words on the assay-balance. I sure don't understand it. If you really want this, please sit down without a fever and explain it with more examples and a clarification of what you want to change, and how. [...] > Until now, I only had to change traceback.c and iterator.c, since > these don't export enough of their structures to patch things > from outside. If at some point somebody might decide that some of > this support code makes sense for the main distribution, things > should of course move to where they belong.
Do you realize that (in C code) you can always get at a type object if you can create an instance of it, and then you can patch the type object just fine? [...] > What I want to do at some time is to change cPickle to use > a non-recursive implementation. (Ironically, the Python pickle > engine *is* non-recursive, if it is run under Stackless). > So, if I would hack at cPickle at all, I would probably do the > big big change, and that would be too much to get done in > reasonable time. That's why I decided to stay small and just > chime a few __reduce__ thingies in, for the time being. > Maybe this was not the best way, I don't know. What's the reason for wanting to make cPickle non-recursive? [...] > Right. probably, I will get into trouble with pickling > unbound class methods. > Maybe I would just ignore this. Bound class methods do > appear in my Tasklet system and need to get pickled. > Unbound methods are much easier to avoid and probably > not worth the effort. (Yes, tomorrow I will be told > that it *is* :-) Unbound methods have the same implementation as bound methods -- they have the same type, but im_self is None (NULL at the C level). So you should be able to handle this easily. (Unbound methods are not quite the same as bare functions; the latter of course are pickled by reference, like classes.) [...] > That means, for Py 2.2 and 2.3, my current special case for > __reduce__ is exactly the way to go, since it doesn't change any > semantics but for __reduce__, and in 2.4 I just drop these > three lines? Perfect! Right. (I'm not quite ready for the 2.4 checkin, watch the checkins list though.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 22 18:41:03 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 22 18:39:35 2003 Subject: [Python-Dev] Extending struct.unpack to produce nested tuples In-Reply-To: Your message of "Sat, 22 Nov 2003 17:27:24 CST." 
<33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> References: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> Message-ID: <200311222341.hAMNf4u03532@c-24-5-183-134.client.comcast.net> > I was wondering if there would be any interest in extending the > struct.unpack format notation to be able to express groups of data > with parentheses. > > For example: > > >>> data = struct.pack('iiii', 1, 2, 3, 4) > >>> struct.unpack('i(ii)i', data) # Note the parentheses > (1, (2, 3), 4) > > Use Case: I have a program written in C that contains a bunch of > aggregate data structures (arrays of structs, structs containing > arrays, etc.) and I'm transmitting these structures over a socket > connection to a Python program that then unpacks the data using the > struct module. Problem is that I have to unpack the incoming data as > a flat sequence of data elements, and then repartition the sequence > into nested sequences to better reflect how the data is structured in > the C program. It would be more convenient to express these groupings > as I'm unpacking the raw data. > > I'm sure there are plenty of other use cases for such a feature. This is a reasonable suggestion. You should also be able to write things like '4(ii)' which would be equivalent to '(ii)(ii)(ii)(ii)'. Please use SourceForge to upload a patch. Without a patch nobody is going to be interested though, I suspect, so don't wait for someone else to implement this. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at comcast.net Sat Nov 22 21:34:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 22 21:34:38 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: [martin@v.loewis.de] > Sorry for causing so much confusion, and thanks to Tim for fixing it. It's OK, Martin!
It was a wonderful example of a simple mistake getting misdiagnosed and so leading to further mistakes, until the whole was much more confusing than the sum of its parts. And, as always, the root cause was trying to cover up Unix bugs with C's preprocessor. From jeremy at alum.mit.edu Sat Nov 22 23:10:09 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sat Nov 22 23:13:03 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069505993.2383.172.camel@anthem> References: <1069476389.22019.0.camel@localhost.localdomain> <1069477805.22019.2.camel@localhost.localdomain> <1069505993.2383.172.camel@anthem> Message-ID: <1069560608.22019.8.camel@localhost.localdomain> On Sat, 2003-11-22 at 07:59, Barry Warsaw wrote: > On Sat, 2003-11-22 at 00:10, Jeremy Hylton wrote: > > > > Did a cvs update about 30 minutes ago. make test reports no errors. > > > Running again with "-u all -r" to see what happens. > > > > Also looks good. This was with a RH9 system. > > Unfortunately, not so for me: > > test_mimetypes > test test_mimetypes failed -- Traceback (most recent call last): > File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", line 52, in test_guess_all_types > eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) > File "/home/barry/projects/python23/Lib/unittest.py", line 302, in failUnlessEqual > raise self.failureException, \ > AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > > But we've seen these before, right? Doesn't some test interfere with > globals in a way that screws mimetypes occasionally? Yes and yes. Use of mimetypes causes the module's init() function to be run on a set of known files. test_mimetypes calls init() after zapping the list of knownfiles. init() does not clear out existing global state before re-initializing, which is why the test fails if mimetypes has been used before test_mimetypes.
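[Editor's note: the failure mode Jeremy describes can be sketched in miniature. This is a toy model, not the real mimetypes source; `init`, `types_map`, and the dictionaries are simplified stand-ins for the module's actual globals and file handling.]

```python
# Toy model of the test_mimetypes interference described above -- NOT the
# real mimetypes source.  The point: init() updates module-level state but
# never clears it, so its result depends on whether init() ran earlier.

types_map = {}  # module-level state, shared across init() calls

def init(files=None):
    defaults = {'.txt': 'text/plain', '.c': 'text/x-csrc'}
    from_knownfiles = {'.asc': 'text/plain', '.pl': 'text/x-perl'}
    types_map.update(defaults)          # note: no types_map.clear() first
    if files is None:                   # read the default "knownfiles"
        types_map.update(from_knownfiles)

# A test that zaps knownfiles (passes files=[]) on a fresh module:
init(files=[])
fresh = sorted(types_map)               # only the defaults

# The same test after some earlier test already used mimetypes normally:
types_map.clear()                       # simulate a fresh import...
init()                                  # ...followed by normal use elsewhere
init(files=[])                          # now the zapped-knownfiles test runs
stale = sorted(types_map)               # '.asc' and '.pl' have leaked in

assert fresh != stale
```

This mirrors the AssertionError above: the expected list lacks '.asc' exactly because the test assumed no earlier init() had populated the shared map.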
Jeremy From tismer at tismer.com Sun Nov 23 00:33:48 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 23 00:33:50 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com> <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> Message-ID: <3FC046BC.3030500@tismer.com> Hi Guido, >>Guido van Rossum wrote: >> >>>Summary: Christian is right after all. instancemethod_getattro should >>>always prefer bound method attributes over function attributes. ... > Note to python-dev folks: I will make the change in 2.4. I won't > backport to 2.3 unless someone can really make a case for it; it > *does* change behavior. ... >>I only thought of IsData in terms of accessing the >>getter/setter wrappers. > > It's all rather complicated. IsData only checks for the presence of a > tp_descr_set method in the type struct. im_* happen to be implemented > by a generic approach for defining data attributes which uses a > descriptor type that has a tp_descr_set method, but its implementation > looks for a "READONLY" flag. This is intentional -- in fact, having a > tp_descr_set (or __set__) method that raises an error is the right way > to create a read-only data attribute (at least for classes whose > instances have a __dict__). Arghh! This is in fact harder than I was aware of. You *have* a setter, for its existence, although it won't set, for the readonly flag.
Without criticism, you are for sure not finally happy with the solution, which sounds more like a working proof of concept than a real concept which you are happy to spread on the world. I'm better off to keep my hands off and not touch it now. > [...] > >>I don't need to pickle classes, this works fine in most cases, >>and behavior can be modified by users. > > Right. When you are pickling classes, you're really pickling code, > not data, and that's usually not what pickling is used for. (Except > in Zope 3, which can store code in the database and hence must pickle > classes. But it's a lot of work, as Jeremy can testify. :-) Heh! :-) You have not seen me pickling code, while pickling frames? All kind of frames (since Stackless has many more frame types), with running code attached, together with iterators, generators, the whole catastrophe.... >>>(I wonder if the pickling code shouldn't try to call >>>x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of >>>these problems would have occurred... :-) >> >>That sounds reasonable. Explicit would have been better than >>implicit (by hoping for the expected bound chain). Having that said, without understanding what you meant. See below. > Especially since *internally* most new-style classes do this for all > of the built-in operations (operations for which there is a function > pointer slot in the type struct or one of its extensions). This is > different from old-style classes: a classic *instance* can overload > (nearly) any special method by having an instance attribute, > e.g. __add__; but this is not supported for new-style instances. > >>__reduce__ as a class method would allow to explicitly spell >>that I want to reduce the instance x of class C. >> >>x.__class__.__reduce__(x) >> >>While, in contrast >> >>x.__class__.__reduce__(x.thing) crap. crappedi crap. *I* was lost! ... > You've lost me here. 
How does x.__class__.__reduce__ (i.e., > C.__reduce__) tell the difference between x and x.thing and C.thing??? Nonsense. >>I could envision a small extension to the __reduce__ protocol, ... Nonsense. With 39° Celsius. > I sure don't understand it. If you really want this, please sit down > without a fever and explain it with more examples and a clarification > of what you want to change, and how. Reset() Revert() I got an email from Armin Rigo today, which clearly said what to do, and I did it. It works perfectly. I patched pickle.py and cPickle.c to do essentially what Armin said: """ So I'm just saying that pickle.py is wrong in just one place: reduce = getattr(obj, "__reduce__", None) if reduce: rv = reduce() should be: reduce = getattr(type(obj), "__reduce__", None) if reduce: rv = reduce(obj) """ An almost trivial change, although I also had to change copy.py, and overall I was unhappy since this extends my patch set to more than replacing python2x.dll, but I hope this will become an official patch and back-patch. [moo moo about patching almost all from outside, but iterators and tracebacks] > Do you realize that (in C code) you can always get at a type object if > you can create an instance of it, and then you can patch the type > object just fine? Sure I know that. What I hate is if I have to duplicate or change data structure declarations, if I can't access them, directly. For tracebacks, I had to add a field (one reason for the non-recursive wish, below). For iterobject.c, it was clumsy, since I had to extend the existing! method table, so I had to touch the source file, anyway. (Meanwhile, I see a different way to do it, but well, it is written...) ... > What's the reason for wanting to make cPickle non-recursive? Several reasons. For one, the same reason why I started arguing about deeply recursive destruction code, and implemented the initial elevator destructor, you remember. (trashcan) Same reason. When __del__ crashes, cPickle will crash as well.
Now that I *can* pickle tracebacks and very deep recursions, I don't want them to crash. Several people asked on the main list, how to pickle deeply nested structures without crashing pickle. Well, my general answer was to rewrite pickle in a non-recursive manner. On the other hand, my implementation for tracebacks and tasklets (with large chains of frames attached) was different: In order to avoid cPickle's shortcomings of recursion, I made the tasklets produce a *list* of all related frames, instead of having them refer to each other via f_back. I did the same for tracebacks, by making the leading traceback object special, to produce a *list* of all other traceback objects in the chain. Armin once said, "rewrite the pickle code", which I'd happily do, but I do think, the above layout changes are not that bad, anyway. While frame chains and traceback chains are looking somewhat recursive, they aren't really. I think, they are lists/tuples by nature, and pickling them as that not only makes the result of __reduce__ more readable and usable, but the pickle is also a bit shorter than that of a deeply nested structure. >>Right. probably, I will get into trouble with pickling >>unbound class methods. I'm Wrong! It worked, immediately, after I understood how. > Unbound methods have the same implementation as bound methods -- they > have the same type, but im_self is None (NULL at the C level). So you > should be able to handle this easily. (Unbound methods are not quite > the same as bare functions; the latter of course are pickled by > reference, like classes.)
Yes, here we go: It was a cake walk: static PyObject * method_reduce(PyObject * m) { PyObject *tup, *name, *self_or_class; name = PyObject_GetAttrString(m, "__name__"); if (name == NULL) return NULL; self_or_class = PyMethod_GET_SELF(m); if (self_or_class == NULL) self_or_class = PyMethod_GET_CLASS(m); if (self_or_class == NULL) self_or_class = Py_None; tup = Py_BuildValue("(O(OS))", &PyMethod_Type, self_or_class, name); Py_DECREF(name); return tup; } Works perfectly. The unpickler code later does nothing at all but let the existing lookup machinery do the work. Here's an excerpt: if (!PyArg_ParseTuple (args, "OS", &inst, &methname)) return NULL; /* let the lookup machinery do all the work */ ret = PyObject_GetAttr(inst, methname); Perfect, whether inst is a class or an instance, it works. >>That means, for Py 2.2 and 2.3, my current special case for >>__reduce__ is exactly the way to go, since it doesn't change any >>semantics but for __reduce__, and in 2.4 I just drop these >>three lines? Perfect! Dropped it, dropped it! Yay! > Right. (I'm not quite ready for the 2.4 checkin, watch the checkins > list though.) Well, after Armin's input, I dropped my special case, and instead I will submit a patch for 2.2 and 2.3, which uses your proposed way to use __reduce__ from pickle and copy. This is completely compatible and does what we want! ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From pf_moore at yahoo.co.uk Sun Nov 23 07:24:44 2003 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Sun Nov 23 07:24:39 2003 Subject: [Python-Dev] Re: Thesis ideas list References: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: "Brett C." writes: > Deterministic Finalization > -------------------------- FWIW, the Parrot developers are (or have been) struggling with this issue. Specifically, how to do deterministic finalization in the presence of full (non-refcounting) GC. If you're interested in this, the parrot dev archives may be worth a look... Paul. -- This signature intentionally left blank From skip at manatee.mojam.com Sun Nov 23 08:00:47 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 23 08:01:01 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311231300.hAND0l7Z005619@manatee.mojam.com> Bug/Patch Summary ----------------- 574 open / 4359 total bugs (+43) 193 open / 2459 total patches (+11) New Bugs -------- inconsistent popen[2-4]() docs (2003-11-16) http://python.org/sf/843293 help(obj) should use __doc__ when available (2003-11-16) http://python.org/sf/843385 tkFileDialog.Open is broken (2003-11-17) http://python.org/sf/843999 "up" instead of "down" in turtle module documentation (2003-11-17) http://python.org/sf/844123 urllib2 fails its builtin test (2003-11-18) http://python.org/sf/844336 codecs.open().readlines(sizehint) bug (2003-11-18) http://python.org/sf/844561 PackageManager: deselect show hidden: indexerror (2003-11-18) http://python.org/sf/844676 os.exec* and first 'arg' (2003-11-19) http://python.org/sf/845342 imaplib: traceback from _checkquote with empty string (2003-11-19) http://python.org/sf/845560 Python crashes when __init__.py is a directory. 
(2003-11-20) http://python.org/sf/845802 os.chmod does not work with a unicode filename (2003-11-20) http://python.org/sf/846133 error in python's grammar (2003-11-21) http://python.org/sf/846521 "and" operator tests the first argument twice (2003-11-21) http://python.org/sf/846564 control-c is being sent to child thread rather than main (2003-11-21) http://python.org/sf/846817 email.Parser.Parser doesn't check for valid Content-Type (2003-11-21) http://python.org/sf/846938 datetime.datetime initialization needs more strict checking (2003-11-21) http://python.org/sf/847019 NotImplemented return value misinterpreted in new classes (2003-11-21) http://python.org/sf/847024 textwrap ignoring fix_sentence_endings for single lines (2003-11-22) http://python.org/sf/847346 New Patches ----------- socketmodule.c: fix for platforms w/o IPV6 (i.e.Solaris 5.7) (2003-11-19) http://python.org/sf/845306 Check for signals during regular expression matches (2003-11-20) http://python.org/sf/846388 fix for bug #812325 (tarfile violates bufsize) (2003-11-21) http://python.org/sf/846659 Closed Bugs ----------- IDE Preferences (2002-09-11) http://python.org/sf/607816 Support RFC 2111 in email package (2002-10-21) http://python.org/sf/626452 RFC 2112 in email package (2002-11-06) http://python.org/sf/634412 elisp: IM-python menu and newline in function defs (2003-03-21) http://python.org/sf/707707 Problem With email.MIMEText Package (2003-05-12) http://python.org/sf/736407 test zipimport fails (2003-07-03) http://python.org/sf/765456 IDE defaults to Mac linefeeds (2003-08-04) http://python.org/sf/782686 email bug with message/rfc822 (2003-08-24) http://python.org/sf/794458 email.Message param parsing problem II (2003-08-25) http://python.org/sf/795081 plat-mac/applesingle.py needs cosmetic changes (2003-09-09) http://python.org/sf/803498 _tkinter compilation fails (2003-09-12) http://python.org/sf/805200 RedHat 9 blows up at dlclose of pyexpat.so (2003-09-29) http://python.org/sf/814726 
bug with ill-formed rfc822 attachments (2003-09-30) http://python.org/sf/815563 Missing import in email example (2003-10-01) http://python.org/sf/816344 exception with Message.get_filename() (2003-10-15) http://python.org/sf/824417 bad value of INSTSONAME in Makefile (2003-10-15) http://python.org/sf/824565 email/Generator.py: Incorrect header output (2003-10-20) http://python.org/sf/826756 httplib hardcodes Accept-Encoding (2003-10-28) http://python.org/sf/831747 email generator can give bad output (2003-11-04) http://python.org/sf/836293 Bug in type's GC handling causes segfaults (2003-11-10) http://python.org/sf/839548 weakref callbacks and gc corrupt memory (2003-11-12) http://python.org/sf/840829 Windows mis-installs to network drive (2003-11-14) http://python.org/sf/842629 Closed Patches -------------- 755617: better docs for os.chmod (2003-06-16) http://python.org/sf/755677 startup file compiler flags (2003-08-24) http://python.org/sf/794400 Build changes for AIX (2003-11-05) http://python.org/sf/836434 One more patch for --enable-shared (2003-11-13) http://python.org/sf/841807 NameError in the example of sets module (2003-11-15) http://python.org/sf/842994 doc fixes builtin super and string.replace (2003-11-16) http://python.org/sf/843088 From barry at python.org Sun Nov 23 11:20:18 2003 From: barry at python.org (Barry Warsaw) Date: Sun Nov 23 11:20:29 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069604417.28025.9.camel@anthem> On Sat, 2003-11-22 at 14:31, Tim Peters wrote: > googling on test_guess_all_types nails it: > > http://mail.python.org/pipermail/python-dev/2003-September/038264.html > > Jeff Epler reported there, in a reply to you about the same thing in 2.3.1, > that test_urllib2 interferes with test_mimetypes (when run in that order), > and included a patch claimed to fix it. Of course, since he didn't put the > patch on SF, it just got lost. Ah yes, thanks for the memory jog. 
I applied (essentially) the set suggestion to the test. -Barry From bac at OCF.Berkeley.EDU Sun Nov 23 17:40:03 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Nov 23 17:40:13 2003 Subject: [Python-Dev] PEP for removal of string module? Message-ID: <3FC13743.2070209@ocf.berkeley.edu> As I was writing the Summary, I noticed that the discussion of how to handle the removal of the string module got a little complicated thanks to how to deal with stuff that is different between str and unicode. There was no explicit (i.e., patch) resolution to the whole thing. Does this warrant a PEP to work out the details? Now I am not explicitly volunteering to write one since I am no Unicode or locale expert and that seems to be the sticking point. But if one is needed and no one steps forward I guess I could (will have to wait until after generator expressions get implemented, though, since I am already committed to working on that). -Brett From bac at OCF.Berkeley.EDU Sun Nov 23 18:48:18 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Nov 23 18:48:30 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003 [draft] Message-ID: <3FC14742.50201@ocf.berkeley.edu> Thanks to school I didn't get to the latter half of October summary until I needed to start worrying about the first summary for November. So I just combined them. I am hoping to send this summary out Wednesday or Thursday so as to not worry about it beyond Thanksgiving morning. So please try to get your corrections and comments in by then. Thanks. ------------------------------- python-dev Summary for 2003-10-16 through 2003-11-15 ++++++++++++++++++++++++++++++++++++++++++++++++++++ This is a summary of traffic on the `python-dev mailing list`_ from October 16, 2003 through November 15, 2003. It is intended to inform the wider Python community of on-going developments on the list. 
To comment on anything mentioned here, just post to `comp.lang.python`_ (or email python-list@python.org which is a gateway to the newsgroup) with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join `python-dev`_! This is the twenty-eighth and twenty-ninth summaries written by Brett Cannon (does anyone even read this?). All summaries are archived at http://www.python.org/dev/summary/ . Please note that this summary is written using reStructuredText_ which can be found at http://docutils.sf.net/rst.html . Any unfamiliar punctuation is probably markup for reST_ (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it, although I suggest learning reST; it's simple and is accepted for `PEP markup`_ and gives some perks for the HTML output. Also, because of the wonders of programs that like to reformat text, I cannot guarantee you will be able to run the text version of this summary through Docutils_ as-is unless it is from the original text file. .. _PEP Markup: http://www.python.org/peps/pep-0012.html The in-development version of the documentation for Python can be found at http://www.python.org/dev/doc/devel/ and should be used when looking up any documentation on something mentioned here. PEPs (Python Enhancement Proposals) are located at http://www.python.org/peps/ . To view files in the Python CVS online, go to http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/ . Reported bugs and suggested patches can be found at the SourceForge_ project page. .. _python-dev: http://www.python.org/dev/ .. _SourceForge: http://sourceforge.net/tracker/?group_id=5470 .. _python-dev mailing list: http://mail.python.org/mailman/listinfo/python-dev .. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python .. 
_Docutils: http://docutils.sf.net/ .. _reST: .. _reStructuredText: http://docutils.sf.net/rst.html .. contents:: .. _last summary: http://www.python.org/dev/summary/2003-09-01_2003-09-15.html ===================== Summary Announcements ===================== Thanks to midterms and projects my time got eaten up by school. That postponed when I could work on the twenty-eighth summary so much that the twenty-ninth was in need of being written. So they are combined into one to just get the stuff out the door. The second half of October had some major discussions happen. Guido and Alex Martelli talking equals pain for me. =) There was a large discussion on scoping and accessing specific namespaces. Jeremy Hylton is working on a PEP on the subject so I am not going to stress myself over summarizing the topic. A big discussion on the first half of November was about weakrefs and shutdown. Tim Peters figured out the problem (had to do with weakrefs referencing things already gc'ed and thus throwing a fit when trying to gc them later or keeping an object alive because of the weakref). It was long and complicated, but the problem was solved. If you have ever wanted to see linked lists used in Python in a rather elegant way, take a look at Guido's implementation of itertools.tee at http://mail.python.org/pipermail/python-dev/2003-October/039593.html . Europython is going to be held from June 7-9, 2004 in Sweden. See http://mail.python.org/pipermail/europython/2003-November/003634.html for more details. PyCon is slowly moving along. The registration site is being put through QA and the paper submission system is being worked on. The Call for Proposals (CFP) is still on-going; details at http://www.python.org/pycon/dc2004/cfp.html . Keep an eye out for when we announce when the registration and paper submission systems go live.
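[Editor's note: the linked-list trick behind itertools.tee mentioned in the announcements above can be sketched roughly as follows. This is a simplified reconstruction, not Guido's actual posted code (see the linked message for that); names like `walk` and `shared_cell` are invented for illustration.]

```python
# Simplified reconstruction of the linked-list idea behind itertools.tee
# (NOT Guido's actual code).  Each cell of the shared list is
# [value, next_cell]; every returned iterator walks the chain at its own
# pace, so only the items between the slowest and the fastest consumer
# stay buffered.

def tee(iterable, n=2):
    iterator = iter(iterable)
    shared_cell = [None, None]          # a None "next" slot means "not fetched yet"

    def walk(cell):
        while True:
            if cell[1] is None:         # this cell has not been filled in yet
                try:
                    cell[0] = next(iterator)   # fetch exactly one new item
                except StopIteration:
                    return              # underlying iterator is exhausted
                cell[1] = [None, None]  # append a fresh empty cell to the chain
            value, cell = cell          # read the value, advance along the chain
            yield value

    return tuple(walk(shared_cell) for _ in range(n))

a, b = tee([1, 2, 3])
assert next(a) == 1 and next(a) == 2    # a runs ahead; 2 stays buffered for b
assert list(b) == [1, 2, 3]             # b replays the buffered chain
assert list(a) == [3]
```

The elegance is that no explicit queue management is needed: once every iterator has moved past a cell, nothing references it anymore and it is garbage-collected.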
========= Summaries ========= ------------------------------------------ How to help with the development of Python ------------------------------------------ In an attempt to make it as easy as possible for people to find out how they can help contribute to Python's development, I wrote an essay on the topic (mentioned last month, but some revisions have been done). It covers how Python is developed and how **anyone** can contribute to the development process. The latest version can be found at http://mail.python.org/pipermail/python-dev/2003-October/039473.html . Any comments on the essay are appreciated. Contributing threads: - `Draft of an essay on Python development `__ - `2nd draft of "How Py is Developed" essay `__ ------------------------------------------------ Generator Expressions: list comp's older brother ------------------------------------------------ If you ever wanted to have the power of list comprehensions but without the overhead of generating the entire list, you have Peter Norvig initially, and then what seems like the rest of the world, to thank for generator expressions. `PEP 289`_ covers all the details, but here is a quick intro. You can think of generator expressions as list comprehensions that return an iterator over the items instead of a list of items. The syntax is practically the same as list comprehensions as well; just substitute parentheses for square brackets (most of the time; generator expressions just need parentheses around them, so being the only argument to a method takes care of the parentheses requirement). A quick example is:: (x for x in range(10) if x%2) returns an iterator that yields the odd numbers below 10. This makes list comprehensions just syntactic sugar for passing a generator expression to list() (note how extra parentheses are not needed):: list(x for x in range(10) if x%2) Having list comprehensions defined this way also takes away the dangling item variable for the 'for' loop.
Using that dangling variable is now discouraged and will be made illegal at some point. For complete details, read the PEP. .. _PEP 289: http://www.python.org/peps/pep-0289.html Contributing threads: - `decorate-sort-undecorate `__ - `accumulator display syntax `__ - `listcomps vs. for loops `__ - `PEP 289: Generator Expressions (second draft) `__ --------------------- list.sorted() is born --------------------- After the addition of the 'key' argument to list.sort(), people began to clamor for list.sort() to return self. Guido refused to give in, so a compromise was reached. 'list' now has a class method named 'sorted'. Pass it a list and it will return a *copy* of that list, sorted. Contributing threads: - `decorate-sort-undecorate `__ - `inline sort option `__ - `sort() return value `__ - `copysort patch `__ ------------------------------------ Recursion limit in re is now history ------------------------------------ Thanks to Gustavo Niemeyer the recursion limit in the re module has now been removed! Contributing threads: - `SRE recursion `__ - `SRE recursion removed `__ ----------------------------------- Copying iterators one day at a time ----------------------------------- Reiteration for iterators came up as part of the immense discussion on generator expressions. The difficulty of doing it generally came up. This led to Alex Martelli proposing magic method support for __copy__ in iterators that want to allow copies of themselves. This was written down as `PEP 323`_. As an interim solution, itertools grew a new function: tee. It takes in an iterable and returns two iterators which independently iterate over the iterable. .. _PEP 323: http://www.python.org/peps/pep-0323.html Contributing threads: - `Reiterability `__ - `cloning iterators again `__ - `... python/nondist/peps pep-0323.txt, NONE ... `__ - `Guido's Magic Code `__ ------------------------------------------------------ Returning Py_(None, True, False) now easier than ever!
------------------------------------------------------ Py_RETURN_NONE, Py_RETURN_TRUE, and Py_RETURN_FALSE have been added to Python 2.4. They are macros for returning the singleton mentioned in the name. Documentation has yet to be written (my fault). Contributing threads: - `How to spell Py_return_None and friends `__ - `python/dist/src/Include object.h, 2.121, ... `__ ------------------------------------------------------------------------- 'String substitutions'/'dict interpolation'/'replace %(blah)s with $blah' ------------------------------------------------------------------------- The idea of introducing string substitutions using '$' came up. Guido said that if this was made a built-in feature it would have to wait until Python 3. He was receptive to moving the functionality to a module, though. Barry Warsaw pasted code into http://mail.python.org/pipermail/python-dev/2003-October/039369.html that handles string substitutions. Contributing threads: - `Can we please have a better dict interpolation syntax? `__ ------------------------------------------ "reduce() just doesn't get enough mileage" ------------------------------------------ That quote comes from Guido during the discussion over whether 'product' should be added as an accumulator function built-in like 'sum'. The idea was shot down and conversation quickly turned to whether 'reduce' should stay in the language (the consensus was "no" since the function does not read well and its functionality can easily be done with a 'for' loop). A larger discussion on what built-ins should eventually disappear will be covered in the next Summary. Contributing threads: - `product() `__ ----------- PyPy update ----------- The PyPy_ development group sent an update on their happenings to the list. Turns out they are trying to get funding from the European Union. 
They are also fairly close to getting a working version (albeit with some bootstrapping from CPython, but it will still be damn cool what they have pulled off even with this caveat). They also announced a sprint they are holding in Amsterdam from Dec. 14-21. More info can be found at http://codespeak.net/moin/pypy/moin.cgi/AmsterdamSprint . .. _PyPy: http://codespeak.net/pypy/ Contributing threads: - `PyPy: sprint and news `__ ---------------------------- Never say Python is finished ---------------------------- I asked python-dev for master's thesis ideas. A great number of possibilities were put forward. If anyone out there is curious to see what some people would like to see done for Python in terms of a large project, check the thread out. Contributing threads: - `Looking for master thesis ideas involving Python `__ --------------------------------- Rough draft of Decimal object PEP --------------------------------- Facundo Batista has posted a rough draft of a PEP for a decimal object that is being worked on in the sandbox. Comment on it on `comp.lang.python`_ if this interests you. Contributing threads: - `prePEP: Decimal data type `__ ---------------------------------------------------------- Relations of basestring and bye-bye operator.isMappingType ---------------------------------------------------------- The idea of introducing relatives of basestring for numbers came from Alex Martelli. That idea was shot down for not being needed once the merger of int and long occurs. The point that operator.isMappingType is kind of broken also came up. Both Alex and Raymond Hettinger would not mind seeing it disappear. No one objected. It is still in CVS at the moment, but I would not count on it necessarily sticking around.
Contributing threads: - `reflections on basestring -- and other abstract basetypes `__ - `operator.isMappingType `__ --------------------------------------------------------- Why one checks into the trunk before a maintenance branch --------------------------------------------------------- The question of whether checking a change into a maintenance branch before applying it to the main trunk was acceptable came up. The short answer is "no". Basically the trunk gets more testing than the maintenance branches and thus a patch should have to prove its stability there first. Only then should it go into a maintenance branch. The same goes for changes to code that will eventually disappear from the trunk. Someone might be planning on removing some code, but if that person falls off the face of the earth the code will still be there. That means applying the patch to the code that is scheduled to disappear is still a good idea. Contributing threads: - `check-in policy, trunk vs maintenance branch `__ ----------------------- New reversed() built-in ----------------------- There was a new built-in named reversed(), and all rejoiced. Straight from the function's doc string: "reverse iterator over values of the sequence". `PEP 322`_ has the relevant details on this toy. .. _PEP 322: http://www.python.org/peps/pep-0322.html Contributing threads: - `PEP 322: Reverse Iteration `__ --------------------------- Cleaning the built-in house --------------------------- Guido asked what built-ins should be considered for deprecation. Instantly intern, coerce, and apply came up. apply already had a PendingDeprecationWarning and that will stay for the next release or two. intern and coerce, though, did not have any major champions (intern had some fans, but just for the functionality). Guido did state that none of these built-ins will be removed any time soon. If they do get deprecated it does not lead to immediate removal.
Python 3, though, takes the gloves off, and there they may just completely disappear. Contributing threads: - `Deprecating obsolete builtins `__ ---------------------------------------- Passing arguments to str.(encode|decode) ---------------------------------------- The idea of allowing keyword arguments to be passed to any specified encoder/decoder was brought up by Raymond Hettinger. It seemed like an idea that was supported. The idea of specifying the encoder or decoder based on the actual object instead of the current way of specifying a string that is passed to the registered codec search functions was suggested. Nothing has been finalized on this idea as of now. Contributing threads: - `Optional arguments for str.encode /.decode `__ ------------------------------------------------------ Where, oh where, to move the good stuff out of string? ------------------------------------------------------ It looks like ascii_* and possibly maketrans from the string module will be tacked on to the str type so that the string module can finally be removed from the language. It has not been pronounced upon, but it looks like that is what the BDFL is leaning towards. Issues of using the methods of str as unbound methods did come up. As it stands you cannot pass a unicode object to str.upper and thus there is no one uppercasing function as there is in the string module. This issue brought up the problem of Unicode's ties to locale and to collation (how to sort things). Contributing threads: - `other "magic strings" issues `__ ----------------------------------------- Supported versions of Sleepycat for bsddb ----------------------------------------- The basic answer is that 3.2 - 4.2 should work when you compile from source. Contributing threads: - `which sleepycat versions do we support in 2.3.* ? `__ ----------------------------- Sets now at blazing C speeds! ----------------------------- Raymond Hettinger implemented the sets API in C!
The new built-ins are set (which replaces sets.Set) and frozenset (which replaces sets.ImmutableSet). The APIs are the same as the sets module sans the name change from ImmutableSet to frozenset. Contributing threads: - `set() and frozenset() `__ From greg at cosc.canterbury.ac.nz Sun Nov 23 18:56:41 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 18:56:47 2003 Subject: [Python-Dev] Extending struct.unpack to produce nested tuples In-Reply-To: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> Message-ID: <200311232356.hANNufP24780@oma.cosc.canterbury.ac.nz> > Use Case: I have a program written in C that contains a bunch of > aggregate data structures (arrays of structs, structs containing > arrays, etc.) and I'm transmitting these structures over a socket > connection to a Python program that then unpacks the data using the > struct module. An alternative would be to teach the C program to write the data in pickle or marshal format... :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From hunterp at fastmail.fm Sun Nov 23 19:27:20 2003 From: hunterp at fastmail.fm (Hunter Peress) Date: Sun Nov 23 19:27:27 2003 Subject: [Python-Dev] quick patch for better debugging Message-ID: <20031124002720.360534252A@server1.messagingengine.com> Both IndexError and KeyError don't report which object the retrieval failed on. Having this data would save lots of typing and annoyance. E.g.: KeyError: 'jio' could look like KeyError: "Dictionary(some_name) has no key 'jio'" IndexError: list index out of range could look like: IndexError: list(some_name) index(some_value) out of range If this is ok, I'll make a patch!
----- PS: And for those of you that think even more debugging info is needed, think no more, because I prodded enough a few months ago such that textmode cgitb is now in the 2.3 tree. Try: import cgitb; cgitb.enable(format='text') and then make an error. -- Hunter Peress hunterp@fastmail.fm From python at rcn.com Sun Nov 23 19:27:29 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 23 19:28:06 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003[draft] In-Reply-To: <3FC14742.50201@ocf.berkeley.edu> Message-ID: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> > If you ever wanted to have the power of list comprehensions but without > the overhead of generating the entire list you have Peter Norvig > initially and then what seems like the rest of the world for generator > expressions. [possibly mangled sentence doesn't make sense] > After the addition of the 'key' argument to list.sort(), people began to > clamor for list.sort() to return self. Guido refused to do give in, so > a compromise was reached. 'list' now has a class method named 'sorted'. > Pass it a list and it will return a *copy* of that list sorted. [Add] What makes a class method so attractive is that the argument need not be a list, any iterable will do. The return value *is* of course a list. By returning a list instead of None, list.sorted() can be used as an expression instead of a statement. This makes it possible to use it as an argument in a function call or as the iterable in a for-loop:: # iterate over a dictionary sorted by key for key, value in list.sorted(mydict.iteritems()): > As an interim solution, itertools grew a new function: tee. It takes in > an iterable and returns two iterators which independently iterate over > the iterable. [replace] two [with] two or more > The point that operator.isMappingType is kind of broken came up. Both > Alex and Raymond Hettinger would not mind seeing it disappear. No one > objected.
It is still in CVS at the moment, but I would not count on it > necessarily sticking around. ["It's not quite dead yet" ;-) Actually, there may be a way to partially fix it so that it won't be totally useless]. > There was a new built-in named reversed(), and all rejoiced. [And much flogging of the person who proposed it] > Straight from the function's doc string: "reverse iterator over values > of the sequence". `PEP 322`_ has the relevant details on this toy. [Replace] toy [With] major technological innovation of the first order [Or just] builtin. > Sets now at blazing C speeds! [Looks like a certain parroteer will soon be eating pie!] Another fine summary. Thanks for the good work. Raymond From greg at cosc.canterbury.ac.nz Sun Nov 23 20:05:50 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 20:06:20 2003 Subject: [Python-Dev] quick patch for better debugging In-Reply-To: <20031124002720.360534252A@server1.messagingengine.com> Message-ID: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> Hunter Peress : > KeyError: 'jio' could look like KeyError: "Dictionary(some_name) ^^^^^^^^^ > has no key 'jio'" > IndexError: list(some_name) index(some_value) out of range ^^^^^^^^^ Where do you propose to get these names from? Lists and dictionaries don't have names... I agree with the general idea of providing some sort of identifying information, but in these cases I can't think what sort of information would be useful short of displaying the entire repr() of the object, which would be too much for a backtrace message, I think. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Sun Nov 23 20:26:09 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 20:24:39 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Sun, 23 Nov 2003 06:33:48 +0100." <3FC046BC.3030500@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com> <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> <3FC046BC.3030500@tismer.com> Message-ID: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> > Arghh! This is in fact harder than I was aware of. You *have* > a setter, for its existance, although it won't set, for > the readonly flag. > Without criticism, you are for sure not finally happy with the > solution, which sounds more like a working proof of concept > than a real concept which you are happy to spread on the world. > I'm better off to keep my hands off and not touch it now. Actually, I like it fine. There really are four categories: 0) not a descriptor 1) overridable descriptor (used for methods) 2a) read-only non-overridable descriptor (used for read-only data) 2b) writable non-overridable descriptor (used for writable data) Case (0) is recognized by not having __get__ at all. Case (1) has __get__ but not __set__. Cases (2a) and (2b) have __get__ and __set__; case (2a) has a __set__ that raises an exception. There are other (older) examples of __setattr__ implementations that always raise an exception. 
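Guido's four categories can be demonstrated with minimal descriptors. This is a hypothetical sketch to illustrate the protocol, not code from the thread:

```python
class NotADescriptor:            # case 0: no __get__ at all
    pass

class Overridable:               # case 1: __get__ but no __set__
    def __get__(self, obj, objtype=None):
        return "from descriptor"

class ReadOnlyData:              # case 2a: __set__ exists but raises
    def __get__(self, obj, objtype=None):
        return "constant"
    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")

class WritableData:              # case 2b: working __get__ and __set__
    def __get__(self, obj, objtype=None):
        return obj.__dict__.get("_x")
    def __set__(self, obj, value):
        obj.__dict__["_x"] = value

class C:
    meth = Overridable()
    ro = ReadOnlyData()
    rw = WritableData()

c = C()
c.__dict__["meth"] = "shadowed"  # case 1: instance dict overrides it
assert c.meth == "shadowed"
assert c.ro == "constant"        # case 2a: instance dict cannot shadow it
try:
    c.ro = 1                     # ...and writing raises
except AttributeError:
    pass
c.rw = 42                        # case 2b: behaves like ordinary data
assert c.rw == 42
```

The shadowing behavior is exactly the distinction Guido draws: a descriptor with only `__get__` is overridable per instance, while one that also defines `__set__` always wins over the instance dict.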
> I patches pickle.py and cPickle.c to do essentially what Armin said: > """ > So I'm just saying that pickle.py in wrong in just one place: > > reduce = getattr(obj, "__reduce__", None) > if reduce: > rv = reduce() > > should be: > > reduce = getattr(type(obj), "__reduce__", None) > if reduce: > rv = reduce(obj) > """ Right. (That's what I was trying to say, too. :-) > An almost trivial change, although I also had to change copy.py, > and overall I was unhappy since this extends my patch set to more > than replacing python2x.dll, but I hope this will become an > official patch and back-patch. Give it to me baby. (On SF. :-) > > What's the reason for wanting to make cPickle non-recursive? > > Several reasons. > For one, the same reason why I started arguing about deeply > recursive destruction code, and implemented the initial > elevator destructor, you remember. (trashcan) Yeah. Maybe I should get out of the serious language implementation business, because I still liked it better before. It may work, but it is incredibly ugly, and also had bugs for the longest time (and those bugs were a lot harder to track down than the bug it was trying to fix). With Tim's version I can live with it -- but I just don't like this kind of complexification of the implementation, even if it works better. > Same reason. When __del__ crashes, cPickle will crash as well. Please don't call it __del__. __del__ is a user-level finalization callback with very specific properties and problems. You were referring to tp_dealloc, which has different issues. > Now that I *can* pickle tracebacks and very deep recursions, > I don't want them to crash. > > Several people asked on the main list, how to pickle deeply > nested structures without crashing pickle. Well, my general > answer was to rewrite pickle in a non-recursive manner. I guess it's my anti-Scheme attitude. I just think the problem is in the deeply nested structures. 
There usually is a less nested data structure that doesn't have the problem. But I'll shut up, because this rant is not productive. :-( > On the other hand, my implementation for tracebacks and > tasklets (with large chains of frames attached) was different: > In order to avoid cPickle's shortcomings of recursion, I made > the tasklets produce a *list* of all related frames, instead of > having them refer to each other via f_back. > I did the same for tracebacks, by making the leading traceback > object special, to produce a *list* of all other traceback > objects in the chain. Hey, just what I said. :-) > Armin once said, "rewrite the pickle code", which I'd happily do, > but I do think, the above layout changes are not that bad, > anyway. While frame chains and traceback chains are looking > somewhat recursive, they aren't really. I think, they are > lists/tuples by nature, and pickling them as that not only makes > the result of __reduce__ more readable and usable, but the pickle > is also a bit shorter than that of a deeply nested structure. Well, unclear. Frame chains make sense as chains because they are reference-counted individually. > Well, after Armin's input, I dropped my special case, and instead > I will submit a patch for 2.2 and 2.3, which uses your proposed > way to use __reduce__ from pickle and copy. > This is completely compatible and does what we want! Wonderful! Please send me the SF issue, I don't subscribe to SF any more. (I've done my checkin in case you wondered.) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Sun Nov 23 20:32:31 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 20:32:41 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> Message-ID: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Guido says: > I guess it's my anti-Scheme attitude.
I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. and then he says: > Well, unclear. Frame chains make sense as chains because they are > reference-counted individually. which surely goes to show that sometimes it *does* make sense to use a deeply nested structure? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tismer at tismer.com Sun Nov 23 21:05:55 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 23 21:06:03 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> References: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Message-ID: <3FC16783.6070304@tismer.com> Greg Ewing wrote: > Guido says: > > >>I guess it's my anti-Scheme attitude. I just think the problem is in >>the deeply nested structures. There usually is a less nested data >>structure that doesn't have the problem. > > > and then he says: > > >>Well, unclear. Frame chains make sense as chains because they are >>reference-counted individually. > > > which surely goes to show that sometimes it *does* make > sense to use a deeply nested structure? You might interpret him this way. But I don't think he had my implementation of frame chain pickling in mind, because he doesn't know it, and nobody but me probably has a working one. I'm pickling disjoint frame chains, and in my case, these are linked in both directions, via f_back, and via f_callee, for other reasons. There is no reason for nested pickling, just because of the caller/callee relationship. I agree there might be useful situations for deeply nested structures, but not this one. Instead, it would be asking for problems. 
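The layout change under discussion — pickling a chain as a flat list instead of a nest of back-references — can be sketched on a toy linked node type. This is a hypothetical illustration of the `__reduce__` technique, not the Stackless frame/traceback code:

```python
import pickle

class Node:
    """A singly linked node; default pickling would recurse per link."""
    def __init__(self, value, prev=None):
        self.value = value
        self.prev = prev

    def __reduce__(self):
        # Flatten the chain into a plain list of values, so pickle
        # iterates over a list instead of recursing through each
        # prev reference.
        chain = []
        node = self
        while node is not None:
            chain.append(node.value)
            node = node.prev
        return (_rebuild_chain, (chain,))

def _rebuild_chain(values):
    """Rebuild the linked chain from the flattened value list."""
    node = None
    for value in reversed(values):
        node = Node(value, node)
    return node

# Build a chain far deeper than the default recursion limit.
head = None
for i in range(10000):
    head = Node(i, head)

restored = pickle.loads(pickle.dumps(head))
assert restored.value == 9999
assert restored.prev.value == 9998
```

Without the `__reduce__` flattening, pickling the 10000-deep chain would blow the recursion limit; with it, the pickle contains one list and one reconstruction call.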
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Sun Nov 23 22:03:07 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 22:01:37 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Mon, 24 Nov 2003 14:32:31 +1300." <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> References: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Message-ID: <200311240303.hAO337X01787@c-24-5-183-134.client.comcast.net> > Guido says: > > > I guess it's my anti-Scheme attitude. I just think the problem is in > > the deeply nested structures. There usually is a less nested data > > structure that doesn't have the problem. > > and then he says: > > > Well, unclear. Frame chains make sense as chains because they are > > reference-counted individually. > > which surely goes to show that sometimes it *does* make > sense to use a deeply nested structure?
Well, without deeply nested data structures the stack wouldn't be that deep, would it? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From anthony at interlink.com.au Sun Nov 23 22:42:53 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Nov 23 22:43:24 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311240342.hAO3gtwQ015914@localhost.localdomain> > I checked in all the changes I thought were necessary. But as the checkin > comment says, > > This needs fresh testing on all non-Win32 platforms ... > Running the standard test_re.py is an adequate test. > > So start testing, or (my recommendation) upgrade to Win32 . Works with GCC 3.3.2 and GCC 3.2.3 compiled versions of Python on Fedora Core 1. From edloper at gradient.cis.upenn.edu Mon Nov 24 00:27:38 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Sun Nov 23 23:26:06 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties Message-ID: <3FC196CA.3000007@gradient.cis.upenn.edu> I was wondering if there would be any interest in adding a special case for properties with string-valued fget/fset/fdel: - if fget is a string, then the getter returns the value of the member variable with the given name. - if fset is a string, then the setter sets the value of the member variable with the given name. - if fdel is a string, then the deleter deletes the member variable with the given name. I.e., the following groups would be functionally equivalent: property(fget='_foo') property(fget=lambda self: self._foo) property(fget=lambda self: getattr(self, '_foo')) property(fset='_foo') property(fset=lambda self, value: setattr(self, '_foo', value)) property(fdel='_foo') property(fdel=lambda self: delattr(self, '_foo')) This change has 2 advantages: 1. It's easier to read. (In my opinion, anyway; what do other people think?) 2.
It's faster: for properties whose fget/fset/fdel are strings, we can avoid a function call (since the changes are implemented in c). Preliminary tests indicate that this results in approximately a 3x speedup for a tight loop of attribute lookups. (It's unclear how much of a speed increase you'd get in actual code, though.) and one disadvantage (that I can think of): - It's one more special case to document/know. This change shouldn't break any existing code, because there's currently no reason to use string-valued fget/fset/fdel. Does this change seem useful to other people? Do the advantages outweigh the disadvantage? Or are there other disadvantage that I neglected to notice? If this seems like a useful addition, I'd be happy to work on making a patch that includes test cases & doc changes. -Edward From guido at python.org Sun Nov 23 23:34:03 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 23:32:32 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs Message-ID: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that promise they will disappear in Python 2.4. If anyone has time to fix these, I'd appreciate it. (It's not just a matter of removing the FutureWarnings -- you actually have to implement the promised future behavior. :-) I may get to these myself, but they're not exactly rocket science, so they might be a good thing for a beginning developer (use SF please if you'd like someone to review the changes first). Another -- much bigger -- TODO is to implement generator expressions (PEP 289). Raymond asked for help but I don't think he got any, unless it was offered through private email. Anyone interested? (Of course, I don't want any of this to interfere with the work to get 2.3.3 out in December.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Nov 23 23:52:51 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 23:51:20 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Sun, 23 Nov 2003 23:27:38 CST." <3FC196CA.3000007@gradient.cis.upenn.edu> References: <3FC196CA.3000007@gradient.cis.upenn.edu> Message-ID: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> > I was wondering if there would be any interest for adding a special > case for properties with string-valued fget/fset/fdel: > > - if fget is a string, then the getter returns the value of the > member variable with the given name. > - if fset is a string, then the setter sets the value of the > member variable with the given name. > - if fdel is a string, then the deleter deletes the member variable > with the given name. > > I.e., the following groups would be functionally equivalant: > > property(fget='_foo') > property(fget=lambda self: self._foo) > property(fget=lambda self: getattr(self, '_foo')) Why bother with the getattr() example? > property(fset='_foo') > property(fset=lambda self, value: setattr(self, '_foo', value)) Also of course (and IMO more readable): def _set_foo(self, value): self._foo = value property(fset=_set_foo) > property(fdel='_foo') > property(fdel=lambda self: delattr(self, '_foo')) (And similar here.) > This change has 2 advantages: > > 1. It's easier to read. (In my opinion, anyway; what do other > people think?) Only if you're used to the new syntax. Otherwise it could mean a costly excursion into the docs. > 2. It's faster: for properties whose fget/fset/fdel are strings, > we can avoid a function call (since the changes are implemented > in c). Preliminary tests indicate that this results in > approximately a 3x speedup for a tight loop of attribute > lookups. (It's unclear how much of a speed increase you'd get > in actual code, though.) 
Which makes me wonder if this argument has much value. > and one disadvantage (that I can think of): > > - It's one more special case to document/know. Right. It feels like a hack. > This change shouldn't break any existing code, because there's > currently no reason to use string-valued fget/fset/fdel. Correct. > Does this change seem useful to other people? Do the advantages > outweigh the disadvantage? Or are there other disadvantage that I > neglected to notice? If this seems like a useful addition, I'd be > happy to work on making a patch that includes test cases & doc > changes. It feels somewhat un-Pythonic to me: a special case that just happens to be useful to some folks. I want to be very careful in adding too many of those to the language, because it makes it harder to learn and makes it feel full of surprises for casual users. (I'm trying hard to avoid using the word "Perl" here. :-) I'm curious about the use case that makes you feel the need for speed. I would expect most properties not to simply redirect to another attribute, but to add at least *some* checking or other calculation. I'd be more in favor if you used a separate "renamed" property: foo = renamed("_foo") being a shortcut for def _get_foo(self): return self._foo def _set_foo(self, value): self._foo = value def _del_foo(self): del self._foo foo = property(_get_foo, _set_foo, _del_foo) but I've got a suspicion you want to combine some string argument (most likely for fget) with some function argument. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Mon Nov 24 00:28:07 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 00:28:14 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> Message-ID: <200311240528.hAO5S7V26159@oma.cosc.canterbury.ac.nz> Guido: > I'm curious about the use case that makes you feel the need for speed. 
> I would expect most properties not to simply redirect to another > attribute, but to add at least *some* checking or other calculation. I suspect he's thinking of cases where you only want to wrap special behaviour around *some* of the accessors, e.g. you want writing to a property to be mediated by a function, but reading it can just be a normal attribute access. Currently, you're forced to pay the price of a function call for both reading and writing in this case. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From edloper at gradient.cis.upenn.edu Mon Nov 24 04:35:33 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Mon Nov 24 03:34:05 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> References: <3FC196CA.3000007@gradient.cis.upenn.edu> <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> Message-ID: <3FC1D0E5.5000907@gradient.cis.upenn.edu> Guido van Rossum wrote: >> 1. It's easier to read. (In my opinion, anyway; what do other >> people think?) > > Only if you're used to the new syntax. Otherwise it could mean a > costly excursion into the docs. > [...] >> - It's one more special case to document/know. > > Right. It feels like a hack. To me it seems like the "obvious" behavior for a string fget/fset/fdel, but if it's not universally obvious then you're probably right that it's a bad idea to add it. > but I've got a suspicion you want to combine some string argument > (most likely for fget) with some function argument. Yes, the idea was that some properties only redirect on read, or only on write; and that the syntax could be made "cleaner" for those cases.
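The mixed string/function case described here can already be approximated with a property subclass; the following is a minimal illustrative sketch (the strproperty name and the Point example are invented for illustration, not part of any patch discussed in this thread):

```python
# Hypothetical sketch, not the proposed patch: a property subclass that
# accepts either a callable or an attribute-name string for each of
# fget/fset/fdel, resolving strings via getattr/setattr/delattr.
class strproperty(property):
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        if isinstance(fget, str):
            gname = fget
            fget = lambda self: getattr(self, gname)
        if isinstance(fset, str):
            sname = fset
            fset = lambda self, value: setattr(self, sname, value)
        if isinstance(fdel, str):
            dname = fdel
            fdel = lambda self: delattr(self, dname)
        property.__init__(self, fget, fset, fdel, doc)

class Point(object):
    def __init__(self, x):
        self._x = x

    def _set_x(self, value):
        # writes are mediated by a checking function...
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value

    # ...while reads are a plain redirect to the _x attribute
    x = strproperty('_x', _set_x)
```

With this, reading x is a plain attribute redirect while writing still goes through a checking function, which is exactly the "redirect only on read" case being discussed.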
> I'm curious about the use case that makes you feel the need for speed. > I would expect most properties not to simply redirect to another > attribute, but to add at least *some* checking or other calculation. The primary motivation was actually to make the code "easier to read"; the speed boost was an added bonus. (Though not a trivial one -- I do have a good number of fairly tight loops that access properties.) The use case that inspired the idea is defining read-only properties for immutable objects. But I guess I would be better off going with wrapper functions that create the read-only properties for me (like ). Thanks for the feedback! -Edward From fincher.8 at osu.edu Mon Nov 24 06:12:36 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Mon Nov 24 05:14:52 2003 Subject: [Python-Dev] quick patch for better debugging In-Reply-To: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> References: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> Message-ID: <200311240612.36215.fincher.8@osu.edu> On Sunday 23 November 2003 08:05 pm, Greg Ewing wrote: > I agree with the general idea of providing some sort of > identifying information, but in these cases I can't think > what sort of information would be useful short of displaying > the entire repr() of the object, which would be too much for > a backtrace message, I think. You don't have to include the offending index/key in the __str__ of the exception itself. Even if it was just available in the exception's args tuple, or even as an attribute on the exception object, it'd still be highly useful as a debugging tool. I've wished for this myself on several occasions. Jeremy From mwh at python.net Mon Nov 24 05:53:42 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 05:53:49 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: <200311220739.hAM7dZ7n016749@localhost.localdomain> (Anthony Baxter's message of "Sat, 22 Nov 2003 18:39:35 +1100") References: <200311220739.hAM7dZ7n016749@localhost.localdomain> Message-ID: <2mvfpam409.fsf@starship.python.net> Anthony Baxter writes: >>>> Michael Hudson wrote >> We should give the new autoconf a go, at least. > > I would strongly prefer to do this sooner than later, so I was thinking > of doing the upgrade sometime this week. Does anyone have/know any > reasons to not upgrade to the newer autoconf? Well, there was an almost instantaneous brown-paper-bag 2.59 release, but I don't know of any problems with 2.59. Not sure I would if there were, mind. > It should fix a bunch of build annoyances (and I can get rid of > aclocal.m4) That's the motivation :-) Cheers, mwh -- Make this IDLE version 0.8. (We have to skip 0.7 because that was a CNRI release in a corner of the basement of a government building on a planet circling Aldebaran.) -- Guido Van Rossum, in a checkin comment From mwh at python.net Mon Nov 24 07:04:19 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 07:04:23 2003 Subject: [Python-Dev] Thesis ideas list In-Reply-To: <3FBFE987.2050203@ocf.berkeley.edu> (Brett C.'s message of "Sat, 22 Nov 2003 14:56:07 -0800") References: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: <2m65ham0qk.fsf@starship.python.net> "Brett C." writes: > Restricted execution > -------------------- > from Andrew Bennett (private email) > > See the python-dev archives and Summaries for more painful details. I think this would be a good choice, actually. Probably fairly hard... > Tail Recursion > -------------- > from Me (my brain) > > Have proper tail recursion in Python. Would require identifying where > a direct function call is returned (could keep it simple and just do > it where CALL_FUNCTION and RETURN bytecodes are in a row).
Also have > to deal with exception catching since that requires the frame to stay > alive to handle the exception. > > But getting it to work well could help with memory and > performance. Don't know if it has been done for a language that had > exception handling. How is this different from stackless? Cheers, mwh -- QNX... the OS that walks like a duck, quacks like a duck, but is, in fact, a platypus. ... the adventures of porting duck software to the platypus were avoidable this time. -- Chris Klein, alt.sysadmin.recovery From guido at python.org Mon Nov 24 10:32:34 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 10:32:49 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Mon, 24 Nov 2003 12:04:19 GMT." <2m65ham0qk.fsf@starship.python.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> Message-ID: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> > > Tail Recursion > > -------------- > > from Me (my brain) > > > > Have proper tail recursion in Python. Would require identifying where > > a direct function call is returned (could keep it simple and just do > > it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have > > to deal with exception catching since that requires the frame to stay > > alive to handle the exception. > > > > But getting it to work well could help with memory and > > performance. Don't know if it has been done for a language that had > > exception handling. > > How is this different from stackless? AFAIK Stackless only curtails the *C* stack, not the chain of Python frames on the heap. But I have a problem with tail recursion. It's generally requested by new converts from the Scheme/Lisp or functional programming world, and it usually means they haven't figured out yet how to write code without using recursion for everything.
IOW I'm doubtful on how much of a difference it would make for real Python programs (which, simplifying a bit, tend to use loops instead of recursion). And also note that even if an exception is not caught, you'd like to see all stack frames listed when the traceback is printed or when the debugger is invoked. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 24 10:37:03 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 10:37:13 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/idlelib NEWS.txt, 1.25, 1.26 idlever.py, 1.15, 1.16 In-Reply-To: Your message of "Mon, 24 Nov 2003 07:29:25 EST." <20031124122925.GA13677@rogue.amk.ca> References: <20031124122925.GA13677@rogue.amk.ca> Message-ID: <200311241537.hAOFb3w09106@c-24-5-183-134.client.comcast.net> > On Sun, Nov 23, 2003 at 07:23:18PM -0800, kbk@users.sourceforge.net wrote: > > + - IDLE now does not fail to save the file anymore if the Tk buffer is not a > > + Unicode string, yet eol_convention is. Python Bugs 774680, 788378 > > The above sentence is unfinished. Not if you assume that the omitted part is "a Unicode string", which I think was intended. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh at python.net Mon Nov 24 10:55:49 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 10:55:53 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> (Guido van Rossum's message of "Mon, 24 Nov 2003 07:32:34 -0800") References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <2mr7zxlq0q.fsf@starship.python.net> Guido van Rossum writes: >> > Tail Recursion >> > -------------- >> > from Me (my brain) >> > >> > Have proper tail recursion in Python. 
Would require identifying where >> > a direct function call is returned (could keep it simple and just do >> > it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have >> > to deal with exception catching since that requires the frame to stay >> > alive to handle the exception. >> > >> > But getting it to work well could help with memory and >> > performance. Don't know if it has been done for a language that had >> > exception handling. >> >> How is this different from stackless? > > AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. Oh, I see. Yes. > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. Well, this was why I assumed you didn't really want to do the full-on tail-call-elimination thing :-) Cheers, mwh -- Need to Know is usually an interesting UK digest of things that happened last week or might happen next week. [...] This week, nothing happened, and we don't care. -- NTK Now, 2000-12-29, http://www.ntk.net/ From tismer at tismer.com Mon Nov 24 11:31:44 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 24 11:31:54 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <3FC23270.6090506@tismer.com> Guido van Rossum wrote: >>>Tail Recursion ... 
> AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. Yup. > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. Same here. I'm not for automatic tail recursion detection. A very simple approach, also pretty easy to implement, would be a "jump" property, which would be added to a function. It would simply allow running a different (or the same) function than the current one without returning.

    def sort3(a, b, c):
        if a>b:
            return sort3.jump(b, a, c)
        if b>c:
            return sort3.jump(a, c, b)
        return a, b, c

-- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tdelaney at avaya.com Mon Nov 24 15:54:15 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Mon Nov 24 15:54:26 2003 Subject: [Python-Dev] Tail recursion Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEEC2C8@au3010avexu1.global.avaya.com> > From: Guido van Rossum > > But I have a problem with tail recursion.
It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. However, that doesn't preclude it from being a thesis subject - in some ways it's actually a bonus as it encourages exploration as it's a direction that is *not* going to be explored by the language designer. It's possible that we could see some truly unexpected benefits come out of this - or there could be no benefits to Python whatsoever. However, from a purely academic point of view, I think it would be a quite reasonable thesis. It allows applying a well-explored field of research to a new arena. Besides ... sometimes a recursive solution is truly beautiful. Although I think in many (most?) cases a loop on a generator is probably the most appropriate and elegant approach. Tim Delaney From martin at v.loewis.de Mon Nov 24 17:52:41 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Mon Nov 24 18:29:43 2003 Subject: [Python-Dev] PEP for removal of string module? In-Reply-To: <3FC13743.2070209@ocf.berkeley.edu> References: <3FC13743.2070209@ocf.berkeley.edu> Message-ID: "Brett C." writes: > As I was writing the Summary, I noticed that the discussion of how to > handle the removal of the string module got a little complicated > thanks to how to deal with stuff that is different between str and > unicode. There was no explicit (i.e., patch) resolution to the whole > thing. > > Does this warrant a PEP to work out the details? IMO, no. I'm personally not convinced that the removal of the string module is desirable.
I doubt a PEP could change this attitude. Regards, Martin From nicodemus at esss.com.br Mon Nov 24 19:40:00 2003 From: nicodemus at esss.com.br (Nicodemus) Date: Mon Nov 24 18:40:29 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC1D0E5.5000907@gradient.cis.upenn.edu> References: <3FC196CA.3000007@gradient.cis.upenn.edu> <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> <3FC1D0E5.5000907@gradient.cis.upenn.edu> Message-ID: <3FC2A4E0.4090707@esss.com.br> Hi everyone. My first post to the list, even though I have been reading it for a long time now. 8) Edward Loper wrote: > Guido van Rossum wrote: > > I'm curious about the use case that makes you feel the need for speed. > > I would expect most properties not to simply redirect to another > > attribute, but to add at least *some* checking or other calculation. > > The primary motivation was actually to make the code "easier to read"; > the speed boost was an added bonus. (Though not a trivial one -- I do > have a good number of fairly tight loops that access properties.) The > use case that inspired the idea is defining read-only properties for > immutable objects. But I guess I would be better off going with > wrapper functions that create the read-only properties for me (like > ). Actually, this would introduce a nice feature: allow to easily subclass the functions that are part of the property, without the need to re-create the property in the subclass.

    class C(object):

        def get_foo(self):
            return 'C.foo'

        c = property('get_foo')

    class D(C):

        def get_foo(self):
            return 'D.foo'

In the current behaviour, D would have to recreate the property, which can be cumbersome if you're only interested in overwriting one of the property's methods (which is the common case in my experience). But I don't agree with Edward that property should accept strings.
I think they should just accept functions as of now, but don't store the actual function object, just its name, and delay the name lookup until it is actually needed. What do you guys think? Regards, Nicodemus. From greg at cosc.canterbury.ac.nz Mon Nov 24 19:05:16 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:05:57 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC2A4E0.4090707@esss.com.br> Message-ID: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> Nicodemus : > Actually, this would introduce a nice feature: allow to easily subclass > the functions that are part of the property, without the need to > re-create the property in the subclass. > > class C(object): > > def get_foo(self): > return 'C.foo' > > c = property('get_foo') Now *that* would be useful (it's slightly different from the original proposal, as I understood it). I wrote a function recently to create properties that work like that, and I'm finding it very useful. It would be great to have it as a standard feature, either as a part of the existing 'property' object, or an alternative one. > But I don't agree with Edward that property should accept strings. I > think they should just accept functions as of now, but don't store the > actual function object, just it's name, and delay the name lookup until > it is actually needed. No! If all that's being used is the name, then just pass the name. Anything else would be pointless and confusing. Plus it would allow the new behaviour to coexist with the current one: if it's a function, call it, and if it's a string, use it as a method name to look up. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Nov 24 19:12:41 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:12:48 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC2A4E0.4090707@esss.com.br> Message-ID: <200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> I just thought of another small benefit - the property definition can precede the definitions of the methods which implement it, e.g. class C(object): c = property('get_foo') def get_foo(self): ... which is a more natural order to write things in. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Mon Nov 24 19:20:22 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 19:21:36 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Tue, 25 Nov 2003 13:05:16 +1300." <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> References: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> Message-ID: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> > Nicodemus : > > > Actually, this would introduce a nice feature: allow to easily subclass > > the functions that are part of the property, without the need to > > re-create the property in the subclass. > > > > class C(object): > > > > def get_foo(self): > > return 'C.foo' > > > > c = property('get_foo') [Greg Ewing] > Now *that* would be useful (it's slightly different from the > original proposal, as I understood it). > > I wrote a function recently to create properties that work > like that, and I'm finding it very useful. 
It would be > great to have it as a standard feature, either as a part > of the existing 'property' object, or an alternative one. > > > But I don't agree with Edward that property should accept > > strings. I think they should just accept functions as of now, but > > don't store the actual function object, just it's name, and delay > > the name lookup until it is actually needed. > > No! If all that's being used is the name, then just pass > the name. Anything else would be pointless and confusing. > > Plus it would allow the new behaviour to coexist with the > current one: if it's a function, call it, and if it's a > string, use it as a method name to look up. This alternate possibility is yet another argument against Edward's proposal. :-) But I think it can be done without using string literals: a metaclass could scan a class definition for new methods that override functions used by properties defined in base classes, and automatically create a new property. If you only want this behavior for selected properties, you can use a different class instead of 'property'. You could then also do away with the metaclass, but you'd be back at Nicodemus's proposal, and that seems to incur too much overhead (we could use heavy caching, but it would be a bit hairy). Anyway, all of this can be implemented easily by subclassing property or by defining your own descriptor class -- there's no magic, just define __get__ and __set__ (and __delete__ and __doc__, to be complete). So maybe somebody should implement this for themselves and find out how often they really use it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 24 19:22:57 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 19:23:06 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Tue, 25 Nov 2003 13:12:41 +1300."
<200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> References: <200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> Message-ID: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> > I just thought of another small benefit - the property > definition can precede the definitions of the methods > which implement it, e.g. > > class C(object): > > c = property('get_foo') > > def get_foo(self): > ... > > which is a more natural order to write things in. OTOH I hate seeing name references inside string quotes, because it complicates reference checking by tools like PyChecker (which would have to be told about the meaning of the arguments to property to check this kind of forward references). --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Mon Nov 24 19:36:06 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:36:20 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> Message-ID: <200311250036.hAP0a6D03921@oma.cosc.canterbury.ac.nz> Guido: > So maybe somebody should implement this for themselves and find out > how often they really use it. As I said, I have implemented something very similar to this and I'm making extensive use of it in my current project, which is a re-working of my Python GUI library. The world will get a chance to see it soon, I hope... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Nov 24 19:38:21 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:38:29 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> Message-ID: <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Guido: > OTOH I hate seeing name references inside string quotes, because it > complicates reference checking by tools like PyChecker Oh, dear... you're going to like some of the other tricks I'm pulling in PyGUI even less, then... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From nicodemus at esss.com.br Mon Nov 24 21:39:39 2003 From: nicodemus at esss.com.br (Nicodemus) Date: Mon Nov 24 20:39:48 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> References: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> Message-ID: <3FC2C0EB.2050104@esss.com.br> Guido van Rossum wrote: >You could then >also do away with the metaclass, but you'd be back at Nicodemus's >proposal, and that seems to incur too much overhead (we could use >heavy caching, but it would be a bit hairy). > > I think the overhead is very small, unless I'm overlooking something. The only extra overhead that I see is the extra lookup every time the property is accessed, which is the same as calling a method. But I agree that this difference could be significant for some applications. 
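The delayed-lookup property Nicodemus describes (store only the accessor names, resolve them with getattr at access time) can be sketched as a small descriptor; the lateproperty name and the example classes below are illustrative, not his actual implementation:

```python
# Illustrative sketch of the idea described above: a property-like
# descriptor that stores only the *names* of its accessor functions
# and resolves them with getattr() on every access, so a subclass can
# override the methods without re-creating the property.
class lateproperty(object):
    def __init__(self, fget=None, fset=None):
        # keep names, not function objects
        self.fget = fget.__name__ if fget is not None else None
        self.fset = fset.__name__ if fset is not None else None

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        # late lookup: honors overrides in subclasses
        return getattr(obj, self.fget)()

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        getattr(obj, self.fset)(value)

class C(object):
    def get_foo(self):
        return 'C.foo'
    foo = lateproperty(get_foo)

class D(C):
    # no need to redeclare the property: the name lookup finds this
    def get_foo(self):
        return 'D.foo'
```

Because the lookup happens on every access, D overrides get_foo without touching the property; the price is the extra getattr per access, which is the overhead being weighed in this exchange.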
>Anyway, all of this can be implemented easily by subclassign property >or by defining your own descriptor class -- there's no magic, just >define __get__ and __set__ (and __delete__ and __doc__, to be >complete). > >So maybe somebody should implement this for themselves and find out >how often they really use it. > > Actually, I already did it. 8) The class accepts functions just like property does, but keeps only the names of the functions, and uses getattr in __get__ and __set__ to access the actual functions (nothing magical, as you pointed it out). I use it quite often, and the biggest advantage is that when you *do* need to overwrite one of the property's methods, you don't have to change anything in the base class: you just overwrite the method in the derived class and that's it. So as a rule, I always use this property instead of the built-in, but that's for other reasons besides easy subclassing. Regards, Nicodemus. From eppstein at ics.uci.edu Mon Nov 24 21:12:41 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Nov 24 21:12:45 2003 Subject: [Python-Dev] Re: string-valued fget/fset/fdel for properties References: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Message-ID: In article <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz>, Greg Ewing wrote: > > OTOH I hate seeing name references inside string quotes, because it > > complicates reference checking by tools like PyChecker > > Oh, dear... you're going to like some of the other tricks > I'm pulling in PyGUI even less, then... Name references inside string quotes are also a standard part of PyObjC (used to represent an objective-C "selector" i.e. a method name that has not yet been bound to an object type). -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From raymond.hettinger at verizon.net Tue Nov 25 01:24:18 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Tue Nov 25 01:24:55 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() Message-ID: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> After re-reading previous posts on the subject, I had an idea. Let's isolate these functions in the documentation into a separate section following the rest of the builtins. The cost of having these builtins is not that they take up a few entries in the directory listing. Also, it's no real burden to leave them in the code base. The real cost is that when learning the language, after reading the tutorial, the next step is to make sure you know what all the builtins do before moving on to study the library offerings. The problem with buffer() and intern() is not that they are totally useless. The problem is that it is darned difficult for an everyday user to invent productive use cases. Here on python-dev, one defender arose for each and said that they once had a use for them. So, let's leave the functionality intact and just move it off the list of things you need to know.
Getting them out of the critical path for learning python will make the language even easier to master. Some are highly resistant to deprecation because it makes their lives more difficult. However, I think even they would like a list of "things you just don't need to know anymore". In other words, you don't have to wait for Py3.0 for a clean house, just push all the clutter in a corner and walk around it. 'nuff said, Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031125/852c8ed0/attachment.html From guido at python.org Tue Nov 25 01:45:08 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 25 01:45:35 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Tue, 25 Nov 2003 01:24:18 EST." <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <200311250645.hAP6j8C10327@c-24-5-183-134.client.comcast.net> > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins. > > The cost of having these builtins is not that they take up a few entries > in the directory listing. Also, it's no real burden to leave them in > the code base. The real cost is that when learning the language, after > reading the tutorial, the next step is to make sure you know what all > the builtins do before moving on to study the library offerings. > > The problem with buffer() and intern() is not that they are totally > useless. The problem is that it that it is darned difficult an everyday > user to invent productive use cases. Here on python-dev, one defender > arose for each and said that they once had a use for them. So, let's > leave the functionality intact and just move it off the list of things > you need to know. 
In both cases, it would have saved me some hours > spent trying to figure out what they were good for - I wish someone had > just said, "you can ignore these two". These functions are just > distractors in a person's mental concept space. > > There's really nothing wrong with have apply() and coerce() being > supported for old code. The problem with them is why bother even > knowing that they exist - they just don't figure into modern python > code. Any time spent learning them now is time that could have been > spent learning about the copy or pickle modules or some such. > > Moving these functions to a separate section sends a clear message to > trainers and book writers that it is okay to skip these topics. Getting > them out of the critical path for learning python will make the language > even easier to master. > > Some are highly resistant to deprecation because it makes their lives > more difficult. However, I think even they would like a list of "things > you just don't need to know anymore". In other words, you don't have to > wait for Py3.0 for a clean house, just push all the clutter in a corner > and walk around it. Sounds like a good idea. --Guido van Rossum (home page: http://www.python.org/~guido/) From oussoren at cistron.nl Tue Nov 25 03:39:10 2003 From: oussoren at cistron.nl (Ronald Oussoren) Date: Tue Nov 25 03:39:10 2003 Subject: [Python-Dev] Re: string-valued fget/fset/fdel for properties In-Reply-To: References: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Message-ID: On 25 nov 2003, at 3:12, David Eppstein wrote: > In article <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz>, > Greg Ewing wrote: > >>> OTOH I hate seeing name references inside string quotes, because it >>> complicates reference checking by tools like PyChecker >> >> Oh, dear... you're going to like some of the other tricks >> I'm pulling in PyGUI even less, then... 
> > Name references inside string quotes are also a standard part of PyObjC > (used to represent an objective-C "selector" i.e. a method name that > has > not yet been bound to an object type). That's an implementation detail, and the name references are references to *Objective-C* identifiers which are not always valid Python identifiers (it's highly unlikely that 'foo:bar:' will ever be a valid Python identifier, while it is a valid Objective-C method name) Ronald From pedronis at bluewin.ch Tue Nov 25 10:32:52 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Tue Nov 25 10:29:59 2003 Subject: type inference project Re: [Python-Dev] Thesis ideas list In-Reply-To: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: <5.2.1.1.0.20031125162750.027eb6a0@pop.bluewin.ch> >Type inferencing >---------------- > >from `Martin >`__ > >Either run-time or compile-time. "Overlap with the specializing compilers". does somebody know anything about this project, what's happened of it http://www.ai.mit.edu/projects/dynlangs/Talks/star-killer.htm http://web.mit.edu/msalib/www/urop/ http://web.mit.edu/msalib/www/urop/presentation-2001-august-10/html-png/ Samuele. From tim.one at comcast.net Tue Nov 25 15:22:51 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 25 15:22:57 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> Message-ID: [Christian] >> Several people asked on the main list, how to pickle deeply >> nested structures without crashing pickle. Well, my general >> answer was to rewrite pickle in a non-recursive manner. [Guido] > I guess it's my anti-Scheme attitude. I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. But I'll shut up, because > this rant is not productive. 
:-( Ya, but it *used* to be -- in the early days, many people learned a lot about writing better programs by avoiding constructs Python penalized (nested functions, cyclic references, deep recursion, very long reference chains, massively incestuous multiple inheritance). Learning to design with flatter data structures and flatter code was highly educational, and rewarding, at least for those who played along. I suppose that's gone for good now. An irony specific to pickle is that cPickle coding was driven mostly by Zope's needs, and multi-gigabyte Zope databases live happily with its recursive design -- most data ends up in BTrees, and those hardly ever go deeper than 3 levels. I don't think it's coincidence that, needing to find a scalable container type with demanding size and speed constraints, Jim ended up with a "shallow" BTree design. The lack of need for deep C recursion was then a consequence of needing to avoid (for search speed) long paths from root to data. Oh well. The next generation will learn the hard way . looking-forward-to-death-ly y'rs - tim From kalle at lysator.liu.se Tue Nov 25 15:32:58 2003 From: kalle at lysator.liu.se (Kalle Svensson) Date: Tue Nov 25 15:33:06 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> Message-ID: <20031125203258.GA29814@i92.ryd.student.liu.se> [Guido van Rossum] > There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that > promise they will disappear in Python 2.4. If anyone has time to > fix these, I'd appreciate it. (It's not just a matter of removing > the FutureWarnings -- you actually have to implement the promised > future behavior. :-) I may get to these myself, but they're not > exactly rocket science, so they might be a good thing for a > beginning developer (use SF please if you'd like someone to review > the changes first). 
I've submitted a patch (http://python.org/sf/849227). And yes, somebody should probably take a good look at it before applying. The (modified) test suite does pass on my machine, but that's all. I may well have forgotten to add tests for new special cases, and I'm not the most experienced C programmer on the block either. As a side note, I think that line 233 in Lib/test/test_format.py if sys.maxint == 2**32-1: should be if sys.maxint == 2**31-1: but I didn't include that in the patch or submit a bug report. Should I? Peace, Kalle -- Kalle Svensson, http://www.juckapan.org/~kalle/ Student, root and saint in the Church of Emacs. From guido at python.org Tue Nov 25 15:50:32 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 25 15:50:39 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: Your message of "Tue, 25 Nov 2003 21:32:58 +0100." <20031125203258.GA29814@i92.ryd.student.liu.se> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> <20031125203258.GA29814@i92.ryd.student.liu.se> Message-ID: <200311252050.hAPKoW912502@c-24-5-183-134.client.comcast.net> > I've submitted a patch (http://python.org/sf/849227). And yes, > somebody should probably take a good look at it before applying. The > (modified) test suite does pass on my machine, but that's all. I may > well have forgotten to add tests for new special cases, and I'm not > the most experienced C programmer on the block either. Thanks! > As a side note, I think that line 233 in Lib/test/test_format.py > > if sys.maxint == 2**32-1: > > should be > > if sys.maxint == 2**31-1: > > but I didn't include that in the patch or submit a bug report. > Should I? This definitely smells like a bug (I've never seen a machine with 33-bit ints :-) so feel free to submit a separate patch to SF. 
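The off-by-one Kalle spotted is easy to verify with plain arithmetic: on a 32-bit build, sys.maxint is the largest *signed* C long (31 value bits plus a sign bit), so the `2**32-1` comparison could never be true on any build. A quick sanity check:

```python
# Largest signed 32-bit integer -- what sys.maxint reports on a
# 32-bit Python build (31 value bits plus a sign bit).
INT32_MAX = 2**31 - 1
print(INT32_MAX)        # 2147483647

# 2**32 - 1 is the largest *unsigned* 32-bit value; no build's
# sys.maxint ever equals it, which is why the original test in
# test_format.py was effectively dead code.
UINT32_MAX = 2**32 - 1
print(UINT32_MAX)       # 4294967295
```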
--Guido van Rossum (home page: http://www.python.org/~guido/) From tdelaney at avaya.com Tue Nov 25 15:52:12 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Tue Nov 25 15:52:21 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEEC545@au3010avexu1.global.avaya.com> > From: python-dev-bounces+tdelaney=avaya.com@python.org > > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins. Sounds like a good idea to me. I was the person that had a use case for intern(), but would be quite happy for it to be in a less prominent position in the docs - though more prominent than apply ... Cheers. Tim Delaney From fincher.8 at osu.edu Tue Nov 25 17:11:22 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Tue Nov 25 16:13:30 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <200311251711.23260.fincher.8@osu.edu> On Tuesday 25 November 2003 01:24 am, Raymond Hettinger wrote: > Some are highly resistant to deprecation because it makes their lives > more difficult. However, I think even they would like a list of "things > you just don't need to know anymore". In other words, you don't have to > wait for Py3.0 for a clean house, just push all the clutter in a corner > and walk around it. I think it's a great idea. Jeremy From Jack.Jansen at cwi.nl Tue Nov 25 16:45:09 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Tue Nov 25 16:46:14 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: On 20-nov-03, at 20:43, Martin v. 
Löwis wrote: > Skip Montanaro writes: > >> Someone asked on c.l.py about running Python on OS6 (yes, Six) a few >> days >> ago and Python is maintained by interested individuals on other legacy >> platforms like OS/2 and the Amiga, maybe not at the latest and >> greatest >> release, but they're still there. There's probably someone on the >> planet >> who'd be willing to putter around with Python on MacOS9. That person >> just >> hasn't been found yet. > > I think they could easily start with Python 2.3, though. That was my thinking too. I've never tried to compile 2.4 on OS9, and I don't have the intention to do so. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From Jack.Jansen at cwi.nl Tue Nov 25 16:48:07 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Tue Nov 25 16:48:10 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> References: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> Message-ID: <0DEA22DF-1F91-11D8-9B5A-000A27B19B96@cwi.nl> On 20-nov-03, at 23:32, Greg Ewing wrote: > Jack Jansen : > >> As you may have noticed if you follow the checkins mailing list I've >> enthusiastically started ripping out 90% of the work I did on Python >> the last 10 years > > What are you ripping out, exactly? I hope you're not getting rid of > Carbon too soon, because I'm in the midst of doing a Mac version of my > Python GUI using it! Don't worry, Carbon is going to be around for a long time, probably as long as Apple continues to support it (which is probably going to be forever). 
Some things will change, such as QuickTime and CoreFoundation moving out of the Carbon package where they didn't really belong in the first place, but for 2.4 I guess we'll have indirection modules in the Carbon package that print a warning and then import the real thing, just as we did when moving all the Mac modules from toplevel modules to being inside the Carbon package. Also, as long as time permits I'll continue to maintain the 2.3.X releases for MacOS9. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From skip at pobox.com Tue Nov 25 17:56:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Nov 25 17:56:15 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: <16323.56834.218973.642584@montanaro.dyndns.org> >> I think they could easily start with Python 2.3, though. Jack> That was my thinking too. I've never tried to compile 2.4 on OS9, Jack> and I don't have the intention to do so. Chicken. ;-) Skip From greg at cosc.canterbury.ac.nz Tue Nov 25 18:47:25 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue Nov 25 18:47:34 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Message-ID: <200311252347.hAPNlPH13568@oma.cosc.canterbury.ac.nz> [Guido] > I guess it's my anti-Scheme attitude. I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. A couple more thoughts: There's a difference between nested data structures and recursion. Use of one doesn't necessarily imply the other. Also, whether a given data structure is "nested" or not can depend on your point of view. Most people wouldn't consider a linked list to be nested -- it may be "wide", but it's not usually thought of as "deep". 
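Greg's wide-vs-deep distinction is easy to make concrete: a recursive walk of a linked list consumes one stack frame per node, so a list that is merely "wide" exhausts the recursion limit, while an iterative walk stays flat. A minimal sketch (the Node class here is made up for illustration):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def depth_recursive(node):
    # One stack frame per node -- blows up on long ("wide") lists,
    # which is exactly the problem with a recursive pickler.
    return 0 if node is None else 1 + depth_recursive(node.next)

def depth_iterative(node):
    # Flat traversal: constant stack usage, works for any length.
    n = 0
    while node is not None:
        node = node.next
        n += 1
    return n

head = None
for i in range(10000):
    head = Node(i, head)

print(depth_iterative(head))   # 10000, no problem
# depth_recursive(head) would exceed the default recursion limit
# (around 1000 frames) and raise an exception.
```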
I don't think it's unreasonable to ask for a pickle that doesn't use up a recursion level for each unit of width in such a structure. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From hunterp at fastmail.fm Wed Nov 26 01:44:16 2003 From: hunterp at fastmail.fm (Hunter Peress) Date: Wed Nov 26 01:44:21 2003 Subject: [Python-Dev] less quick patch for better debugging. Message-ID: <20031126064416.DDA0741547@server1.messagingengine.com> Ah. There's clearly interest in the idea. I guess it's as simple as adding a field to Py_Object that would record the last namespace name used for a given object (remember any object could have many names...) (not sure about Threads btw here). This would allow for all error lookup-type error messages to be much cleaner. 
The impetus for the above idea being an index error on the following line: a[1] + b[2] + c[3]...currently gives an error message that doesn't say which variable the list index error occurs in or at which index it occurs (helpful if they were all the same object on the same line). Same issues go for dicts and even any object attributes as well. PS. maybe in the interest of runtime speed, the assigning to this new field could only occur when there actually is an error. From bac at OCF.Berkeley.EDU Wed Nov 26 01:48:53 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Wed Nov 26 01:49:36 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <3FC44CD5.2070009@ocf.berkeley.edu> Guido van Rossum wrote: >>>Tail Recursion >>>-------------- >>>from Me (my brain) >>> >>>Have proper tail recursion in Python. Would require identifying where >>>a direct function call is returned (could keep it simple and just do >>>it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have >>>to deal with exception catching since that requires the frame to stay >>>alive to handle the exception. >>> >>>But getting it to work well could help with memory and >>>performance. Don't know if it has been done for a language that had >>>exception handling. >> >>How is this different from stackless? > > > AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. > > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. 
IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. > I mostly agree with everything Guido has said. It probably should only be used when -OO is switched on. And yes, iterative solutions tend to be easier to grasp. I have to admit that I partially come from a Scheme world (learned it *very* shortly after I started the process of learning Python). So I have always had a slight soft spot for elegant recursive solutions. I will file this idea in the "not that popular" pile. =) -Brett From mwh at python.net Wed Nov 26 10:59:48 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 10:59:53 2003 Subject: [Python-Dev] IRC Channels Message-ID: <2m8ym3ruh7.fsf@starship.python.net> Given that PEP 101 mentions the #python-dev IRC channel by name, I thought it might be prudent to register it on freenode. If anyone wants privileges, email me your nick. Also, wasn't someone rewriting the release PEPs to talk about roles instead of names? Cheers, mwh -- : exploding like a turd Never had that happen to me, I have to admit. They do that often in your world? -- Eric The Read & Dave Brown, asr From mwh at python.net Wed Nov 26 11:02:45 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 11:02:49 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <2m8ym3ruh7.fsf@starship.python.net> (Michael Hudson's message of "Wed, 26 Nov 2003 15:59:48 +0000") References: <2m8ym3ruh7.fsf@starship.python.net> Message-ID: <2m4qwrruca.fsf@starship.python.net> Michael Hudson writes: > Given that PEP 101 mentions the #python-dev IRC channel by name, I > thought it might be prudent to register it on freenode. If anyone > wants privileges, email me your nick. 
And I meant to say, I registered #pydotorg and #starship while I was at it. Cheers, mwh -- Ya, ya, ya, except ... if I were built out of KSR chips, I'd be running at 25 or 50 MHz, and would be wrong about ALMOST EVERYTHING almost ALL THE TIME just due to being a computer! -- Tim Peters, 30 Apr 97 From fincher.8 at osu.edu Wed Nov 26 12:55:54 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 26 11:58:05 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <2m4qwrruca.fsf@starship.python.net> References: <2m8ym3ruh7.fsf@starship.python.net> <2m4qwrruca.fsf@starship.python.net> Message-ID: <200311261255.54797.fincher.8@osu.edu> On Wednesday 26 November 2003 11:02 am, Michael Hudson wrote: > Michael Hudson writes: > > Given that PEP 101 mentions the #python-dev IRC channel by name, I > > thought it might be prudent to register it on freenode. If anyone > > wants privileges, email me your nick. > > And I meant to say, I registered #pydotorg and #starship while I was > at it. If these channels won't regularly be occupied (i.e., they'll only be used when a release is looming) you should probably setup a default topic or a notice on join that will notify users of this, so their vacancy doesn't confuse/ dismay users. Jeremy From mwh at python.net Wed Nov 26 12:07:16 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 12:07:22 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <200311261255.54797.fincher.8@osu.edu> (Jeremy Fincher's message of "Wed, 26 Nov 2003 12:55:54 -0500") References: <2m8ym3ruh7.fsf@starship.python.net> <2m4qwrruca.fsf@starship.python.net> <200311261255.54797.fincher.8@osu.edu> Message-ID: <2mznejqcsb.fsf@starship.python.net> Jeremy Fincher writes: > On Wednesday 26 November 2003 11:02 am, Michael Hudson wrote: >> Michael Hudson writes: >> > Given that PEP 101 mentions the #python-dev IRC channel by name, I >> > thought it might be prudent to register it on freenode. If anyone >> > wants privileges, email me your nick. 
>> >> And I meant to say, I registered #pydotorg and #starship while I was >> at it. > > If these channels won't regularly be occupied (i.e., they'll only be > used when a release is looming) you should probably setup a default > topic or a notice on join that will notify users of this, so their > vacancy doesn't confuse/ dismay users. I seem to have now done this :-) (at least for #python-dev) Suggestions for better wording would be welcome. Cheers, mwh -- No. In fact, my eyeballs fell out just from reading this question, so it's a good thing I can touch-type. -- John Baez, sci.physics.research From fdrake at acm.org Wed Nov 26 13:40:31 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed Nov 26 13:40:47 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/whatsnew whatsnew24.tex, 1.13, 1.14 In-Reply-To: References: Message-ID: <16324.62367.318560.119195@grendel.zope.com> rhettinger@users.sourceforge.net writes: > Nits from a review of the documentation update. These too have been quietly pushed to the website. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From raymond.hettinger at verizon.net Wed Nov 26 15:56:05 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Nov 26 15:56:41 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary Message-ID: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> I'm adding a section to the tutorial with a brief sampling of library offerings and some short examples of how to use them. My first draft included: copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib Guido's thoughts: - copy tends to be overused by beginners - the shelve module has pitfalls for new users - cmath is rarely needed and some folks are scared of complex numbers - urllib2 is a better choice than urllib I'm interested to know what your experiences have been with teaching python. 
Which modules are necessary to start doing real work (like pickle and os), which are most easily grasped (like glob or random), which have impressive examples only a few lines long (i.e. urllib), and which might just be fun (turtle would be a candidate if it didn't have a Tk dependency). Note, re was included because everyone should know it's there and everyone should get advice to not use it when string methods will suffice. I'm especially interested in thoughts on whether shelve should be included. When I first started out, I was very impressed with shelves because they were the simplest way to add a form of persistence and because they could be dropped in place of a dictionary in scripts that were already built. Also, it was trivially easy to learn based on existing knowledge of dictionaries. OTOH, that existing knowledge is what makes the pitfalls so surprising. Likewise, I was impressed with the substitutability of line lists, text splits, file.readlines(), and urlopen(). While I think of copy() and deepcopy() as builtins that got tucked away in a module, Guido is right about their rarity in well-crafted code. Some other candidates (let's pick just two or three): - csv (basic tool for sharing data with other applications) - datetime (comes up frequently in real apps and admin tasks) - ftplib (because the examples are so brief) - getopt or optparse (because the task is common) - operator (because otherwise, the functionals can be a PITA) - pprint (because beauty counts) - struct (because fixed record layouts are common) - threading/Queue (because without direction people grab thread and mutexes) - timeit (because it answers most performance questions in a jiffy) - unittest (because TDD folks like myself live by it) I've avoided XML because it is a can of worms and because short examples don't do it justice. OTOH, it *is* the hot topic of the day and seems to be taking over the world one angle bracket at a time. 
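The shelve pitfall alluded to above can be shown in a few lines: a shelf looks like a dict, but mutating a stored value in place only changes a transient unpickled copy. A small sketch (the file path is just a scratch location for illustration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

db = shelve.open(path)
db["langs"] = ["python"]     # assignment is persisted
db["langs"].append("c")      # mutates a fresh unpickled copy -- silently lost!
print(db["langs"])           # ['python']
db.close()

# writeback=True caches accessed entries and flushes them on close(),
# avoiding the surprise at some cost in memory and close() time.
db = shelve.open(path, writeback=True)
db["langs"].append("c")
db.close()

db = shelve.open(path)
print(db["langs"])           # ['python', 'c']
db.close()
```

Explicit re-assignment (`tmp = db[k]; tmp.append(...); db[k] = tmp`) is the other way around the pitfall, without the writeback overhead.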
Ideally, the new section should be relatively short but leave a reader with a reasonable foundation for crafting non-toy scripts. A secondary goal is to show-off the included batteries -- I think it is common for someone to download several languages and choose between them based on their tutorial experiences (so, a little flash and sizzle might be warranted). Raymond From fdrake at acm.org Wed Nov 26 18:49:11 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed Nov 26 18:49:30 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <16325.15351.885795.87694@grendel.fdrake.net> Raymond Hettinger writes: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. Cool! > I've avoided XML because it is a can of worms and because short examples > don't do it justice. OTOH, it *is* the hot topic of the day and seems > to be taking over the world one angle bracket at a time. Actually, they usually travel in pairs. ;-) I would stay away from XML for this; there's too much there and how to pick one thing over another isn't always obvious even when someone explains it. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From greg at cosc.canterbury.ac.nz Wed Nov 26 19:26:08 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 26 19:26:20 2003 Subject: [Python-Dev] less quick patch for better debugging. 
In-Reply-To: <20031126064416.DDA0741547@server1.messagingengine.com> Message-ID: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> Hunter Peress : > a[1] + b[2] + c[3]...currently gives an error message that doesnt say > which variable the list index error occurs in or at which index it occurs > at This would be considerably improved if the error message could just point out the position in the line instead of just the line number. Especially when a statement spans more than one line -- currently you can't even tell which line of a multi-line statement was the culprit! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From aleax at aleax.it Thu Nov 27 05:13:05 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 05:13:10 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <200311271113.05578.aleax@aleax.it> On Wednesday 26 November 2003 09:56 pm, Raymond Hettinger wrote: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. Great idea. > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib ... > I'm interested to know what your experiences have been with teaching > python. Which modules are necessary to start doing real work (like I would add: sys -- "real programs" want to access their command-line arguments (sys.argv), want to terminate (sys.exit), want to write to sys.stderr. fileinput -- users are VERY likely to want to "rewrite textfiles in-place" (as well as wanting to read a bunch of textfiles) and fileinput is just the ticket for that. 
Users coming from perl particularly need fileinput desperately as it affords close translation of the "while(<>)" idiom. cStringIO -- I've noticed most newbies find it more natural to "write to a cStringIO.StringIO pseudofile as they go" then getvalue, rather than append'ing to a list of strings then ''.join . time, datetime, calendar -- many real programs want to deal with dates and times array -- many newbies try to use lists to do things that are perfect for array.array's > pickle and os), which are most easily grasped (like glob or random), > which have impressive examples only a few lines long (i.e. urllib), and I think zipfile and gzip are easily grasped AND impressive for people who've ever needed to read/write compressed files in other languages. xmlrpclib and SimpleXMLRPCServer are also eye-poppers (and despite their names you don't need to get into XML at all to show them off:-). CGIHTTPServer, while of course not all that suitable for "real programs", has also contributed more than its share in making instant converts to Python, in my experience -- "instant gratification". > I'm especially interested in thoughts on whether shelve should be > included. When I first started out, I was very impressed with shelves > because they were the simplest way to add a form of persistence and > because they could be dropped in place of a dictionary in scripts that > were already built. Also, it was trivially easy to learn based on > existing knowledge of dictionaries. OTOH, that existing knowledge is > what makes the pitfalls so surprising. Hmmm, yes, but, with writeback=True, you do work around the most surprising pitfalls (at a price in performance, of course). I dunno -- with so many other impressive modules to show off, maybe shelve might be avoided. > - threading/Queue (because without direction people grab thread and > mutexes) True, they do. But I don't know if the tutorial is the right time to indoctrinate people about proper Python threading architectures. 
> - timeit (because it answers most performance questions in a jiffy) > - unittest (because TDD folks like myself live by it) Absolute agreement here. And doctest is SO easy to use, that for the limited space of the tutorial it might also be quite appropriate -- it also encourages abundant use of docstrings, a neat thing in itself. Alex From Kepes.Krisztian at peto.hu Thu Nov 27 05:16:40 2003 From: Kepes.Krisztian at peto.hu (Kepes Krisztian) Date: Thu Nov 27 05:16:35 2003 Subject: [Python-Dev] list and string - method wishlist Message-ID: <879726515.20031127111640@peto.hu> Hi ! A.) The string object have a method named "index", and have a method named "find". It is good, because many times we need to find anything, and it is very long to write this: try: i=s.index('a') except: i=-1 if i<>-1: pass and not this: if (s.find('a')<>-1): pass Why don't exists same method in the list object ? It is very ugly thing (sorry, but I must say that). I must write in every times: l=[1,2,3,4] try: i=l.index(5) except: i=-1 if i<>-1: pass and not this: if (l.find(5)<>-1): pass B.) Same thing is the deleting. I think, this method is missing from strings, and lists. Example: I must write this: s='abcdef' l=[1,2,5,3,4,5] print s s=s[:2]+s[3:] print s print l l[2]=None l.remove(None) print l and not this: s='abcdef' l=[1,2,5,3,4,5] s=s.delete(2) l.delete(2) and delete more: s.delete() # s='' l.delete() # l=[] s.delete(2,2) # s='abef' l.delete(2,2) # l=[1,2,4,5] So: some functions/methods are neeeded to Python-like programming (less write, more effectivity). KK From aleax at aleax.it Thu Nov 27 05:39:10 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 05:39:15 2003 Subject: [Python-Dev] list and string - method wishlist In-Reply-To: <879726515.20031127111640@peto.hu> References: <879726515.20031127111640@peto.hu> Message-ID: <200311271139.10505.aleax@aleax.it> On Thursday 27 November 2003 11:16 am, Kepes Krisztian wrote: ... 
> try: > i=s.index('a') > except: > i=-1 > if i<>-1: pass > > and not this: > > if (s.find('a')<>-1): pass Why don't you use the clearer, faster, more readable, AND more concise idiom if 'a' in s: pass instead? > Why don't exists same method in the list object ? The 'in' operator works just fine for lists, too. Perhaps if you studied Python's present capabilities a bit better, before requesting changes and additions to Python, you might achieve better results faster. > Same thing is the deleting. > > I think, this method is missing from strings, and lists. Look at the 'del' keyword (and slice assignments) -- for lists only: > print l > l[2]=None > l.remove(None) del l[2] or equivalently l[2:3] = [] > and delete more: > s.delete() # s='' Python strings are immutable and will always remain immutable. There is NO way to change an existing string object and there will never be. > l.delete() # l=[] l[:] = [] or equivalently del l[:] > s.delete(2,2) # s='abef' Ditto. > l.delete(2,2) # l=[1,2,4,5] l[2:4] = [] or equivalently del l[2:4] > So: some functions/methods are neeeded to Python-like programming > (less write, more effectivity). This is quite possible, but I have seen almost none listed in your wishlist. I.e., the only task you've listed that is not performed with easy, popular and widespread Python idioms would seem to be a string method roughly equivalent to the function: def delete(s, start, upto=None): if upto is None: upto = start + 1 return s[:start] + s[upto:] returning "a copy of s except for this slice". However, the addition of more functions and methods that might (perhaps) save typing a few characters, allowing a hypothetical z = s.delete(a, b) in lieu of z = s[:a] + s[b:] must overcome a serious general objection: as your very request shows, people ALREADY fail to notice and learn a lot of what Python offers today.
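The idioms from this exchange can be exercised directly; here is a small runnable sketch covering the same ground. The string helper follows the delete() function sketched in the reply, except that its first parameter is named `start` here, since `from` is a reserved word in Python and could not actually be used as a parameter name:

```python
# The membership and in-place deletion idioms from the reply.
l = [1, 2, 5, 3, 4, 5]
assert 5 in l                  # instead of a hypothetical l.find(5) != -1
del l[2]                       # delete a single element in place
assert l == [1, 2, 3, 4, 5]
l[2:4] = []                    # delete a slice in place (same as del l[2:4])
assert l == [1, 2, 5]

# Strings are immutable, so "deletion" must build a new string.
# Same logic as the delete() sketch; `start` avoids the reserved word `from`.
def delete(s, start, upto=None):
    if upto is None:
        upto = start + 1
    return s[:start] + s[upto:]

assert delete('abcdef', 2) == 'abdef'
assert delete('abcdef', 2, 4) == 'abef'
```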
Adding more and more marginally-useful functions and methods might therefore more likely just cause people to fail to notice and learn a larger fraction of Python's capabilities, rather than supply any burningly needed usefulness. Alex From Kepes.Krisztian at peto.hu Thu Nov 27 06:55:44 2003 From: Kepes.Krisztian at peto.hu (Kepes Krisztian) Date: Thu Nov 27 06:55:44 2003 Subject: [Python-Dev] Java final vs Py __del__ Message-ID: <19315670392.20031127125544@peto.hu> Hi ! I very wonder, when I get exp. in java with GC. I'm Delphi programmer, so I get used to destructorin objects. In Java the final method is not same, but is like to destructor (I has been think...). And then I try with some examples, I see, that the Java GC is sometimes not call this method of objects, only exit from program. So: the java programs sometimes end before the GC is use the final methods on objects. This mean that in Java the critical operations MUST do correctly by the programmmers, or some data losing happened. If it is open a file, then must write the critical modifications, and must use the flush, and close to be sure to the datas are saved. In the Py the __del__ is same java's final, or it is to be called in every way by GC ? I build this method as safe method: if the programmer don't do any closing/freeing thing, I do that ? simple example: class a: def __init__(self,filename): self.__filename=filename self.__data=[] self.__file=None def open(self): self.__file=open(self.__filename,"w") def write(self,data): self.__data.append(data) def close(self): self.__file.writelines(self.__data) self.__file.close() self.__file=None def __del__(self): if self.__file<>None: self.close() # like destructor: we do the things are forgotten by programmer Thanx for infos: KK From mwh at python.net Thu Nov 27 07:02:52 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 27 07:02:57 2003 Subject: [Python-Dev] less quick patch for better debugging. 
In-Reply-To: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Thu, 27 Nov 2003 13:26:08 +1300 (NZDT)") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> Message-ID: <2mn0aiqas3.fsf@starship.python.net> Greg Ewing writes: > Hunter Peress : > >> a[1] + b[2] + c[3]...currently gives an error message that doesnt say >> which variable the list index error occurs in or at which index it occurs >> at > > This would be considerably improved if the error message could > just point out the position in the line instead of just the line > number. Any ideas how to do that? I guess you could obfuscate c_lnotab even more... > Especially when a statement spans more than one line -- currently > you can't even tell which line of a multi-line statement was the > culprit! This is occasionally very annoying, and is probably fixable -- would require pretty serious compiler hackery, though. Cheers, mwh -- 3. Syntactic sugar causes cancer of the semicolon. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From aleax at aleax.it Thu Nov 27 07:54:41 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 07:54:48 2003 Subject: [Python-Dev] Java final vs Py __del__ In-Reply-To: <19315670392.20031127125544@peto.hu> References: <19315670392.20031127125544@peto.hu> Message-ID: <200311271354.41367.aleax@aleax.it> On Thursday 27 November 2003 12:55 pm, Kepes Krisztian wrote: > Hi ! Hi Kepes. These questions are improper to pose here on Python-Dev, which is a mailing list about the development OF Python; for questions that are just related to Python programming, please send them to the general list, python-list@python.org, or help@python.org instead. I'm answering them this time, but please don't use this list again in the future unless it is for issues related to the development OF Python, thanks. 
> In Java the final method is not same, but is like to destructor (I has You're confusing final (which is a Java keyword indicating a method that cannot be overridden in subclasses) with finalize -- there is no connection at all between these two concepts in Java. The Python _language_ gives just as few guarantees about calling finalizers (__del__ in Python) as Java (otherwise, it would not be possible to implement Python on top of a Java Virtual Machine, yet Jython, the Python implementation running on a JVM, works quite productively). Some specific implementation (such as a given release of "classic Python") may happen to do a bit more, but for reliability you will want to use try/finally in Python just as you would in Java. Alex From ajs at optonline.net Thu Nov 27 08:52:51 2003 From: ajs at optonline.net (Arthur) Date: Thu Nov 27 09:30:19 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary Message-ID: <000601c3b4ed$bfea2d70$1c02a8c0@BasementDell> >I'm interested to know what your experiences have been >with teaching.python. How about one's experience in learning Python? It is clear to me - in retrospect - that the absence of copy() as a _built-in__ worked negatively. Learning isn't a linear process, and I can't make a simple linear argument as to why this is so. But it has more to do with getting a handle on assignment, than the direct use or lack of use of copy() itself. I tried to open up discussion of this issue on edu-sig, and was asked by Guido to take a hike. I had apparently chosen an inappropriate forum. I understand that a move of copy() to built-ins is not in the cards. Number #1 on a list of library modules in a tutorial may well be a better solution. I strongly encourage you to stick with your instincts and intuition here. But by all means including Guido's koan about the overuse of copy by novices as part of that presentation. 
Art From pinard at iro.umontreal.ca Thu Nov 27 10:26:49 2003 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Thu Nov 27 10:40:36 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <200311271113.05578.aleax@aleax.it> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> <200311271113.05578.aleax@aleax.it> Message-ID: <20031127152649.GA4044@titan.progiciels-bpi.ca> [Alex Martelli] > cStringIO -- I've noticed most newbies find it more natural to "write to > a cStringIO.StringIO pseudofile as they go" then getvalue, rather than > append'ing to a list of strings then ''.join . I do not doubt that cStringIO is useful to know, and a tutorial could throw a short glimpse here about why the `c' prefix and speed issues. For a newcomer, here might be a good opportunity for illustrating one surprising capability of Python for those coming from other languages, which is using bound methods as "first-class" objects. Like: fragments = [] write = fragments.append ... ... result = ''.join(fragments) I think this approach is not much more difficult than `StringIO', not so bad efficiency-wise, but likely more fruitful about developing Python useful understanding and abilities. A tutorial might also show that the said `write' could be given and received in functions, which do not have to "know" if they are writing to a file, or in-memory fragments. -- François Pinard http://www.iro.umontreal.ca/~pinard From ark-mlist at att.net Thu Nov 27 11:25:58 2003 From: ark-mlist at att.net (Andrew Koenig) Date: Thu Nov 27 11:25:50 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <3FC44CD5.2070009@ocf.berkeley.edu> Message-ID: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> > I mostly agree with everything Guido has said. It probably should only > be used when -OO is switched on. And yes, iterative solutions tend to > be easier to grasp. Not always.
For example, suppose you want to find out how many (decimal) digits are in a (non-negative) integer. Yes, you could convert it to a string and see how long the string is, but suppose you want to do it directly. Then it is easy to solve the problem recursively by making use of two facts: 1) Non-negative integers less than 10 have one digit. 2) If x > 10, x//10 has one fewer digit than x. These two facts yield the following recursive solution: def numdigits(n): assert n >= 0 and n%1 == 0 if n < 10: return 1 return 1 + numdigits(n//10) An iterative version of this function might look like this: def numdigits(n): assert n >= 0 and n%1 == 0 length = 1 while n >= 10: length += 1 n //= 10 return length Although these two functions are pretty clearly equivalent, I find the recursive one much easier to understand. Moreover, I don't know how to write an iterative version that is as easy to understand as the recursive version. Think, for example, how you might go about proving the iterative version correct. From tim.one at comcast.net Thu Nov 27 12:10:56 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 27 12:11:03 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> Message-ID: [Brett] >> I mostly agree with everything Guido has said. It probably should >> only be used when -OO is switched on. And yes, iterative solutions >> tend to be easier to grasp. [Andrew Koenig] > Not always. > > For example, suppose you want to find out how many (decimal) digits > are in a (non-negative) integer. Yes, you could convert it to a > string and see how long the string is, but suppose you want to do it > directly. Then it is easy to solve the problem recursively by making > use of two facts: > > 1) Non-negative integers less than 10 have one digit. > > 2) If x > 10, x//10 has one fewer digit than x.
> > These two facts yield the following recursive solution: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > if n < 10: > return 1 > return 1 + numdigits(n//10) Easy to understand, but it's not tail-recursive, so isn't an example of what was suggested for Brett to investigate. I think a tail-recursive version is more obscure than your iterative one: def numdigits(n): def inner(n, lensofar): if n < 10: return lensofar else: return inner(n//10, lensofar+1) return inner(n, 1) > An iterative version of this function might look like this: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > length = 1 > while n >= 10: > length += 1 > n //= 10 > return length > > Although these two functions are pretty clearly equivalent, I find the > recursive one much easier to understand. Moreover, I don't know how > to write an iterative version that is as easy to understand as the > recursive version. Think, for example, how you might go about > proving the iterative version correct. Exactly the same way as proving the tail-recursive version is correct. A different approach makes iteration much more natural: the number of digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then iterative code is an obvious search loop: i = 1 while 10**i <= n: i += 1 return i Play strength-reduction tricks to get exponentiation out of the loop, and it's (just) a teensy bit less obvious. From devin at whitebread.org Thu Nov 27 14:19:04 2003 From: devin at whitebread.org (Devin) Date: Thu Nov 27 12:12:23 2003 Subject: [Python-Dev] Tail recursion Message-ID: On Thu, 27 Nov 2003, Andrew Koenig wrote: > --snip-- > Moreover, I don't know how to write an iterative version that is as > easy to understand as the recursive version.
::Lurk mode off:: import math def numdigits(n): assert (n >= 0) and ((n % 1) == 0) if n < 10: return 1 return int(math.log10(n)) + 1 (not iterative, but it'll do :) ::Lurk mode on:: -- Devin devin@whitebread.org http://www.whitebread.org/ From tim.one at comcast.net Thu Nov 27 12:21:02 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 27 12:21:06 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Message-ID: [Devin] > import math > > def numdigits(n): > assert (n >= 0) and ((n % 1) == 0) > if n < 10: > return 1 > return int(math.log10(n)) + 1 > > (not iterative, but it'll do :) Nope, integers in Python are unbounded, and this will deliver wrong answers for "big enough" integers. Depending on the vagaries of your platform C's log10 implementation, it may even deliver a wrong answer for small n near an exact power of 10. From guido at python.org Thu Nov 27 12:30:33 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:32:47 2003 Subject: [Python-Dev] "groupby" iterator Message-ID: <200311271730.hARHUXg15777@c-24-5-183-134.client.comcast.net> In the shower (really!) I was thinking about the old problem of going through a list of items that are supposed to be grouped by some key, and doing something extra at the end of each group. I usually end up doing something ugly like this: oldkey = None for item in sequence: newkey = item.key # this could be any function of item if newkey != oldkey and oldkey is not None: ...do group processing... oldkey = newkey ...do item processing... ...do group processing... # for final group This is ugly because the group processing code has to be written twice (or turned into a mini-subroutine); it also doesn't handle empty sequences correctly. Solutions based on using an explicit index and peeking ahead are similarly cumbersome and hard to get right for all end cases. So I realized this is easy to do with a generator, assuming we can handle keeping a list of all items in a group. 
Here's the generator: def groupby(key, iterable): it = iter(iterable) value = it.next() # If there are no items, this takes an early exit oldkey = key(value) group = [value] for value in it: newkey = key(value) if newkey != oldkey: yield group group = [] oldkey = newkey group.append(value) yield group Here's the usage ("item.key" is just an example): for group in groupby(lambda item: item.key, sequence): for item in group: ...item processing... ...group processing... The only caveat is that if a group is very large, this accumulates all its items in a large list. I expect the generator can be reworked to return an iterator instead, but getting the details worked out seems too much work for a summy Thanskgiving morning. :-) Example: # Print lines of /etc/passwd, sorted, grouped by first letter lines = open("/etc/passwd").readlines() lines.sort() for group in groupby(lambda s: s[0], lines): print "-"*10 for line in group: print line, print "-"*10 Maybe Raymond can add this to the itertools module? Or is there a more elegant approach than my original code that I've missed all these years? --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Thu Nov 27 12:37:01 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Thu Nov 27 12:37:22 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <200311271113.05578.aleax@aleax.it> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> <200311271113.05578.aleax@aleax.it> Message-ID: <20031127173701.GA4140@nl.linux.org> [I'm Gerrit Holl (18) and I've been using Python for 3-4 years] Alex Martelli wrote: > time, datetime, calendar -- many real programs want to deal with > dates and times In my opinion, we should not include all three in the tutorial. I think only datetime should be included. datetime has largely the same niche as time, with the difference that datetime is object oriented and time is not. 
In my opinion, this makes datetime superior to time. Further, I think calendar isn't used a lot... calendar, format3c, format3cstring, month, monthcalendar, prcal, prmonth, prweek, week, weekheader Those mostly copy the unix cal utility. They probably can be useful, but I'm not sure when. Don't most GUI's provide tools for selecting a date from a window? isleap, leapdays Useful functions. Never used them, though. firstweekday, setfirstweekday Don't really know when/why to use them timegm Doesn't belong here I think the calendar module does not contain enough functionality in order to justify it to be included in the tutorial. I think datetime does belong in the tutorial, while time and calendar do not. yours, Gerrit. -- 242. If any one hire oxen for a year, he shall pay four gur of corn for plow-oxen. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From ark-mlist at att.net Thu Nov 27 12:40:11 2003 From: ark-mlist at att.net (Andrew Koenig) Date: Thu Nov 27 12:40:37 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Message-ID: <015601c3b50d$824973c0$6402a8c0@arkdesktop> > Easy to understand, but it's not tail-recursive, so isn't an example of > what > was suggested for Brett to investigate. I think a tail-recursive version > is > more obscure than your iterative one: > > def numdigits(n): > def inner(n, lensofar): > if n < 10: > return lensofar > else: > return inner(n//10, lensofar+1) > return inner(n, 1) Ah. I will agree with you that wholly tail-recursive programs are usually no easier to understand than their iterative counterparts. On the other hand, there are partially tail-recursive functions that I find easier to understand, such as def traverse(t, f): if nonempty(t): traverse(t.left, f) traverse(t.right, f) Here, the second call to traverse is tail-recursive; the first isn't.
Of course it could be rewritten this way def traverse(t, f): while nonempty(t): traverse(t.left, f) t = t.right but I think that this rewrite makes the code harder to follow and would prefer that the compiler do it for me. > A different approach makes iteration much more natural: the number of > digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then > iterative > code is an obvious search loop: > > i = 1 > while 10**i <= n: > i += 1 > return i > > Play strength-reduction tricks to get exponentiation out of the loop, and > it's (just) a teensy bit less obvious. This code relies on 10**i being exact. Is that guaranteed? From guido at python.org Thu Nov 27 12:41:29 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:41:37 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: Your message of "Thu, 27 Nov 2003 12:02:52 GMT." <2mn0aiqas3.fsf@starship.python.net> References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> Message-ID: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> > >> a[1] + b[2] + c[3]...currently gives an error message that doesn't > >> say which variable the list index error occurs in or at which > >> index it occurs at I would like to point out that one solution suggested here (store the most recently used name in the object itself) cannot work -- in an expression like x[i][j], if it is the [j] part that fails, the object name displayed might be some local variable in an earlier scope that briefly referenced x[i], and that would be just plain confusing. This apart from the significant memory and CPU time overhead (which I expect whoever requested the feature doesn't care about, until they have code that runs too slow, and then they will request a Python-to-C compiler, and be indignant when they are asked to write it themselves :-).
> > This would be considerably improved if the error message could > > just point out the position in the line instead of just the line > > number. > > Any ideas how to do that? I guess you could obfuscate c_lnotab even > more... Probably not worth it. (I should mention that I have a possible use case for messing with the lnotab to contain line numbers in a different file than the Python source code. :-) > > Especially when a statement spans more than one line -- currently > > you can't even tell which line of a multi-line statement was the > > culprit! > > This is occasionally very annoying, and is probably fixable -- would > require pretty serious compiler hackery, though. BTW, for the special case of multi-line argument lists, it is already fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Thu Nov 27 12:44:30 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Thu Nov 27 12:44:51 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <20031127174430.GB4140@nl.linux.org> Raymond Hettinger wrote: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. I think it's a great idea. > My first draft included: > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib > - csv (basic tool for sharing data with other applications) > - datetime (comes up frequently in real apps and admin tasks) > - ftplib (because the examples are so brief) > - getopt or optparse (because the task is common) If one of those is chosen, I'd go for the latter, because it can do more and it's more OO. 
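For readers who have not met it, the optparse style being recommended over getopt looks roughly like this (a minimal sketch; the option names here are invented for illustration):

```python
from optparse import OptionParser

# Declarative option definitions; optparse generates --help for free.
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write output to FILE", metavar="FILE")
parser.add_option("-q", "--quiet", action="store_false",
                  dest="verbose", default=True,
                  help="don't print status messages to stdout")

# Parse an explicit argument list here instead of sys.argv, for the demo.
options, args = parser.parse_args(["-f", "out.txt", "--quiet", "extra"])
assert options.filename == "out.txt"
assert options.verbose is False
assert args == ["extra"]
```

The object-oriented flavor shows in the result: parsed values come back as attributes on an options object rather than as a list of (flag, value) pairs to be dispatched on by hand.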
> - operator (because otherwise, the functionals can be a PITA) > - pprint (because beauty counts) > - struct (because fixed record layouts are common) > - threading/Queue (because without direction people grab thread and > mutexes) Hm, not sure whether this should be in the tutorial. > - timeit (because it answers most performance questions in a jiffy) > - unittest (because TDD folks like myself live by it) - email (because it's impressive and common) - textwrap (because I love it :) and it's useful) But of course, it should stay a tutorial, and not become a reference. Users are intelligent enough to skim through the standard library looking for libraries. We should make a selection. Maybe some of them should only be pointed to, without going into detail about how to use it? yours, Gerrit. -- 135. If a man be taken prisoner in war and there be no sustenance in his house and his wife go to another house and bear children; and if later her husband return and come to his home: then this wife shall return to her husband, but the children follow their father. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From guido at python.org Thu Nov 27 12:45:12 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:45:39 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Thu, 27 Nov 2003 11:25:58 EST." <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> References: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> Message-ID: <200311271745.hARHjCN15844@c-24-5-183-134.client.comcast.net> > For example, suppose you want to find out how many (decimal) digits are in a > (non-negative) integer. Yes, you could convert it to a string and see how > long the string is, but suppose you want to do it directly. Then it is easy > to solve the problem recursively by making use of two facts: > > 1) Non-negative integers less than 10 have one digit. 
> > 2) If x > 10, x//10 has one fewer digit than x. > > These two facts yield the following recursive solution: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > if n < 10: > return 1 > return 1 + numdigits(n//10) > > An iterative version of this function might look like this: > > def numdigits(n): > assert n >= 0 and n%1 == 0: > length = 1 > while n >= 10: > length += 1 > n //= 10 > return length > > Although these two functions are pretty clearly equivalent, I find the > recursive one much easier to understand. Moreover, I don't know how to > write an interative version that is as easy to understand as the recursive > version. Think, for example, how you might go about proving the iterative > version correct. Hm. The iterative version looks totally fine to me. I wonder if it all depends on the (recursive) definition with which you started. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Thu Nov 27 12:50:39 2003 From: aahz at pythoncraft.com (Aahz) Date: Thu Nov 27 12:51:49 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <015601c3b50d$824973c0$6402a8c0@arkdesktop> References: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: <20031127175039.GA13922@panix.com> On Thu, Nov 27, 2003, Andrew Koenig wrote: >Tim Peters: >> >> A different approach makes iteration much more natural: the number of >> digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then >> iterative >> code is an obvious search loop: >> >> i = 1 >> while 10**i <= n: >> i += 1 >> return i > > This code relies on 10**i being exact. Is that guaranteed? For Python ints, yes. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. 
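Tim's strength-reduction remark, together with Aahz's point that integer powers are exact, gives an iterative version that carries the power of ten along instead of recomputing 10**i on every trip (a sketch, not code from the thread):

```python
def numdigits(n):
    # Least i >= 1 such that 10**i > n, with the running power kept in
    # a variable rather than recomputed each iteration -- exact because
    # Python integers are unbounded.
    assert n >= 0 and n % 1 == 0
    i, power = 1, 10
    while power <= n:
        i += 1
        power *= 10
    return i

assert numdigits(0) == 1
assert numdigits(9) == 1
assert numdigits(10) == 2
assert numdigits(10**100) == 101
```

Unlike the math.log10 version earlier in the thread, this stays correct for arbitrarily large integers, since no floating point is involved.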
From mwh at python.net Thu Nov 27 12:52:47 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 27 12:52:51 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> (Guido van Rossum's message of "Thu, 27 Nov 2003 09:41:29 -0800") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> Message-ID: <2misl5r95c.fsf@starship.python.net> Guido van Rossum writes: >> > This would be considerably improved if the error message could >> > just point out the position in the line instead of just the line >> > number. >> >> Any ideas how to do that? I guess you could obfuscate c_lnotab even >> more... > > Probably not worth it. (I should mention that I have a possible use > case for messing with the lnotab to contain line numbers in a > different file than the Python source code. :-) That's not c_lnotab, is it? More likely co_firstlineno & co_filename. But anyway, eek! >> > Especially when a statement spans more than one line -- currently >> > you can't even tell which line of a multi-line statement was the >> > culprit! >> >> This is occasionally very annoying, and is probably fixable -- would >> require pretty serious compiler hackery, though. > > BTW, for the special case of multi-line argument lists, it is already > fixed. So it is. I guess the other situations that are worth fixing are long container -- list, tuple, dict -- literals. My brain is a bit too fried to think if a more general solution is feasible, but I will point out that since SET_LINENO went away, inserting superfluous calls to com_set_lineno doesn't result in superfluous bytecodes, so perhaps that could just be added to com_node or something. Although IIRC, in {k:v} v is evaluated before k, which could make life entertaining. Cheers, mwh -- ARTHUR: Don't ask me how it works or I'll start to whimper. 
-- The Hitch-Hikers Guide to the Galaxy, Episode 11 From bac at OCF.Berkeley.EDU Thu Nov 27 14:21:37 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Nov 27 14:21:44 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003[draft] In-Reply-To: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> References: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> Message-ID: <3FC64EC1.1000306@ocf.berkeley.edu> Raymond Hettinger wrote: >>If you ever wanted to have the power of list comprehensions but > > without > >>the overhead of generating the entire list you have Peter Norvig >>initially and then what seems like the rest of the world for generator >>expressions. > > > [possibly mangled sentence doesn't make sense] > Or me not typing as fast as my brain is working. There is a critical "to thank" missing from that sentence. > > > >>After the addition of the 'key' argument to list.sort(), people began > > to > >>clamor for list.sort() to return self. Guido refused to do give in, > > so > >>a compromise was reached. 'list' now has a class method named > > 'sorted'. > >> Pass it a list and it will return a *copy* of that list sorted. > > > > [Add] > What makes a class method so attractive is that the argument need not be > a list, any iterable will do. The return value *is* of course a list. > > By returning a list instead of None, list.sorted() can be used as an > expression instead of a statement. This makes it possible to use it as > an argument in a function call or as the iterable in a for-loop:: > > # iterate over a dictionary sorted by key > for key, value in list.sorted(mydict.iteritems()): > Changed it to state that it takes an iterable. Didn't add the full-on tutorial on use, though. Chances are people who read the Summary know Python well enough to realize the method's use. > > > >>As an interim solution, itertools grew a new function: tee. It takes > > in > >>an iterable and returns two iterators which independently iterate over >>the iterable. 
> > [replace] two > [with] two or more > > Done. > > >>The point that operator.isMappingType is kind of broken came up. Both >>Alex and Raymond Hettinger would not mind seeing it disappear. No one >>objected. It is still in CVS at the moment, but I would not count on > > it > >>necessarily sticking around. > > > ["It's not quite dead yet" ;-) Actually, there may be a way to > partially fix-it so that it won't be totally useless]. > > Fixed. > > >>There was a new built-in named reversed(), and all rejoiced. > > > [And much flogging of the person who proposed it] > > Fixed. =) > > >>Straight from the function's doc string: "reverse iterator over values >>of the sequence". `PEP 322`_ has the relevant details on this toy. > > > [Replace] toy > [With] major technological innovation of the first order > [Or just] builtin. > > I went with the latter since I need to keep some journalistic integrity and thus not be too biased. =) > > > >>Sets now at blazing C speeds! > > > [Looks like a certain parroteer will soon be eating pie!] > > > > Another fine summary. > Thanks for the good work. > You're quite welcome. Happy Thanksgiving, Raymond (and everyone else out there). -Brett From theller at python.net Thu Nov 27 14:50:59 2003 From: theller at python.net (Thomas Heller) Date: Thu Nov 27 14:51:09 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch Message-ID: Several people on this list (IIRC Jim, Guido, Jeremy) have been bitten by the problem that distutils couldn't build extensions with MSVC6, complaining that the compiler isn't installed although in fact it was installed. The problem always seemed to be that MSVC6 only writes the complete registry entries which distutils requires after the GUI has been run at least one time. I have uploaded a patch to the latest bug report from Jim, http://www.python.org/sf/848614, which tries to detect these incomplete registry entries.
It works for me (having removed and installed MSVC several times, with and without running the gui). It would be great if someone else would try it out and report if it works correctly - IMO it should be committed to the 2.3 maintenance branch before 2.3.3 goes out. (Suggestions for better wording would be accepted ;-) The effect of this patch would be the following outputs from a 'setup.py build_ext' command, depending on the compiler installation state:

Not installed:

  error: Python was built with version 6 of Visual Studio, and extensions need to be built with the same version of the compiler, but it isn't installed.

Installed, but the GUI has never been run:

  warning: It seems you have Visual Studio 6 installed, but the expected registry settings are not present. You must at least run the Visual Studio GUI once so that these entries are created.

  error: Python was built with version 6 of Visual Studio, and extensions need to be built with the same version of the compiler, but it isn't installed.

Installed, and GUI has been run: the extension should build normally.

Thanks, Thomas From pje at telecommunity.com Thu Nov 27 15:34:45 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Nov 27 15:33:36 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> Message-ID: <5.1.0.14.0.20031127153121.03d42ce0@mail.telecommunity.com> At 09:41 AM 11/27/03 -0800, Guido van Rossum wrote: >Probably not worth it. (I should mention that I have a possible use >case for messing with the lnotab to contain line numbers in a >different file than the Python source code. :-) DTML, perhaps? ;-) Yes, if the format is changed to add columns, it would be nice to make it capable of having the code in one code block actually come from more than one file, or from non-contiguous lines in one file.
Tools that use Python as an output format, or that preprocess Python (e.g. the dozen or so templating libraries out there), could really use something like a #line directive. From gball at cfa.harvard.edu Thu Nov 27 15:58:47 2003 From: gball at cfa.harvard.edu (Greg Ball) Date: Thu Nov 27 15:58:51 2003 Subject: [Python-Dev] "groupby" iterator Message-ID: Here's a reworking which returns iterators. I had to decide what to do if the user tries to access things out of order; I raise an exception. Anything else would complicate the code quite a lot I think.

def groupby(key, iterable):
    it = iter(iterable)
    value = it.next()  # If there are no items, this takes an early exit
    oldkey = [key(value)]
    cache = [value]
    lock = []
    def grouper():
        yield cache.pop()
        for value in it:
            newkey = key(value)
            if newkey == oldkey[0]:
                yield value
            else:
                oldkey[0] = newkey
                cache.append(value)
                break
        del lock[0]
    while 1:
        if lock:
            raise LookupError, "groups accessed out of order"
        if not cache:
            break
        lock.append(1)
        yield grouper()

--Greg Ball From Jack.Jansen at cwi.nl Thu Nov 27 16:16:26 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Nov 27 16:16:32 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Library In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: On 26-nov-03, at 21:56, Raymond Hettinger wrote: > I'm adding a section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. > > My first draft included: > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib My 2 cents (and actually what I plan to do for MacPython, Some Day:-): pick a small number of tutorials where you solve toy versions of real world problems from different domains.
For example you could do a "publish spreadsheet to website" where you showcase csv and urllib (or maybe the reverse, "turn html table into csv", so you can show htmllib too); "analyse some sort of logfile" where you could probably show datetime, re and maybe glob and optparse; "something scientific" could probably show cmath and random and a few others; "form mailer" could show cgi, pprint and email. I think the advantage of examples from real world problem domains is that people will pick the one that they can relate to, and hence not only will they understand what the problem is all about (i.e. people won't look at a complex number example if they haven't a clue what a complex number is), but also the functionality demonstrated should produce the "aha!" that we're after. -- Jack Jansen, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From guido at python.org Thu Nov 27 16:49:40 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 16:49:52 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Thu, 27 Nov 2003 15:58:47 EST." References: Message-ID: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> > Here's a reworking which returns iterators. I had to decide what to do if > the user tries to access things out of order; I raise an exception. > Anything else would complicate the code quite a lot I think.
>
> def groupby(key, iterable):
>     it = iter(iterable)
>     value = it.next()  # If there are no items, this takes an early exit
>     oldkey = [key(value)]
>     cache = [value]
>     lock = []
>     def grouper():
>         yield cache.pop()
>         for value in it:
>             newkey = key(value)
>             if newkey == oldkey[0]:
>                 yield value
>             else:
>                 oldkey[0] = newkey
>                 cache.append(value)
>                 break
>         del lock[0]
>     while 1:
>         if lock:
>             raise LookupError, "groups accessed out of order"
>         if not cache:
>             break
>         lock.append(1)
>         yield grouper()

Thanks!
Here's a class version of the same, which strikes me as slightly easier to understand (though probably slower due to all the instance variable access). It may serve as an easier model for a C implementation. I decided not to deal explicitly with out-of-order access; if the caller doesn't play by the rules, some of their groups will be split and jumbled, but each split group will still have matching keys.

class GroupBy(object):

    def __init__(self, key, iterable):
        self.key = key
        self.it = iter(iterable)
        self.todo = []

    def __iter__(self):
        return self

    def next(self):
        if self.todo:
            value, oldkey = self.todo.pop()
        else:
            value = self.it.next()  # Exit if this raises StopIteration
            oldkey = self.key(value)
        return self._grouper(value, oldkey)

    def _grouper(self, value, oldkey):
        yield value
        for value in self.it:
            newkey = self.key(value)
            if newkey != oldkey:
                self.todo.append((value, newkey))
                break
            yield value

This is an example of what's so cool about iterators and generators: You can code a particular idiom or mini-pattern (in this case grouping list items) once and apply it to lots of situations. That's of course what all subroutines do, but iterators and generators open up lots of places where previously it wasn't convenient to use a subroutine (you'd have to use lots of lambdas -- or you'd have to have a language supporting anonymous code blocks, which provide a lot of the same power in a different way). --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Nov 27 17:08:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Thu Nov 27 17:09:25 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch In-Reply-To: References: Message-ID: Thomas Heller writes: > It would be great if > someone else would try it out and report if it works correctly - IMO it > should be committed to the 2.3 maintenance branch before 2.3.3 goes out.
> (Suggestions for better wording would be accepted ;-) I'm willing to trust you that you get this right. Most of us probably aren't even aware that VS6 has different registry settings depending on whether it was ever invoked after being installed. Regards, Martin From tdelaney at avaya.com Thu Nov 27 17:19:45 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Thu Nov 27 17:19:52 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEECA9C@au3010avexu1.global.avaya.com> > From: Martin v. Löwis > > Thomas Heller writes: > > > It would be great if > > someone else would try it out and report if it works > correctly - IMO it > > should be committed to the 2.3 maintenance branch before > 2.3.3 goes out. > > (Suggestions for better wording would be accepted ;-) > > I'm willing to trust you that you get this right. > > Most of us probably aren't even aware that VS6 has different registry > settings depending on whether it was ever invoked after being > installed. This is something I come across all the time with Microsoft products - in particular, we have a product which is a plugin to Microsoft Visio. Our installer currently has to detect that Visio is installed, but hasn't been run, and tell them to run it. I intend to rewrite the installer soon (from WISE to NSIS) and hopefully I'll be able to improve this behaviour ... Tim Delaney From greg at cosc.canterbury.ac.nz Thu Nov 27 18:08:14 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Nov 27 18:08:21 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <2mn0aiqas3.fsf@starship.python.net> Message-ID: <200311272308.hARN8EE02347@oma.cosc.canterbury.ac.nz> Michael Hudson : > > This would be considerably improved if the error message could > > just point out the position in the line instead of just the line > > number. > > Any ideas how to do that? I guess you could obfuscate c_lnotab even > more...
It would need to contain a lot more information, one way or another. I don't know whether it would be worth going to heroic lengths to compress it, though. Maybe it would be better to invest the effort in making the lineno tables lazily loaded instead -- leave them in the .pyc file until they're needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tjreedy at udel.edu Thu Nov 27 21:09:33 2003 From: tjreedy at udel.edu (Terry Reedy) Date: Thu Nov 27 21:09:41 2003 Subject: [Python-Dev] Re: Patch to distutils.msvccompiler, 2.3 branch References: Message-ID: "Thomas Heller" wrote in message news:brqx615o.fsf@python.net... > Several people on this list (IIRC Jim, Guido, Jeremy) have been bitten > by the problem that distutils couldn't build extensions with MSVC6, > complaining that the compiler isn't installed although in fact it was > installed. In my view, distutils is correct. VC6 has been loaded but installation is not finished. > The problem always seemed to be that MSVC6 only writes the complete > registry entries which distutils requires after the GUI has been run at > least one time. I have seen other programs (some games, in particular, that I remember) do this sort of thing -- performing the final installation phase on first execution. Sometimes there is a menu option to repeat this phase without reloading. > Not installed: I would call this 'Not loaded' > > error: Python was built with version 6 of Visual Studio, and > extensions need to be built with the same version of the compiler, but > it isn't installed. > > Installed, but the GUI has never been run: and this 'Loaded, but installation incomplete' > Installed, and GUI has been run: the extension should build normally. and this 'Fully installed' Terry J.
Reedy From tim.one at comcast.net Fri Nov 28 00:56:40 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 28 00:56:42 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: [Andrew Koenig] > Ah. I will agree with you that wholly tail-recursive programs are > usually no easier to understand than their iterative counterparts. Good! That's why I've never been keen to "do something" about tail recursion in Python -- the "one obvious way" to write a loop in Python is with a loop. > On the other hand, there are partially tail-recursive functions that > I find easier to understand, such as
>
> def traverse(t, f):
>     if nonempty(t):
>         traverse(t.left, f)
>         traverse(t.right, f)
>
> Here, the second call to traverse is tail-recursive; the first isn't. > Of course it could be rewritten this way
>
> def traverse(t, f):
>     while nonempty(t):
>         traverse(t.left, f)
>         t = t.right
>
> but I think that this rewrite makes the code harder to follow I agree. Worse still is writing it iteratively with an explicit stack. Note that PEP 255 has both spellings for a tree-walking generator, and the fully iterative spelling is much harder to understand. > would prefer that the compiler do it for me. I don't in Python: if I coded a call, I want Python to make a call. WYSIWYG contributes greatly to the debuggability of large Python programs in practice.
>> i = 1
>> while 10**i <= n:
>>     i += 1
>> return i
> This code relies on 10**i being exact. Also on + being exact, and the other code in this thread depended on // being exact. > Is that guaranteed? + - * // % ** pow and divmod on integers in Python will either deliver an exact result or raise an exception (like MemoryError if malloc() can't find enough space to hold an intermediate result). From mwh at python.net Fri Nov 28 09:51:28 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 28 09:51:32 2003 Subject: [Python-Dev] less quick patch for better debugging.
In-Reply-To: <2misl5r95c.fsf@starship.python.net> (Michael Hudson's message of "Thu, 27 Nov 2003 17:52:47 +0000") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> <2misl5r95c.fsf@starship.python.net> Message-ID: <2mekvsr1fz.fsf@starship.python.net> Michael Hudson writes: > Although IIRC, in {k:v} v is evaluated before k, which could make > life entertaining. Another situation where (more unavoidably) execution "goes backwards": r = [i for i in somelist] Cheers, mwh -- (Of course SML does have its weaknesses, but by comparison, a discussion of C++'s strengths and flaws always sounds like an argument about whether one should face north or east when one is sacrificing one's goat to the rain god.) -- Thant Tessman From mwh at python.net Fri Nov 28 10:58:58 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 28 10:59:02 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <2misl5r95c.fsf@starship.python.net> (Michael Hudson's message of "Thu, 27 Nov 2003 17:52:47 +0000") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> <2misl5r95c.fsf@starship.python.net> Message-ID: <2m7k1kqybh.fsf@starship.python.net> Michael Hudson writes: > Guido van Rossum writes: >> BTW, for the special case of multi-line argument lists, it is already >> fixed. > > So it is. I guess the other situations that are worth fixing are long > container -- list, tuple, dict -- literals. My brain is a bit too > fried to think if a more general solution is feasible, but I will > point out that since SET_LINENO went away, inserting superfluous calls > to com_set_lineno doesn't result in superfluous bytecodes, so perhaps > that could just be added to com_node or something. 
Brain still fried, so someone else will have to tell me what's wrong with: http://python.org/sf/850789 which as sketched above calls com_set_lineno in every invocation of com_node and removes all the other calls. Cheers, mh -- Ability to type on a computer terminal is no guarantee of sanity, intelligence, or common sense. -- Gene Spafford's Axiom #2 of Usenet From pinard at iro.umontreal.ca Fri Nov 28 11:44:12 2003 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Fri Nov 28 11:57:09 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: References: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: <20031128164412.GA3028@titan.progiciels-bpi.ca> [Tim Peters] > [Andrew Koenig] > > Ah. I will agree with you that wholly tail-recursive programs are > > usually no easier to understand than their iterative counterparts. > Good! That's why I've never been keen to "do something" about tail > recursion in Python -- the "one obvious way" to write a loop in Python is > with a loop. Just a tiny remark on that topic. In my experience, it is rather unusual that I need to use tail recursion in a way that would not easily express itself with a simple loop, and more clearly that way. However, there are a few rare cases in which algorithms use tail recursion at various places and paths in a single function, in such a way that untangling these into a single loop would not be easy. But such situations (let's call them [1]) are uncommon in practice. Moreover, tail recursion is an optimisation matter, and situations in which speed is excruciatingly important (let's call them [2]) are far less frequent, still in practice, than some people tend to believe. Since [1] and [2] are kind of independent, we could consider that it is extremely uncommon that we meet [1] and [2] simultaneously. So, in practice, it might be that Python does not really need tail recursion.
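The distinction being drawn here — tail calls that collapse mechanically into a simple loop versus those that don't — is easiest to see on a function where the rewrite is purely mechanical. A small illustrative pair (editorial example, not code from the thread):

```python
def gcd_recursive(a, b):
    # Tail-recursive form: the recursive call is the very last thing
    # the function does, so no work remains after it returns.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

def gcd_loop(a, b):
    # The same function with the tail call turned into a loop -- the
    # transformation a tail-call-optimising compiler would perform.
    while b != 0:
        a, b = b, a % b
    return a
```

Both return the same results; the interesting cases in this thread are functions like traverse() above, where only one of several recursive calls is in tail position and the rewrite is no longer this clean.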
> > On the other hand, there are partially tail-recursive functions that > > I find easier to understand, such as [...] Yes, of course, if an algorithm expresses itself more clearly using a notation which happens to be tail recursive, do not hesitate to express it that way, especially given that _on average_, one may safely assert that the algorithm is not speed-critical. Rare exceptions exist and can be used to build counter-examples, but these should not be seen as really compelling. On the other hand, if Guido feels like accepting tail-recursion in Python for the sake of an intellectual exercise or for the pleasure of its elegance, let's go for it. It cannot really hurt that much :-). -- François Pinard http://www.iro.umontreal.ca/~pinard From guido at python.org Fri Nov 28 13:00:12 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 13:00:37 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Fri, 28 Nov 2003 00:56:40 EST." References: Message-ID: <200311281800.hASI0CW17161@c-24-5-183-134.client.comcast.net> > + - * // % ** pow and divmod on integers in Python will either deliver an > exact result or raise an exception (like MemoryError if malloc() can't find > enough space to hold an intermediate result). Except for ** if the exponent is negative. --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Fri Nov 28 14:49:59 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Fri Nov 28 14:50:29 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <20031128194959.GA4886@nl.linux.org> Raymond Hettinger wrote: > Date: Tue, 25 Nov 2003 07:26:15 +0100 > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins.
I would like to nominate input() also. It is often misused by beginners. A better choice is almost always raw_input(). In the standard library, fpformat.py seems to be the only one using it. Further, I see Demo/classes/Dbm.py uses it, but that seems to be all. How about banishing input() too? yours, Gerrit. -- 59. If any man, without the knowledge of the owner of a garden, fell a tree in a garden he shall pay half a mina in money. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From perky at i18n.org Fri Nov 28 14:54:20 2003 From: perky at i18n.org (Hye-Shik Chang) Date: Fri Nov 28 14:54:31 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> References: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> Message-ID: <20031128195420.GA63319@i18n.org> On Thu, Nov 27, 2003 at 01:49:40PM -0800, Guido van Rossum wrote: > > Thanks! Here's a class version of the same, which strikes me as > slightly easier to understand (though probably slower due to all the > instance variable access). It may serve as an easier model for a C > implementation. I decided not to deal explicitly with out-of-order > access; if the caller doesn't play by the rules, some of their groups > will be split and jumbled, but each split group will still have > matching keys. Here's yet another implementation for itertoolsmodule.c. (see attachment) I wrote it after the shower (really!) 
:) Regards, Hye-Shik -------------- next part -------------- Index: Modules/itertoolsmodule.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/itertoolsmodule.c,v retrieving revision 1.26 diff -u -r1.26 itertoolsmodule.c --- Modules/itertoolsmodule.c 12 Nov 2003 14:32:26 -0000 1.26 +++ Modules/itertoolsmodule.c 28 Nov 2003 19:46:43 -0000 @@ -2081,6 +2081,272 @@ }; +/* groupby object ***********************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *it; + PyObject *key; + PyObject *oldvalue; + PyObject *oldkey; +} groupbyobject; + +static PyTypeObject groupby_type; +static PyObject *_itergroup_create(groupbyobject *); + +static PyObject * +groupby_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + groupbyobject *gbo; + PyObject *it, *key; + + if (!PyArg_ParseTuple(args, "OO:groupby", &key, &it)) + return NULL; + + if (!PyCallable_Check(key)) { + PyErr_SetString(PyExc_ValueError, + "Key argument must be a callable object."); + return NULL; + } + + gbo = (groupbyobject *)type->tp_alloc(type, 0); + if (gbo == NULL) + return NULL; + gbo->oldvalue = NULL; + gbo->oldkey = NULL; + gbo->key = key; + Py_INCREF(key); + gbo->it = PyObject_GetIter(it); + if (it == NULL) { + Py_DECREF(gbo); + return NULL; + } + return (PyObject *)gbo; +} + +static void +groupby_dealloc(groupbyobject *gbo) +{ + PyObject_GC_UnTrack(gbo); + Py_XDECREF(gbo->it); + Py_XDECREF(gbo->key); + Py_XDECREF(gbo->oldvalue); + Py_XDECREF(gbo->oldkey); + gbo->ob_type->tp_free(gbo); +} + +static int +groupby_traverse(groupbyobject *gbo, visitproc visit, void *arg) +{ + int err; + + if (gbo->it) { + err = visit(gbo->it, arg); + if (err) + return err; + } + + if (gbo->key) { + err = visit(gbo->key, arg); + if (err) + return err; + } + + if (gbo->oldvalue) { + err = visit(gbo->oldvalue, arg); + if (err) + return err; + } + + if (gbo->oldkey) { + err = visit(gbo->oldkey, arg); + if (err) + return 
err; + } + + return 0; +} + +static PyObject * +groupby_next(groupbyobject *gbo) +{ + if (gbo->oldvalue == NULL) { + gbo->oldvalue = PyIter_Next(gbo->it); + if (gbo->oldvalue == NULL) + return NULL; + } + + return _itergroup_create(gbo); +} + +PyDoc_STRVAR(groupby_doc, +"groupby(key, iterable) -> create an iterator which returns sub-iterators\n\ +grouped by key(value).\n"); + +static PyTypeObject groupby_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools.groupby", /* tp_name */ + sizeof(groupbyobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)groupby_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | + Py_TPFLAGS_BASETYPE, /* tp_flags */ + groupby_doc, /* tp_doc */ + (traverseproc)groupby_traverse, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)groupby_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + groupby_new, /* tp_new */ + PyObject_GC_Del, /* tp_free */ +}; + + +/* _itergroup object (internal) **********************************************/ + +typedef struct { + PyObject_HEAD + PyObject *parent; +} _itergroupobject; + +static PyTypeObject _itergroup_type; + +static PyObject * +_itergroup_create(groupbyobject *parent) +{ + _itergroupobject *igo; + + igo = PyObject_New(_itergroupobject, &_itergroup_type); + if (igo == NULL) + return PyErr_NoMemory(); + igo->parent = (PyObject *)parent; + 
Py_INCREF(parent); + + return (PyObject *)igo; +} + +static void +_itergroup_dealloc(_itergroupobject *igo) +{ + Py_XDECREF(igo->parent); + PyObject_Del(igo); +} + +static PyObject * +_itergroup_next(_itergroupobject *igo) +{ + groupbyobject *gbo = (groupbyobject *)igo->parent; + PyObject *value, *newkey; + int rcmp; + + if (gbo->oldvalue != NULL) { + value = gbo->oldvalue; + gbo->oldvalue = NULL; + } else { + value = PyIter_Next(gbo->it); + if (value == NULL) + return NULL; + } + + newkey = PyObject_CallFunctionObjArgs(gbo->key, value, NULL); + if (newkey == NULL) { + /* throw the value away because it may fail on next iteration + * trial again. */ + Py_DECREF(value); + return NULL; + } + + if (gbo->oldkey == NULL) { + gbo->oldkey = newkey; + return value; + } else if (PyObject_Cmp(gbo->oldkey, newkey, &rcmp) == -1) { + Py_DECREF(newkey); + return NULL; + } + + if (rcmp == 0) { + Py_DECREF(newkey); + return value; + } else { + Py_DECREF(gbo->oldkey); + gbo->oldkey = newkey; + gbo->oldvalue = value; + return NULL; + } +} + +static PyTypeObject _itergroup_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools._itergroup", /* tp_name */ + sizeof(_itergroupobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)_itergroup_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)_itergroup_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 
0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + 0, /* tp_new */ + _PyObject_Del, /* tp_free */ +}; + + /* module level code ********************************************************/ PyDoc_STRVAR(module_doc, @@ -2103,6 +2369,7 @@ chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ... \n\ takewhile(pred, seq) --> seq[0], seq[1], until pred fails\n\ dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails\n\ +groupby(key, iterable) --> iterates iterators by group\n\ "); @@ -2130,6 +2397,7 @@ &count_type, &izip_type, &repeat_type, + &groupby_type, NULL }; From python at rcn.com Fri Nov 28 16:14:40 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 28 16:15:22 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031128194959.GA4886@nl.linux.org> Message-ID: <001801c3b5f4$a45cd1e0$e841fea9@oemcomputer> > I would like to nominate input() also. It is often misused by beginners. > A better choice is almost always raw_input(). In the standard library, > fpformat.py seems to be the only one using it. Further, I see > Demo/classes/Dbm.py uses it, but that seems to be all. How about > banishing input() too? I won't name names, but input() has a very important friend who happens to be a dictator, the author of the tutorial, and the creator of a well thought out programming language. The risks are clearly documented. So no one can say they weren't warned. Also, it does have its uses and is friendly to beginning programmers who don't enjoy having to coerce strings back to the data type they actually wanted. Also, it is somewhat nice to be able to enter expressions in personal, interactive scripts.
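The behaviour being weighed here is that Python 2's input() evaluates whatever the user types, where raw_input() returns it as a string. A simplified model of that behaviour (an editorial sketch, not CPython's actual implementation):

```python
def py2_style_input(typed_line):
    # Rough model of Python 2's input(): evaluate the typed text as an
    # expression.  This is what spares beginners a manual int() call --
    # and also what makes input() risky, since any expression the user
    # types gets executed.
    return eval(typed_line)
```

So where raw_input() would hand back the string "2 + 3", this model yields the integer 5 — exactly the coercion convenience (and the hazard) the thread is debating.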
all-builtins-have-at-least-one-friend, Raymond Hettinger From guido at python.org Fri Nov 28 16:42:04 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 16:42:37 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Fri, 28 Nov 2003 20:49:59 +0100." <20031128194959.GA4886@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> Message-ID: <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> > I would like to nominate input() also. It is often misused by beginners. I've seen many programming texts for real beginners that use it -- it's handy to be able to read numbers before you have explained strings or how to parse them. So I say let's be kind on input(). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 28 16:46:53 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 16:47:01 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 04:54:20 +0900." <20031128195420.GA63319@i18n.org> References: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> <20031128195420.GA63319@i18n.org> Message-ID: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> > Here's yet another implementation for itertoolsmodule.c. (see > attachment) I wrote it after the shower (really!) :) Wow! Thanks. Let's all remember to take or showers and maybe Python will become the cleanest programming language. :) Raymond, what do you think? I would make one change: after looking at another use case, I'd like to change the outer iterator to produce (key, grouper) tuples. 
This way, you can write things like totals = {} for key, group in sequence: totals[key] = sum(group) --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Fri Nov 28 18:24:30 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 28 18:25:14 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> Message-ID: <002701c3b606$c61304a0$e841fea9@oemcomputer> > > Here's yet another implementation for itertoolsmodule.c. (see > > attachment) I wrote it after the shower (really!) :) > > Wow! Thanks. Let's all remember to take or showers and maybe Python > will become the cleanest programming language. :) > > Raymond, what do you think? Yes. I recommend taking showers on a regular basis ;-) I'll experiment with groupby() for a few more days and see how it feels. The first impression is that it meets all the criteria for becoming an itertool (iters in, iters out; no unexpected memory use; works well with other tools; not readily constructed from existing tools). At first, the tool seems more special purpose than general purpose. OTOH, it is an excellent solution to a specific class of problems and it makes code much cleaner by avoiding the repeated code block in the non-iterator version. > I would make one change: after looking at another use case, I'd like > to change the outer iterator to produce (key, grouper) tuples. This > way, you can write things like > > totals = {} > for key, group in sequence: > totals[key] = sum(group) This is a much stronger formulation than the original. It is clear, succinct, expressive, and less error prone. The implementation would be more complex than the original. 
If the group is ignored, the outer iterator needs to be smart enough to read through the input iterator until the next group is encountered: >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] >>> firstname = lambda n: n.split()[0] >>> names.sort() >>> unique_first_names = [first for first, _ in groupby(firstname, names)] ['Barry', 'Jack', 'Tim'] In experimenting with groupby(), I am starting to see a need for a high speed data extractor function. This need is common to several tools that take function arguments (like list.sort(key=)). While extractor functions can be arbitrarily complex, many only fetch a specific attribute or element number. Alex's high-speed curry suggests that it is possible to create a function maker for fast lookups: students.sort(key=extract('grade')) # key=lambda r:r.grade students.sort(key=extract(2)) # key=lambda r:r[2] Raymond From guido at python.org Fri Nov 28 18:41:58 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 18:42:09 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Fri, 28 Nov 2003 18:24:30 EST." <002701c3b606$c61304a0$e841fea9@oemcomputer> References: <002701c3b606$c61304a0$e841fea9@oemcomputer> Message-ID: <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> > Yes. I recommend taking showers on a regular basis ;-) Jack Jansen wants me to add: especially right after riding your bicycle to work. And my boss will agree. (Enough for in-jokes that no-one will get. :-) > I'll experiment with groupby() for a few more days and see how it > feels. The first impression is that it meets all the criteria for > becoming an itertool (iters in, iters out; no unexpected memory use; > works well with other tools; not readily constructed from existing > tools). Right. > At first, the tool seems more special purpose than general purpose.
> OTOH, it is an excellent solution to a specific class of problems and it > makes code much cleaner by avoiding the repeated code block in the > non-iterator version. > > > > I would make one change: after looking at another use case, I'd like > > to change the outer iterator to produce (key, grouper) tuples. This > > way, you can write things like > > > > totals = {} > > for key, group in sequence: > > totals[key] = sum(group) Oops, there's a mistake. I meant to say: totals = {} for key, group in groupby(keyfunc, sequence): totals[key] = sum(group) > This is a much stronger formulation than the original. It is clear, > succinct, expressive, and less error prone. I'm not sure to what extent this praise was inspired by my mistake of leaving out the groupby() call. > The implementation would be more complex than the original. To the contrary. It was a microscopic change to either of the Python versions I posted, because the key to be returned is always available at exactly the right time. > If the > group is ignored, the outer iterator needs to be smart enough to read > through the input iterator until the next group is encountered: > > >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] > >>> firstname = lambda n: n.split()[0] > >>> names.sort() > >>> unique_first_names = [first for first, _ in groupby(firstname, > names)] > ['Barry' , 'Jack', 'Tim'] I don't think those semantics should be implemented. You should be required to iterate through each group. I was just thinking that returning the key might save the caller cumbersome logic if the key is needed but the inner iterator is also needed. The sum-by-group example would become much uglier: totals = {} for group in groupby(keyfunc, sequence): first = group.next() key = keyfunc(first) totals[key] = first + sum(group, 0) > In experimenting with groupby(), I am starting to see a need for a high > speed data extractor function. 
This need is common to several tools > that take function arguments (like list.sort(key=)). Exactly: it was definitely inspired by list.sort(key=). > While extractor > functions can be arbitrarily complex, many only fetch a specific > attribute or element number. Alex's high-speed curry suggests that it > is possible to create a function maker for fast lookups: > > students.sort(key=extract('grade')) # key=lambda r:r.grade > students.sort(key=extract(2)) # key=lambda r:[2] Perhaps we could do this by changing list.sort() and groupby() to take a string or int as first argument to mean exactly this. For the string case I had thought of this already (in my second shower today :-); the int case makes sense too. (Though it may weaken my objection against property('foo') in a different thread. :-) But I recommend holding off on this -- the "pure" groupby() has enough merit without speed hacks, and I find the clarity it provides more important than possible speed gains. I expect that the original, ugly code is usually faster, but in the cases where I've needed this I don't care: either the sequence isn't all that long, or the program doesn't run all that frequently, or it does so much other stuff that the speed gain would be drowned in the noise. --Guido van Rossum (home page: http://www.python.org/~guido/) From anthony at ekit-inc.com Fri Nov 28 22:44:17 2003 From: anthony at ekit-inc.com (Anthony Baxter) Date: Fri Nov 28 22:44:36 2003 Subject: [Python-Dev] minor interruption to service. Message-ID: <200311290344.hAT3iHF5013034@maxim.off.ekorp.com> I'm going to be pretty much offline for a week or so - we got burgled the other night while we were asleep and my laptop was stolen. The data's backed up, but it'll be a few days til the replacement laptop arrives. In the meantime, if someone wants to take on the "upgrade to autoconf 2.59" task, I'd appreciate it very much. 
thanks, Anthony From anthony at ekit-inc.com Fri Nov 28 23:25:18 2003 From: anthony at ekit-inc.com (Anthony Baxter) Date: Fri Nov 28 23:25:36 2003 Subject: [Python-Dev] test_mimetools failure when hostname unknown. Message-ID: <200311290425.hAT4PIrO013499@maxim.off.ekorp.com> If you have a machine whose local hostname is just something you've set, and there's no matching entry in /etc/hosts, test_mimetools fails with test test_mimetools failed -- Traceback (most recent call last): File "/home/anthony/src/py/23maint/Lib/test/test_mimetools.py", line 30, in test_boundary nb = mimetools.choose_boundary() File "/home/anthony/src/py/23maint/Lib/mimetools.py", line 130, in choose_boundary hostid = socket.gethostbyname(socket.gethostname()) gaierror: (-2, 'Name or service not known') This seems, to me, to be a bit bogus - should we just, in this case, have some sensible default (maybe just use the hostname, or 127.0.0.1)? And yes, I know this is not strictly a python bug, but it just popped up while I was building a new system up. Anthony From guido at python.org Fri Nov 28 23:32:13 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 23:32:26 2003 Subject: [Python-Dev] test_mimetools failure when hostname unknown. In-Reply-To: Your message of "Sat, 29 Nov 2003 15:25:18 +1100."
<200311290425.hAT4PIrO013499@maxim.off.ekorp.com> References: <200311290425.hAT4PIrO013499@maxim.off.ekorp.com> Message-ID: <200311290432.hAT4WEF17810@c-24-5-183-134.client.comcast.net> > If you have a machine whose local hostname is just something you've > set, and there's no matching entry in /etc/hosts, test_mimetools fails > with > test test_mimetools failed -- Traceback (most recent call last): > File "/home/anthony/src/py/23maint/Lib/test/test_mimetools.py", line 30, in test_boundary > nb = mimetools.choose_boundary() > File "/home/anthony/src/py/23maint/Lib/mimetools.py", line 130, in choose_boundary > hostid = socket.gethostbyname(socket.gethostname()) > gaierror: (-2, 'Name or service not known') > > This seems, to me, to be a bit bogus - should we just, in this case, > have some sensible default (maybe just use the hostname, or 127.0.0.1)? > > And yes, I know this is not strictly a python bug, but it just popped > up while I was building a new system up. Yeah, in general all tests that use gethostname() are subject to various kinds of errors like this. It really shouldn't be used in the test suite at all. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at comcast.net Sat Nov 29 00:34:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 00:34:05 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311271730.hARHUXg15777@c-24-5-183-134.client.comcast.net> Message-ID: [Guido, on grouping elements of a sequence by key] > ... > Or is there a more elegant approach than my original code that I've > missed all these years? I've always done it like:

d = {}
for x in sequence:
    d.setdefault(key(x), []).append(x)
# Now d has partitioned sequence by key.  The keys are
# available as d.keys(), the associated groups as d.values().
# So, e.g.,
for key, group in d.iteritems():
    d[key] = sum(group)

There's no code duplication, or warts for an empty sequence, which are the ugly parts of the non-dict approach.
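(Tim's setdefault idiom, as a runnable sketch in present-day Python: items() stands in for the 2.x iteritems(), and the word list with its first-letter key function is invented purely for illustration.)

```python
# Partition a sequence by key without sorting, then reduce each group.
# The word list and first-letter key below are illustrative only.
def partition(sequence, key):
    d = {}
    for x in sequence:
        d.setdefault(key(x), []).append(x)
    return d  # key -> list of items with that key, input order preserved

words = ["apple", "avocado", "banana", "cherry", "cantaloupe"]
groups = partition(words, key=lambda w: w[0])
counts = {k: len(g) for k, g in groups.items()}
print(counts)  # {'a': 2, 'b': 1, 'c': 2}
```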
It doesn't matter here whether the elements originally appear with equal keys all adjacent, and input often isn't sorted that way. When it isn't, not needing to sort first can be a major time savings if the sequence is big. Against it, a dict is a large data structure. I don't think it's ever been a real problem that it requires keys to be hashable. groupby() looks very nice when it applies. > ... > totals = {} > for key, group in groupby(keyfunc, sequence): > totals[key] = sum(group) Or totals = dict((key, sum(group)) for key, group in groupby(keyfunc, sequence)) exploiting generator expressions too. [after Raymond wonders about cases where the consumer doesn't iterate over the group generators ] > I don't think those semantics should be implemented. You should be > required to iterate through each group. Brrrr. Sounds error-prone (hard to explain, and impossible to enforce unless the implementation does almost all the work it would need to allow groups to get skipped -- if the implementation can detect that a group hasn't been fully iterated, then it could almost as easily go on to skip over remaining equal keys itself instead of whining about it; but if the implementation can't detect it, accidental violations of the requirement will be hard to track down). You're a security guy now. You've got a log with line records of the form month day hhmmss severity_level threat_id It's sorted ascending by month then descending by severity_level. You want a report of the top 10 threats seen each month.

for month, lines in groupby(lambda s: s.split()[0], input_file):
    print month
    print itertools.islice(lines, 10)

Like array[:10], islice() does the right thing if there are fewer than 10 lines in a month. It's just not natural to require that an iterator be run to exhaustion (if it *were* natural, this wouldn't be the first context ever to require it ).
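(Tim's report sketch runs essentially unchanged with the groupby() that later shipped in itertools, though its signature ended up as groupby(iterable, key=...), with the callable second rather than first as discussed in this thread. The log lines below are made up for illustration.)

```python
from itertools import groupby, islice

# Made-up log lines: month day hhmmss severity_level threat_id,
# sorted ascending by month, then descending by severity.
log = [
    "Jan 03 120000 9 worm",
    "Jan 07 130000 7 portscan",
    "Jan 11 140000 3 spam",
    "Feb 01 090000 8 worm",
    "Feb 02 100000 5 phish",
]

# Top-2 threats per month; like array[:2], islice() does the right
# thing when a month has fewer than 2 lines.
for month, lines in groupby(log, key=lambda s: s.split()[0]):
    top = [line.split()[4] for line in islice(lines, 2)]
    print(month, top)
```

Note that itertools.groupby itself skips past any unconsumed lines of a group when the outer iterator advances, which is exactly the behavior debated here.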
From tim.one at comcast.net Sat Nov 29 00:41:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 00:41:17 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311281800.hASI0CW17161@c-24-5-183-134.client.comcast.net> Message-ID: [Tim] >> + - * // % ** pow and divmod on integers in Python will either >> deliver an exact result or raise an exception (like MemoryError if >> malloc() can't find enough space to hold an intermediate result). [Guido] > Except for ** if the exponent is negative. Yup, and I do keep forgetting that -- it's just an accident due to that we're still using floats to approximate rationals. From guido at python.org Sat Nov 29 00:50:56 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 00:51:08 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 00:34:01 EST." References: Message-ID: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> > I've always done it like: > > d = {} > for x in sequence: > d.setdefault(key(x), []).append(x) > # Now d has partitioned sequence by key. The keys are > # available as d.keys(), the associated groups as d.values(). > # So, e.g., > for key, group in d.iteritems(): > d[key] = sum(group) > > There's no code duplication, or warts for an empty sequence, which are the > ugly parts of the non-dict approach. It doesn't matter here whether the > elements originally appear with equal keys all adjacent, and input often > isn't sorted that way. When it isn't, not needing to sort first can be a > major time savings if the sequence is big. Against it, a dict is a large > data structure. I don't think it's ever been a real problem that it > requires keys to be hashable. The major downside of this is that this keeps everything in memory. When that's acceptable, it's a great approach (especially because it doesn't require sorting). But often you really want to be able to handle input of arbitrary size.
For example, suppose you are given a file with some kind of records, timestamped and maintained in chronological order (e.g. a log file -- perfect example of data that won't fit in memory and is already sorted). You're supposed to output this for printing, while inserting a header at the start of each day and a footer at the end of each day with various counts or totals per day. > groupby() looks very nice when it applies. Right. :-) > > ... > > totals = {} > > for key, group in groupby(keyfunc, sequence): > > totals[key] = sum(group) > > Or > > totals = dict((key, sum(group)) > for key, group in groupby(keyfunc, sequence)) > > exploiting generator expressions too. Nice. When can we get these? :-) > [after Raymond wonders about cases where the consumer doesn't > iterate over the group generators > ] > > > I don't think those semantics should be implemented. You should be > > required to iterate through each group. > > Brrrr. Sounds error-prone (hard to explain, and impossible to enforce > unless the implementation does almost all the work it would need to allow > groups to get skipped -- if the implementation can detect that a group > hasn't been fully iterated, then it could almost as easily go on to skip > over remaining equal keys itself instead of whining about it; but if the > implementation can't detect it, accidental violations of the requirement > will be hard to track down). I take it back after seeing Raymond's implementation -- it's simple enough to make sure that each group is exhausted before starting the next group, and this is clearly the "natural" semantics. 
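(The day-header/footer report Guido describes earlier in this message can be sketched with the groupby() that eventually shipped in itertools — signature groupby(iterable, key=...). The records and the day() key below are invented; only one day's group is live at a time, so arbitrarily large sorted input works.)

```python
from itertools import groupby

# Invented chronological records: (date, amount), already sorted by date.
records = [
    ("2003-11-28", 10),
    ("2003-11-28", 5),
    ("2003-11-29", 7),
]

def day(record):
    return record[0]

# Single pass; each day's group is consumed before the next is requested.
for date, group in groupby(records, key=day):
    print("===", date, "===")        # per-day header
    total = 0
    for _, amount in group:
        print(amount)
        total += amount
    print("--- total:", total)       # per-day footer
```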
--Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 01:12:34 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 01:13:12 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> Message-ID: <000101c3b63f$c7fc4720$e841fea9@oemcomputer> [Guido] > I would make one change: after looking at another use case, I'd like > to change the outer iterator to produce (key, grouper) tuples. This > way, you can write things like > > totals = {} > for key, group in sequence: > totals[key] = sum(group) Here is an implementation that translates readily into C. It uses Guido's syntax and meets my requirement that bad things don't happen when someone runs the outer iterator independently of the inner iterator.

class groupby(object):
    __slots__ = ('keyfunc', 'it', 'tgtkey', 'currkey', 'currvalue')
    def __init__(self, key, iterable):
        NULL = 1+909.9j         # In C, use the real NULL
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = NULL
        self.currkey = NULL
        self.currvalue = NULL
    def __iter__(self):
        return self
    def next(self):
        while self.currkey == self.tgtkey:
            self.currvalue = self.it.next()     # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.currkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            self.currvalue = self.it.next()     # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)

import unittest
from sets import Set as set

class TestBasicOps(unittest.TestCase):
    def test_groupby(self):
        # Check zero length input
        self.assertEqual([], list(groupby(lambda r:r[0], [])))
        # Check normal input
        s = [(0, 10, 20), (0, 11,21), (0,12,21), (1,13,21), (1,14,22),
             (2,15,22), (3,16,23), (3,17,23)]
        dup = []
        for k, g in groupby(lambda r:r[0], s):
            for elem in g:
                self.assertEqual(k, elem[0])
                dup.append(elem)
        self.assertEqual(s, dup)
        # Check nested case
        dup = []
        for k, g in groupby(lambda r:r[0], s):
            for ik, ig in groupby(lambda r:r[2], g):
                for elem in ig:
                    self.assertEqual(k, elem[0])
                    self.assertEqual(ik, elem[2])
                    dup.append(elem)
        self.assertEqual(s, dup)
        # Check case where inner iterator is not used
        keys = []
        for k, g in groupby(lambda r:r[0], s):
            keys.append(k)
        expectedkeys = set([r[0] for r in s])
        self.assertEqual(set(keys), expectedkeys)
        self.assertEqual(len(keys), len(expectedkeys))

suite = unittest.TestSuite()
suite.addTest(unittest.makeSuite(TestBasicOps))
unittest.TextTestRunner(verbosity=2).run(suite)

Raymond From python at rcn.com Sat Nov 29 01:31:45 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 01:32:22 2003 Subject: [Python-Dev] genexps Was: "groupby" iterator In-Reply-To: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> Message-ID: <000201c3b642$75b07060$e841fea9@oemcomputer> > > totals = dict((key, sum(group)) > > for key, group in groupby(keyfunc, sequence)) > > > > exploiting generator expressions too. > > Nice. When can we get these? :-) Unless someone in the know volunteers, it will need to wait until Christmas vacation. Currently, the implementation is beyond my skill level. It will take a while to raise my skills to cover adding new syntax and what to do in the compiler. Raymond From aleax at aleax.it Sat Nov 29 02:03:22 2003 From: aleax at aleax.it (Alex Martelli) Date: Sat Nov 29 02:03:28 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> References: <002701c3b606$c61304a0$e841fea9@oemcomputer> <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> Message-ID: <200311290803.22730.aleax@aleax.it> On Saturday 29 November 2003 12:41 am, Guido van Rossum wrote: ... > > > totals = {} > > > for key, group in sequence: > > > totals[key] = sum(group) > > Oops, there's a mistake.
I meant to say: > > totals = {} > for key, group in groupby(keyfunc, sequence): > totals[key] = sum(group) > > > This is a much stronger formulation than the original. It is clear, > > succinct, expressive, and less error prone. > > I'm not sure to what extent this praise was inspired by my mistake of > leaving out the groupby() call. Can't answer for RH, but, to me, the groupby call looks just fine. However, one cosmetic suggestion: for analogy with list.sorted, why not let the call be spelled as groupby(sequence, key=keyfunc) ? I realize most itertools take a callable _first_, while, to be able to name the key-extractor this way, it would have to go second. I still think it would be nicer, partly because while sequence could not possibly default, key _could_ -- and its one obvious default is to an identity (lambda x: x). This would let elimination and/or counting of adjacent duplicates be expressed smoothly (for counting, it would help to have an ilen that gives the length of a finite iterable argument, but worst case one can substitute def ilen(it): for i, _ in enumerate(it): pass return i+1 or its inline equivalent). Naming the function 'grouped' rather than 'groupby' would probably be better if the callable was the second arg rather than the first. > > >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] > > >>> firstname = lambda n: n.split()[0] > > >>> names.sort() > > >>> unique_first_names = [first for first, _ in groupby(firstname, > > names)] > > ['Barry' , 'Jack', 'Tim'] > > I don't think those semantics should be implemented. You should be > required to iterate through each group. I was just thinking that Right, so basically it would have to be nested like: ufn = [ f for g in groupby(firstname, names) for f, _ in g ] > > In experimenting with groupby(), I am starting to see a need for a high > > speed data extractor function. This need is common to several tools > > that take function arguments (like list.sort(key=)). 
> Exactly: it was definitely inspired by list.sort(key=). That's part of why I'd love to be able to spell key= for this iterator too. > > While extractor > > functions can be arbitrarily complex, many only fetch a specific > > attribute or element number. Alex's high-speed curry suggests that it > > is possible to create a function maker for fast lookups: > > > > students.sort(key=extract('grade')) # key=lambda r:r.grade > > students.sort(key=extract(2)) # key=lambda r:r[2] > > Perhaps we could do this by changing list.sort() and groupby() to take > a string or int as first argument to mean exactly this. For the It seems to me that this would be special-casing things while an extract function might help in other contexts as well. E.g., itertools has several other iterators that take a callable and might use this. > But I recommend holding off on this -- the "pure" groupby() has enough > merit without speed hacks, and I find the clarity it provides more > important than possible speed gains. I expect that the original, ugly I agree that the case for extract is separate from that for groupby (although the latter does increase the attractiveness of the former). Alex From skumar at datec-systems.com Sat Nov 29 02:47:24 2003 From: skumar at datec-systems.com (sa) Date: Sat Nov 29 02:47:26 2003 Subject: [Python-Dev] Telnet server Message-ID: <001a01c3b64d$07660790$1501a8c0@datec21> Hi all, I want to develop a thin telnet server using the curses library. First of all, is this possible? The box on which I want to develop this does not provide a shell as it runs customized embedded linux, so I want to write a telnet server on the box which presents the telnet client a curses kinda interface after he gets past the login prompt authentication. I can execute my python scripts on this box without any problems. Any pointers? Thanks, -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20031129/f53e14f8/attachment-0001.html From python at rcn.com Sat Nov 29 03:26:38 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 03:27:24 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311290803.22730.aleax@aleax.it> Message-ID: <001801c3b652$8534a820$e841fea9@oemcomputer> [Alex] > However, one cosmetic suggestion: for analogy with list.sorted, why > not let the call be spelled as > groupby(sequence, key=keyfunc) > ? > > I realize most itertools take a callable _first_, while, to be able to > name the key-extractor this way, it would have to go second. I still > think it would be nicer, partly because while sequence could not > possibly default, key _could_ -- and its one obvious default is to an > identity (lambda x: x). This would let elimination and/or counting of > adjacent duplicates be expressed smoothly (for counting, it would > help to have an ilen that gives the length of a finite iterable argument, > but worst case one can substitute > def ilen(it): > for i, _ in enumerate(it): pass > return i+1 > or its inline equivalent). Though the argument order makes my stomach churn, the identity function default is quite nice:

>>> s = 'abracadabra'
>>> # sort s | uniq
>>> [k for k, g in groupby(list.sorted(s))]
['a', 'b', 'c', 'd', 'r']
>>> # sort s | uniq -d
>>> [k for k, g in groupby(list.sorted('abracadabra')) if ilen(g)>1]
['a', 'b', 'r']
>>> # sort s | uniq -c
>>> [(ilen(g), k) for k, g in groupby(list.sorted(s))]
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]
>>> # sort s | uniq -c | sort -rn | head -3
>>> list.sorted([(ilen(g), k) for k, g in groupby(list.sorted(s))], reverse=True)[:3]
[(5, 'a'), (2, 'r'), (2, 'b')]

> > > While extractor > > > functions can be arbitrarily complex, many only fetch a specific > > > attribute or element number.
Alex's high-speed curry suggests that it > > > is possible to create a function maker for fast lookups: > > > > > > students.sort(key=extract('grade')) # key=lambda r:r.grade > > > students.sort(key=extract(2)) # key=lambda r:r[2] > > > > Perhaps we could do this by changing list.sort() and groupby() to take > > a string or int as first argument to mean exactly this. For the > > It seems to me that this would be special-casing things while an extract > function might help in other contexts as well. E.g., itertools has > several > other iterators that take a callable and might use this. > > > But I recommend holding off on this -- the "pure" groupby() has enough > > merit without speed hacks, and I find the clarity it provides more > > important than possible speed gains. I expect that the original, ugly > > I agree that the case for extract is separate from that for groupby > (although > the latter does increase the attractiveness of the former). Yes, it's clearly a separate issue (and icing on the cake). I was thinking extract() would be a nice addition to the operator module where everything is basically a lambda-evading speed hack for accessing intrinsic operations: operator.add = lambda x,y: x+y Raymond From barry at barrys-emacs.org Sat Nov 29 09:50:38 2003 From: barry at barrys-emacs.org (Barry Scott) Date: Sat Nov 29 09:50:42 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031128194959.GA4886@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> Message-ID: <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> At 28-11-2003 19:49, you wrote: >Raymond Hettinger wrote: > > Date: Tue, 25 Nov 2003 07:26:15 +0100 > > > After re-reading previous posts on the subject, I had an idea. Let's > > isolate these functions in the documentation into a separate section > > following the rest of the builtins. Is the `expr` worth banishing?
I've never used it myself because of the chance of misreading `expr` vs. 'expr'. Isn't it a hard-to-read str()? Note: I tried to find it in the language reference and it's not in the index but then neither is %. Barry From gerrit at nl.linux.org Sat Nov 29 10:32:50 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Sat Nov 29 10:33:18 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> Message-ID: <20031129153250.GA8274@nl.linux.org> Barry Scott wrote: > Is the `expr` worth banishing? I've never used it myself > because of the chance of misreading `expr` vs. 'expr'. > Isn't it a hard-to-read str()? It's a hard-to-read repr(), actually. Guido once published a list of Python regrets, which can be found at: http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf At page 5, it suggests to drop `...` for repr(...), so unless Guido changed his mind (I don't think so), this is a deprecation-candidate as well: as is callable() and input(), by the way. yours, Gerrit. -- 147. If she have not borne him children, then her mistress may sell her for money. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From gerrit at nl.linux.org Sat Nov 29 10:37:12 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Sat Nov 29 10:37:36 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> Message-ID: <20031129153712.GB8274@nl.linux.org> [Gerrit] > > I would like to nominate input() also.
It is often misused by beginners. [Guido van Rossum] > So I say let's be kind on input(). Fine with me :) But... at [0], raw_input() and input() are mentioned as minor regrets, as functions which should actually not have been builtins. Have you now changed your mind, or did I misinterpret [0], or is it something else? [0] http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf yours, Gerrit. -- 134. If any one be captured in war and there is not sustenance in his house, if then his wife go to another house this woman shall be held blameless. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From guido at python.org Sat Nov 29 13:10:58 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:11:06 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sat, 29 Nov 2003 14:50:38 GMT." <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> Message-ID: <200311291810.hATIAwc18636@c-24-5-183-134.client.comcast.net> > Is the `expr` worth banishing? I've never used it myself > because of the chance of misreading `expr` vs. 'expr'. > Isn't it a hard-to-read str()? Yes, backticks will be gone in 3.0. But I expect there's no hope of getting rid of them earlier -- they've been used too much. I suspect that even putting in a deprecation warning would be too much. (Maybe a silent deprecation could work.) So maybe these could be added to the list of language features moved to a "doomed" section. > Note: I tried to find it in the language reference and it's not in the index > but then neither is %. I think none of the operators are in the index of the reference manual. I don't know how to resolve this; indexing non-alphanumeric characters may not be easy in LaTeX, I don't know.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 29 13:17:00 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:17:39 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sat, 29 Nov 2003 16:37:12 +0100." <20031129153712.GB8274@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> <20031129153712.GB8274@nl.linux.org> Message-ID: <200311291817.hATIH0n18684@c-24-5-183-134.client.comcast.net> > But... at [0], raw_input() and input() are mentioned as minor regrets, > as functions which should actually not have been builtins. Have you now > changed your mind, or did I misinterpret [0], or is it something else? > > [0] http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf Note that the regrets were minor. :-) The problem is that these are almost never used in real programs; real programs use sys.stdin.readline() so they can properly handle EOF. But their main use, teaching Python to beginners without having to expose the whole language first, requires either that they are built in or that the teacher sets up a special environment for their students. For the latter, a PYTHONSTARTUP variable pointing to a file with teachers' additions does nicely, but requires a level of control over the student's environment that's not always realistic. (Especially not when the student is teaching herself. :-) Perhaps a special module of teacher's helpers could be devised, and a special Python invocation to include that automatically? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 29 13:18:58 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:19:09 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 03:26:38 EST." 
<001801c3b652$8534a820$e841fea9@oemcomputer> References: <001801c3b652$8534a820$e841fea9@oemcomputer> Message-ID: <200311291818.hATIIwo18695@c-24-5-183-134.client.comcast.net> Way to go, Raymond. One suggestion: instead of ilen(), I would suggest count(). (Yes, I've been using more SQL lately. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas-python at python.ca Sat Nov 29 14:52:35 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 29 14:45:53 2003 Subject: [Python-Dev] genexps Was: "groupby" iterator In-Reply-To: <000201c3b642$75b07060$e841fea9@oemcomputer> References: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> <000201c3b642$75b07060$e841fea9@oemcomputer> Message-ID: <20031129195235.GA695@mems-exchange.org> On Sat, Nov 29, 2003 at 01:31:45AM -0500, Raymond Hettinger wrote: > Unless someone in the know volunteers, it will need to wait until > Christmas vacation. Currently, the implementation is beyond my skill > level. It will take a while raise my skills to cover adding new syntax > and what to do in the compiler. I wonder if we should try to finish the new compiler first. Neil From eppstein at ics.uci.edu Sat Nov 29 15:14:14 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sat Nov 29 15:14:17 2003 Subject: [Python-Dev] Re: "groupby" iterator References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> Message-ID: In article <000101c3b63f$c7fc4720$e841fea9@oemcomputer>, "Raymond Hettinger" wrote: > Here is an implementation that translates readily into C. It uses > Guido's syntax and meets my requirement that bad things don't happen > when someone runs the outer iterator independently of the inner > iterator. If I understand your code correctly, running the outer iterator skips over any uniterated values from the inner iterator. 
I'd be happier with behavior like tee: the inner groups always return
the same sequences of items, whether or not the inner iteration happens
before the next outer iteration, but the memory cost is only small if
you iterate through them in the expected order.  E.g., see the "out of
order" unit test in the code below.

def identity(x): return x

def groupby(iterable, key=identity):
    it = iter(iterable)
    first = it.next()
    while 1:
        group = bygroup(it, first, key)
        yield key(first), group
        first = group.nextgroup()

class bygroup:
    """Iterator of items in a single group."""

    def __init__(self, iterable, first, key=identity):
        """Instance variables:
        - self.lookahead: reversed list of items still to be output
        - self.groupid: group identity
        - self.key: func to turn iterated items into group ids
        - self.it: iterator, or None once we reach another group
        - self.postfinal: None (only valid once self.it is None)
        """
        self.key = key
        self.it = iter(iterable)
        self.lookahead = [first]
        self.groupid = self.key(first)

    def __iter__(self):
        return self

    def group(self):
        return self.groupid

    def next(self):
        if self.lookahead:
            return self.lookahead.pop()
        if self.it is None:
            raise StopIteration
        x = self.it.next()
        if self.key(x) == self.groupid:
            return x
        self.postfinal = x
        self.it = None
        raise StopIteration

    def nextgroup(self):
        """Return first item of next group.
        Raises StopIteration if there is no next group."""
        if self.it is not None:
            L = list(self)
            L.reverse()
            self.lookahead = L
        if self.it is not None:
            raise StopIteration
        return self.postfinal

import unittest
from sets import Set as set

class TestBasicOps(unittest.TestCase):
    def test_groupby(self):
        # Check zero length input
        self.assertEqual([], list(groupby([], lambda r:r[0])))

        # Check normal input
        s = [(0, 10, 20), (0, 11, 21), (0, 12, 21), (1, 13, 21),
             (1, 14, 22), (2, 15, 22), (3, 16, 23), (3, 17, 23)]
        dup = []
        for k, g in groupby(s, lambda r:r[0]):
            for elem in g:
                self.assertEqual(k, elem[0])
                dup.append(elem)
        self.assertEqual(s, dup)

        # Check case where groups are iterated out of order
        nest1 = []
        for k, g in groupby(s, lambda r:r[0]):
            nest1.append(list(g))
        nest2 = []
        for k, g in groupby(s, lambda r:r[0]):
            nest2.append(g)
        nest2 = [list(g) for g in nest2]
        self.assertEqual(nest1, nest2)

        # Check nested case
        dup = []
        for k, g in groupby(s, lambda r:r[0]):
            for ik, ig in groupby(g, lambda r:r[2]):
                for elem in ig:
                    self.assertEqual(k, elem[0])
                    self.assertEqual(ik, elem[2])
                    dup.append(elem)
        self.assertEqual(s, dup)

        # Check case where inner iterator is not used
        keys = []
        for k, g in groupby(s, lambda r:r[0]):
            keys.append(k)
        expectedkeys = set([r[0] for r in s])
        self.assertEqual(set(keys), expectedkeys)
        self.assertEqual(len(keys), len(expectedkeys))

suite = unittest.TestSuite()
suite.addTest(unittest.makeSuite(TestBasicOps))
unittest.TextTestRunner(verbosity=2).run(suite)

-- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ.
of California, Irvine, School of Information & Computer Science From perky at i18n.org Sat Nov 29 17:32:20 2003 From: perky at i18n.org (Hye-Shik Chang) Date: Sat Nov 29 17:32:30 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <000101c3b63f$c7fc4720$e841fea9@oemcomputer> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> Message-ID: <20031129223220.GA90372@i18n.org> On Sat, Nov 29, 2003 at 01:12:34AM -0500, Raymond Hettinger wrote: > [Guido] > > I would make one change: after looking at another use case, I'd like > > to change the outer iterator to produce (key, grouper) tuples. This > > way, you can write things like > > > > totals = {} > > for key, group in groupby(sequence): > > totals[key] = sum(group) Heh. I love that! > > Here is an implementation that translates readily into C. It uses > Guido's syntax and meets my requirement that bad things don't happen > when someone runs the outer iterator independently of the inner > iterator. > I updated my implementation according to your guideline. Please see attachments. Docstrings are still insufficient due to my english shortage. :) Thanks! 
Regards, Hye-Shik -------------- next part -------------- Index: Modules/itertoolsmodule.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/itertoolsmodule.c,v retrieving revision 1.26 diff -u -u -r1.26 itertoolsmodule.c --- Modules/itertoolsmodule.c 12 Nov 2003 14:32:26 -0000 1.26 +++ Modules/itertoolsmodule.c 29 Nov 2003 22:25:18 -0000 @@ -2081,6 +2081,332 @@ }; +/* groupby object ***********************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *it; + PyObject *keyfunc; + PyObject *tgtkey; + PyObject *currkey; + PyObject *currvalue; +} groupbyobject; + +static PyTypeObject groupby_type; +static PyObject *_grouper_create(groupbyobject *, PyObject *); + +static PyObject * +groupby_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + groupbyobject *gbo; + PyObject *it, *keyfunc; + + if (!PyArg_UnpackTuple(args, "groupby", 2, 2, &keyfunc, &it)) + return NULL; + + if (keyfunc != Py_None && !PyCallable_Check(keyfunc)) { + PyErr_SetString(PyExc_ValueError, + "Key argument must be a callable object or None."); + return NULL; + } + + gbo = (groupbyobject *)type->tp_alloc(type, 0); + if (gbo == NULL) + return NULL; + gbo->tgtkey = NULL; + gbo->currkey = NULL; + gbo->currvalue = NULL; + gbo->keyfunc = keyfunc; + Py_INCREF(keyfunc); + gbo->it = PyObject_GetIter(it); + if (gbo->it == NULL) { + Py_DECREF(gbo); + return NULL; + } + return (PyObject *)gbo; +} + +static void +groupby_dealloc(groupbyobject *gbo) +{ + PyObject_GC_UnTrack(gbo); + Py_XDECREF(gbo->it); + Py_XDECREF(gbo->keyfunc); + Py_XDECREF(gbo->tgtkey); + Py_XDECREF(gbo->currkey); + Py_XDECREF(gbo->currvalue); + gbo->ob_type->tp_free(gbo); +} + +static int +groupby_traverse(groupbyobject *gbo, visitproc visit, void *arg) +{ + int err; + + if (gbo->it) { + err = visit(gbo->it, arg); + if (err) + return err; + } + + if (gbo->keyfunc) { + err = visit(gbo->keyfunc, arg); + if (err) + return err; + } + 
+ if (gbo->tgtkey) { + err = visit(gbo->tgtkey, arg); + if (err) + return err; + } + + if (gbo->currkey) { + err = visit(gbo->currkey, arg); + if (err) + return err; + } + + if (gbo->currvalue) { + err = visit(gbo->currvalue, arg); + if (err) + return err; + } + + return 0; +} + +static PyObject * +groupby_next(groupbyobject *gbo) +{ + PyObject *newvalue, *newkey, *r, *grouper; + int rcmp; + + /* skip to next iteration group */ + for (;;) { + if (gbo->currkey == NULL) + rcmp = 0; + else if (gbo->tgtkey == NULL) + break; + else if (PyObject_Cmp(gbo->tgtkey, gbo->currkey, &rcmp) == -1) + return NULL; + + if (rcmp != 0) + break; + + newvalue = PyIter_Next(gbo->it); + if (newvalue == NULL) + return NULL; + + if (gbo->keyfunc == Py_None) { + newkey = newvalue; + Py_INCREF(newvalue); + } else { + newkey = PyObject_CallFunctionObjArgs(gbo->keyfunc, + newvalue, NULL); + if (newkey == NULL) { + Py_DECREF(newvalue); + return NULL; + } + } + + Py_XDECREF(gbo->currkey); + gbo->currkey = newkey; + Py_XDECREF(gbo->currvalue); + gbo->currvalue = newvalue; + } + + Py_XDECREF(gbo->tgtkey); + gbo->tgtkey = gbo->currkey; + Py_INCREF(gbo->currkey); + + grouper = _grouper_create(gbo, gbo->tgtkey); + if (grouper == NULL) + return NULL; + + r = PyTuple_New(2); + if (r == NULL) + return NULL; + PyTuple_SET_ITEM(r, 0, gbo->tgtkey); + Py_INCREF(gbo->tgtkey); + PyTuple_SET_ITEM(r, 1, grouper); + + return r; +} + +PyDoc_STRVAR(groupby_doc, +"groupby(keyfunc, iterable) -> create an iterator which returns\n\ +(key, sub-iterator) grouped by each value of key(value).\n"); + +static PyTypeObject groupby_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools.groupby", /* tp_name */ + sizeof(groupbyobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)groupby_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping 
*/ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | + Py_TPFLAGS_BASETYPE, /* tp_flags */ + groupby_doc, /* tp_doc */ + (traverseproc)groupby_traverse, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)groupby_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + groupby_new, /* tp_new */ + PyObject_GC_Del, /* tp_free */ +}; + + +/* _grouper object (internal) ************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *parent; + PyObject *tgtkey; +} _grouperobject; + +static PyTypeObject _grouper_type; + +static PyObject * +_grouper_create(groupbyobject *parent, PyObject *tgtkey) +{ + _grouperobject *igo; + + igo = PyObject_New(_grouperobject, &_grouper_type); + if (igo == NULL) + return PyErr_NoMemory(); + igo->parent = (PyObject *)parent; + Py_INCREF(parent); + igo->tgtkey = tgtkey; + Py_INCREF(tgtkey); + + return (PyObject *)igo; +} + +static void +_grouper_dealloc(_grouperobject *igo) +{ + Py_DECREF(igo->parent); + Py_DECREF(igo->tgtkey); + PyObject_Del(igo); +} + +static PyObject * +_grouper_next(_grouperobject *igo) +{ + groupbyobject *gbo = (groupbyobject *)igo->parent; + PyObject *newvalue, *newkey, *r; + int rcmp; + + if (gbo->currvalue == NULL) { + newvalue = PyIter_Next(gbo->it); + if (newvalue == NULL) + return NULL; + + if (gbo->keyfunc == Py_None) { + newkey = newvalue; + Py_INCREF(newvalue); + } else { + newkey = PyObject_CallFunctionObjArgs(gbo->keyfunc, + newvalue, NULL); + if (newkey == NULL) { + Py_DECREF(newvalue); + return NULL; + } + } + + assert(gbo->currkey == NULL); + gbo->currkey = 
newkey; + gbo->currvalue = newvalue; + } + + assert(gbo->currkey != NULL); + if (PyObject_Cmp(igo->tgtkey, gbo->currkey, &rcmp) == -1) + return NULL; + + if (rcmp != 0) + return NULL; + + r = gbo->currvalue; + gbo->currvalue = NULL; + Py_DECREF(gbo->currkey); + gbo->currkey = NULL; + + return r; +} + +static PyTypeObject _grouper_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools._grouper", /* tp_name */ + sizeof(_grouperobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)_grouper_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)_grouper_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + 0, /* tp_new */ + _PyObject_Del, /* tp_free */ +}; + + /* module level code ********************************************************/ PyDoc_STRVAR(module_doc, @@ -2103,6 +2429,7 @@ chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ... 
\n\ takewhile(pred, seq) --> seq[0], seq[1], until pred fails\n\ dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails\n\ +groupby(keyfunc, iterable) --> sub-iteraters grouped by value of keyfunc(v)\n\ "); @@ -2130,6 +2457,7 @@ &count_type, &izip_type, &repeat_type, + &groupby_type, NULL }; @@ -2148,5 +2476,6 @@ return; if (PyType_Ready(&tee_type) < 0) return; - + if (PyType_Ready(&_grouper_type) < 0) + return; } -------------- next part -------------- import unittest from itertools import groupby class TestBasicOps(unittest.TestCase): def test_groupby(self): # Check zero length input self.assertEqual([], list(groupby(lambda r:r[0], []))) # Check normal input s = [(0, 10, 20), (0, 11,21), (0,12,21), (1,13,21), (1,14,22), (2,15,22), (3,16,23), (3,17,23)] dup = [] for k, g in groupby(lambda r:r[0], s): for elem in g: self.assertEqual(k, elem[0]) dup.append(elem) self.assertEqual(s, dup) # Check nested case dup = [] for k, g in groupby(lambda r:r[0], s): for ik, ig in groupby(lambda r:r[2], g): for elem in ig: self.assertEqual(k, elem[0]) self.assertEqual(ik, elem[2]) dup.append(elem) self.assertEqual(s, dup) # Check case where inner iterator is not used keys = [k for k, g in groupby(lambda r:r[0], s)] expectedkeys = set([r[0] for r in s]) self.assertEqual(set(keys), expectedkeys) self.assertEqual(len(keys), len(expectedkeys)) # Check case where key is None word = 'abracadabra' keys = [k for k, g in groupby(None, list.sorted(word))] expectedkeys = set(word) self.assertEqual(set(keys), expectedkeys) self.assertEqual(len(keys), len(expectedkeys)) # Exercise pipes and filters style s = 'abracadabra' ilen = lambda it: len(list(it)) # sort s | uniq r = [k for k, g in groupby(None, list.sorted(s))] self.assertEqual(r, ['a', 'b', 'c', 'd', 'r']) # sort s | uniq -d r = [k for k, g in groupby(None, list.sorted(s)) if ilen(g)>1] self.assertEqual(r, ['a', 'b', 'r']) # sort s | uniq -c r = [(ilen(g), k) for k, g in groupby(None, list.sorted(s))] self.assertEqual(r, 
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]) # sort s | uniq -c | sort -rn | head -3 r = list.sorted([(ilen(g), k) for k, g in groupby(None, list.sorted(s))], reverse=True)[:3] self.assertEqual(r, [(5, 'a'), (2, 'r'), (2, 'b')]) # Uniteratable argument self.assertRaises(TypeError, groupby, None, None) # iter.next failure class ExpectedError(Exception): pass def delayed_raise(n=0): for i in range(n): yield 'yo' raise ExpectedError def gulp(key, iterable, func=list): return [func(g) for k, g in groupby(key, iterable)] # iter.next failure on outer object self.assertRaises(ExpectedError, gulp, None, delayed_raise(0)) # iter.next failure on inner object self.assertRaises(ExpectedError, gulp, None, delayed_raise(1)) # __cmp__ failure class DummyCmp: def __cmp__(self, dst): raise ExpectedError s = [DummyCmp(), DummyCmp(), None] # __cmp__ failure on outer object self.assertRaises(ExpectedError, gulp, None, s, id) # __cmp__ failure on inner object self.assertRaises(ExpectedError, gulp, None, s) # keyfunc failure def keyfunc(obj): if keyfunc.skip > 0: keyfunc.skip -= 1 return obj else: raise ExpectedError # keyfunc failure on outer object keyfunc.skip = 0 self.assertRaises(ExpectedError, gulp, keyfunc, [None]) keyfunc.skip = 1 self.assertRaises(ExpectedError, gulp, keyfunc, [None, None]) suite = unittest.TestSuite() suite.addTest(unittest.makeSuite(TestBasicOps)) unittest.TextTestRunner(verbosity=2).run(suite) From tjreedy at udel.edu Sat Nov 29 18:14:04 2003 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Nov 29 18:14:09 2003 Subject: [Python-Dev] Re: Telnet server References: <001a01c3b64d$07660790$1501a8c0@datec21> Message-ID: cc'ed " I want to develop a thin telnet server using the curses library .First of all is this possible ? 
The box on which I want to develop this, does not provide a shell as it runs customized embedded linux so i want to write a telnet server on the box which presents the telnet client, a curses kinda intrface after he gets past the login prompt authentication. I can execute my python scripts on this box without any problems . Any pointers ? " 1. Post plain text instead of html. 2. Ask usage questions (like the above) on the main python list or comp.lang.python. Py-dev is for discussion of future-release development issues. TJR From guido at python.org Sat Nov 29 19:06:21 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 19:06:31 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: Your message of "Tue, 25 Nov 2003 21:32:58 +0100." <20031125203258.GA29814@i92.ryd.student.liu.se> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> <20031125203258.GA29814@i92.ryd.student.liu.se> Message-ID: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> > [Guido van Rossum] > > There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that > > promise they will disappear in Python 2.4. If anyone has time to > > fix these, I'd appreciate it. (It's not just a matter of removing > > the FutureWarnings -- you actually have to implement the promised > > future behavior. :-) I may get to these myself, but they're not > > exactly rocket science, so they might be a good thing for a > > beginning developer (use SF please if you'd like someone to review > > the changes first). [Kalle Svensson] > I've submitted a patch (http://python.org/sf/849227). And yes, > somebody should probably take a good look at it before applying. The > (modified) test suite does pass on my machine, but that's all. I may > well have forgotten to add tests for new special cases, and I'm not > the most experienced C programmer on the block either. Well, it looks like you got everything right. Congratulations! I've checked your code into CVS. 
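The unified semantics PEP 237 promises can be illustrated with a few worked cases, shown here in the modern spelling where the change has fully landed (on older 32-bit builds these expressions warned or wrapped):

```python
# Unified PEP 237 semantics: hex literals are plain non-negative
# integers, and left shifts widen instead of wrapping or warning.
assert 0xffffffff == 4294967295        # no longer -1 on 32-bit builds
assert 0xffffffff << 1 == 0x1fffffffe  # no bits dropped by the shift
assert (1 << 100) >> 100 == 1          # small and large ints behave alike
```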
There are now two pieces of PEP 237 unimplemented (apart from the complete and total eradication of long literals, which won't happen until 3.0). (1) PEP 237 promises that after the new semantics are introduced for hex/oct literals and conversions, and left shifts, operations that cause a different result than before will produce a warning that is on by default. Given the pain we've suffered through the warnings in 2.3 about this stuff, I propose to forget about these warnings. The new semantics are clear and consistent, warnings would just cause more distress, and code first ported to 2.3 will already have silenced the warnings. (2) PEP 237 promises that repr() of a long should no longer show a trailing 'L'. This is not yet implemented (i.e., repr() of a long still has a trailing 'L'). First, past experience suggests that quite a bit of end user code will break, and it may easily break silently: there used to be code that did str(x)[:-1] (knowing x was a long) to strip the 'L', which broke when str() of a long no longer returned a trailing 'L'. Apparently some of this code was "fixed" by changing str() into repr(), and this code will now break again. Second, I *like* seeing a trailing L on longs, especially when there's no reason for it to be a long: if some expression returns 1L, I know something fishy may have gone on. Any comments on these? Should I update PEP 237 to reflect this? > As a side note, I think that line 233 in Lib/test/test_format.py > > if sys.maxint == 2**32-1: > > should be > > if sys.maxint == 2**31-1: > > but I didn't include that in the patch or submit a bug report. > Should I? Fixed that too. But somebody might want to backport it to 2.3.3. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 19:38:25 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 19:39:05 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> Message-ID: <001601c3b6da$44065fa0$e841fea9@oemcomputer> > (1) PEP 237 promises that after the new semantics are introduced for > hex/oct literals and conversions, and left shifts, operations that > cause a different result than before will produce a warning that > is on by default. Given the pain we've suffered through the > warnings in 2.3 about this stuff, I propose to forget about these > warnings. The new semantics are clear and consistent, warnings > would just cause more distress, and code first ported to 2.3 will > already have silenced the warnings. +1, The warnings cause more pain than they save. Part of the purpose of a warning is to leave you feeling unsettled -- I don't think that is a worthy goal when the code is going to work fine anyway. Let PyChecker or some such warn about prior version compatibility issues like that. > (2) PEP 237 promises that repr() of a long should no longer show a > trailing 'L'. This is not yet implemented (i.e., repr() of a long > still has a trailing 'L'). First, past experience suggests that > quite a bit of end user code will break, and it may easily break > silently: there used to be code that did str(x)[:-1] (knowing x > was a long) to strip the 'L', which broke when str() of a long no > longer returned a trailing 'L'. Apparently some of this code was > "fixed" by changing str() into repr(), and this code will now > break again. Second, I *like* seeing a trailing L on longs, > especially when there's no reason for it to be a long: if some > expression returns 1L, I know something fishy may have gone on. -0, The reasons are good but this one has been promised for several years. 
It's time for an L-free Python -- one less thing to have to learn.

If there is transition difficulty, let it be a prompt to consider
applying the forthcoming Decimal module.

If necessary, we could add a debug mode switch for L's to be on or off.
By putting it in the debug build, we keep people from using it in
production code.  The purpose is to allow code to be run twice to see if
different results are obtained.

Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep
for indicators like [:-1] on the same line as long() or repr().

> Should I update PEP 237 to reflect this?

Yes, that's better than surprising people later.

Raymond

From guido at python.org Sat Nov 29 19:56:58 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 29 19:57:03 2003
Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs
In-Reply-To: Your message of "Sat, 29 Nov 2003 19:38:25 EST." <001601c3b6da$44065fa0$e841fea9@oemcomputer>
References: <001601c3b6da$44065fa0$e841fea9@oemcomputer>
Message-ID: <200311300056.hAU0uwA19969@c-24-5-183-134.client.comcast.net>

> > (2) PEP 237 promises that repr() of a long should no longer show a
> > trailing 'L'.  This is not yet implemented (i.e., repr() of a long
> > still has a trailing 'L').  First, past experience suggests that
> > quite a bit of end user code will break, and it may easily break
> > silently: there used to be code that did str(x)[:-1] (knowing x
> > was a long) to strip the 'L', which broke when str() of a long no
> > longer returned a trailing 'L'.  Apparently some of this code was
> > "fixed" by changing str() into repr(), and this code will now
> > break again.  Second, I *like* seeing a trailing L on longs,
> > especially when there's no reason for it to be a long: if some
> > expression returns 1L, I know something fishy may have gone on.
>
> -0, The reasons are good but this one has been promised for several
> years.  It's time for an L-free Python -- one less thing to have to
> learn.
Yes, but people using type() or isinstance() or __class__ will still have to remember that there are two types of integers: int and long. And both built-ins will be with us for years, and they aren't quite aliases for each other (long('12') returns a long, but int('12') an int). > If there is transition difficultly, let it be a prompt to consider > applying the forthcoming Decimal module. This I don't understand. > If necessary, we could add a debug mode switch for L's to be on or off. > By putting it the debug build, we keep people from using it in > production code. The purpose is to allow code to be run twice to see if > different results are obtained. But making a debug build is far from trivial (especially on Windows). Perhaps it should be a switch on the regular build but also produce a warning, to annoy. :-) > Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep > for indicators like [:-1] on the same line as long() or repr(). Can you take care of that? > > Should I update PEP 237 to reflect this? > > Yes, that's better than surprising people later. I'll do that (in due time). --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 20:09:25 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 20:10:05 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300056.hAU0uwA19969@c-24-5-183-134.client.comcast.net> Message-ID: <001e01c3b6de$98bca280$e841fea9@oemcomputer> > > If necessary, we could add a debug mode switch for L's to be on or off. > > By putting it the debug build, we keep people from using it in > > production code. The purpose is to allow code to be run twice to see if > > different results are obtained. > > But making a debug build is far from trivial (especially on Windows). > Perhaps it should be a switch on the regular build but also produce a > warning, to annoy. :-) That would work. 
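The grep-able indicator mentioned above can be approximated in a few lines; this is a rough sketch against hypothetical sample lines, not a pattern tuned for the real standard library:

```python
import re

# Flag lines where long() or repr() appears together with a [:-1]
# slice -- the idiom most likely to break when the trailing 'L' goes.
pattern = re.compile(r'(long|repr)\(.*\[:-1\]')
source = ['a = repr(x)[:-1]', 'b = long(s)', 'c = repr(y)']
hits = [(i + 1, line) for i, line in enumerate(source)
        if pattern.search(line)]
assert hits == [(1, 'a = repr(x)[:-1]')]
```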
> > Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep
> > for indicators like [:-1] on the same line as long() or repr().
>
> Can you take care of that?

Yes, when the time comes.

Raymond

From anthony at ekit-inc.com Sat Nov 29 20:27:59 2003
From: anthony at ekit-inc.com (Anthony Baxter)
Date: Sat Nov 29 20:28:20 2003
Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern()
In-Reply-To: Message from Guido van Rossum of "Sat, 29 Nov 2003 10:10:58 -0800." <200311291810.hATIAwc18636@c-24-5-183-134.client.comcast.net>
Message-ID: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>

>>> Guido van Rossum wrote
> Yes, backticks will be gone in 3.0.  But I expect there's no hope of
> getting rid of them earlier -- they've been used too much.  I suspect

Then let's kill all use of backticks in the standard library.  There's
a lot of them.

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.

From kajiyama at grad.sccs.chukyo-u.ac.jp Sat Nov 29 20:24:01 2003
From: kajiyama at grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Sat Nov 29 20:36:34 2003
Subject: [Python-Dev] possible backward incompatibility in test.regrtest
Message-ID: <200311300124.hAU1O1N21082@grad.sccs.chukyo-u.ac.jp>

Hi developers,

It seems that the test.regrtest module has a possible backward
incompatibility with regard to pre-Python 2.3 releases.

I have a test suite implemented using the test.regrtest module.  In
this test suite, my own tests are invoked by a script like this:

  import os
  from test import regrtest
  regrtest.STDTESTS = []
  regrtest.main(testdir=os.getcwd())

This script runs fine with 2.2 but does not with 2.3, since regrtest.py
in Python 2.3 has the following lines in runtest() (introduced in
Revision 1.87.2.1.  See [1]):

    if test.startswith('test.'):
        abstest = test
    else:
        # Always import it from the test package
        abstest = 'test.' + test
    the_package = __import__(abstest, globals(), locals(), [])

That is, tests must be in a package named "test".
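The effect of the quoted runtest() lines reduces to a tiny sketch (the helper name here is hypothetical, mirroring the 2.3 logic):

```python
# Mirror of the name handling in 2.3's regrtest.runtest(): any test
# name without a 'test.' prefix is forced into the stdlib's "test"
# package, so tests living anywhere else are never found.
def abstest_name(test):
    if test.startswith('test.'):
        return test
    # Always import it from the test package
    return 'test.' + test

assert abstest_name('test_mymodule') == 'test.test_mymodule'
assert abstest_name('test.test_os') == 'test.test_os'
```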
However, this package name is already used by the standard library, and AFAIK multiple packages with the same package name cannot exist. In other words, any additional tests (i.e. my own tests) have to be put into the test package in the standard library. Otherwise, the additional tests won't be found. IMHO, this change in 2.3 is not reasonable. Unless I miss something trivial (I hope so), I'd have to give up using the test.regrtest module. I appreciate any comment. Thanks, -- KAJIYAMA, Tamito [1] http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Lib/test/regrtest.py?r1=1.87&r2=1.87.2.1 From tim.one at comcast.net Sat Nov 29 22:24:21 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 22:24:25 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> Message-ID: [Guido] > ... > (1) PEP 237 promises that after the new semantics are introduced for > hex/oct literals and conversions, and left shifts, operations that > cause a different result than before will produce a warning that > is on by default. Given the pain we've suffered through the > warnings in 2.3 about this stuff, I propose to forget about these > warnings. The new semantics are clear and consistent, warnings > would just cause more distress, and code first ported to 2.3 will > already have silenced the warnings. +1, and especially since it looks like 2.3 is going to become the next 1.5.2 (i.e., the version everyone flocks to, and then badgers you about for the next 20 years ). > (2) PEP 237 promises that repr() of a long should no longer show a > trailing 'L'. This is not yet implemented (i.e., repr() of a long > still has a trailing 'L'). 
First, past experience suggests that
> quite a bit of end user code will break, and it may easily break
> silently: there used to be code that did str(x)[:-1] (knowing x
> was a long) to strip the 'L', which broke when str() of a long no
> longer returned a trailing 'L'.  Apparently some of this code was
> "fixed" by changing str() into repr(), and this code will now
> break again.  Second, I *like* seeing a trailing L on longs,
> especially when there's no reason for it to be a long: if some
> expression returns 1L, I know something fishy may have gone on.

+1.  Changing string representations is always traumatic (lots of
programs rely on parsing them), and I have a hard time imagining what
positive good could come from stripping the 'L'.  Making that change
for str(long) seemed like pure loss from my POV (broke stuff and
helped nothing).

> Any comments on these?  Should I update PEP 237 to reflect this?

The PEP should reflect The Plan, sure.

From tim.one at comcast.net Sat Nov 29 22:31:46 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Nov 29 22:31:49 2003
Subject: [Python-Dev] genexps Was: "groupby" iterator
In-Reply-To: <20031129195235.GA695@mems-exchange.org>
Message-ID:

[Raymond Hettinger]
>> Unless someone in the know volunteers, it will need to wait until
>> Christmas vacation.  Currently, the implementation is beyond my skill
>> level.  It will take a while to raise my skills to cover adding new
>> syntax and what to do in the compiler.

[Neil Schemenauer]
> I wonder if we should try to finish the new compiler first.

That's the rub -- if I have time to move 2.4 along, I'll first give it
to advancing the AST branch.  Teaching the current front end new
parsing tricks would be an exercise in obsolescence.
From guido at python.org Sat Nov 29 23:26:59 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 29 23:27:44 2003
Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern()
In-Reply-To: Your message of "Sun, 30 Nov 2003 12:27:59 +1100." <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>
References: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>
Message-ID: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net>

> Then let's kill all use of backticks in the standard library.  There's
> a lot of them.

That's one reason why we have to support them for a long time; the
standard library has widely been used as sample code, so there's likely
to be a lot of them elsewhere.

As always, be careful with doing peephole changes to the standard
library -- historically, we've seen a 1-5% error rate in these change
sets that persists for months or years afterwards.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From oren-py-d at hishome.net Sun Nov 30 02:31:09 2003
From: oren-py-d at hishome.net (Oren Tirosh)
Date: Sun Nov 30 02:31:12 2003
Subject: [Python-Dev] "groupby" iterator
In-Reply-To: <002701c3b606$c61304a0$e841fea9@oemcomputer>
References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer>
Message-ID: <20031130073109.GA1560@hishome.net>

On Fri, Nov 28, 2003 at 06:24:30PM -0500, Raymond Hettinger wrote:
...
> students.sort(key=extract('grade'))   # key=lambda r:r.grade
> students.sort(key=extract(2))         # key=lambda r:r[2]

Why should the extract function interpret a string argument as getattr
and an int argument as getitem?
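For concreteness, the dual-dispatch extract() being questioned, next to the explicit spellings, might be sketched like this (hypothetical pure-Python versions; the operator-module names are the proposal, not yet an existing API at this point):

```python
# extract() guesses: a string argument means attribute access,
# an int argument means indexing -- the dual behavior at issue.
def extract(field):
    if isinstance(field, str):
        return lambda obj: getattr(obj, field)
    return lambda obj: obj[field]

# The explicit alternatives leave no room for guessing.
def attrgetter(name):
    return lambda obj: getattr(obj, name)

def itemgetter(item):
    return lambda obj: obj[item]

rows = [(2, 'b'), (1, 'a')]
assert sorted(rows, key=itemgetter(0)) == [(1, 'a'), (2, 'b')]
assert extract(1)(['a', 'b']) == 'b'       # int -> getitem
assert extract('real')(3.5) == 3.5         # string -> getattr
assert itemgetter('k')({'k': 7}) == 7      # explicit, works for dict keys too
```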
I find the explicit version more readable:

students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
students.sort(key=itemgetter(2))         # key=lambda r: r[2]
students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']

Oren From python at rcn.com Sun Nov 30 03:26:16 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 03:26:56 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> Message-ID: <000201c3b71b$9facae40$e841fea9@oemcomputer> [Anthony] > > Then let's kill all use of backticks in the standard library. There's > > a lot of them. [Guido] > As always, be careful with doing peephole changes to the standard > library -- historically, we've seen a 1-5% error rate in these change > sets that persists for months or years afterwards. FWIW, Walter and I did a bunch of these for Py2.3 and had excellent success because of a good process. Some ideas are:

* start it now (don't wait until a beta release).
* skip the packages like email which are maintained separately.
* think out ways it could go wrong (operator precedence, double backticks, escaped backticks, backticks inside strings or comments, etc.).
* do it manually (not brainlessly), then do it with automation to compare the results.
* make sure every affected module still imports.
* run the whole unittest suite in debug mode with -u all.
* self-review the diff file.
* get a second person to do a 100% review of the diff (Walter or I would be a good choice).
* put on an asbestos suit because the flames will come even if no mistakes are made.

IMO, this change is much easier to get right than the ones that were done before.
Good luck, Raymond From skip at manatee.mojam.com Sun Nov 30 08:01:04 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 30 08:06:15 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311301301.hAUD14wF006134@manatee.mojam.com>

Bug/Patch Summary
-----------------
590 open / 4387 total bugs (+63)
207 open / 2476 total patches (+28)

New Bugs
--------
XMLGenerator.startElementNS dies on EMPTY_NAMESPACE attribut (2003-11-23) http://python.org/sf/847665
Keyword similar to "global" for nested scopes wanted (2003-11-23) http://python.org/sf/847778
64 bit solaris versus /usr/local/lib (2003-11-23) http://python.org/sf/847812
4.2.6 (re) Examples: float regexp exponential on failure (2003-11-24) http://python.org/sf/848556
couple of new list.sort bugs (2003-11-25) http://python.org/sf/848856
Windows installer halts (2003-11-25) http://python.org/sf/848871
pydoc crash on MacOS X (2003-11-25) http://python.org/sf/848907
gzip.GzipFile is slow (2003-11-25) http://python.org/sf/849046
Request: getpos() for sgmllib (2003-11-25) http://python.org/sf/849097
ZipInfo shows incorrect file size for large files (2003-11-25) http://python.org/sf/849218
reading shelves is really slow (2003-11-26) http://python.org/sf/849662
unclear documentation/missing command? (2003-11-27) http://python.org/sf/850238
Typo in Popen3 description (2003-11-28) http://python.org/sf/850818
Doc/README has broken link (2003-11-28) http://python.org/sf/850823
optparse: OptionParser.__init__'s "prog" argument ignored (2003-11-28) http://python.org/sf/850964
test_poll fails in 2.3.2 on MacOSX(Panther) (2003-11-28) http://python.org/sf/850981
mbcs encoding ignores errors (2003-11-28) http://python.org/sf/850997
building on Fedora Core 1 (2003-11-28) http://python.org/sf/851020
winreg can segfault (2003-11-28) http://python.org/sf/851056
shutil.copy destroys hard links (2003-11-29) http://python.org/sf/851123
Item out of order on builtin function page (2003-11-29) http://python.org/sf/851152
Bug tracker page asks for login even when logged in (2003-11-29) http://python.org/sf/851156
New-style classes with __eq__ but not __hash__ are hashable (2003-11-29) http://python.org/sf/851449

New Patches
-----------
Port tests to unittest (Part 2) (2003-05-13) http://python.org/sf/736962
SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496
Extend struct.unpack to produce nested tuples (2003-11-23) http://python.org/sf/847857
Cookie.py: One step closer to RFC 2109 (2003-11-23) http://python.org/sf/848017
Flakey urllib2.parse_http_list (2003-11-25) http://python.org/sf/848870
Small error in test_format (2003-11-25) http://python.org/sf/849252
832799 proposed changes (2003-11-25) http://python.org/sf/849262
improve embeddability of python (2003-11-25) http://python.org/sf/849278
urllib reporthook could be more informative (2003-11-25) http://python.org/sf/849407
Enhance frame handing in warnings.warn() (2003-11-27) http://python.org/sf/850482
Semaphore.acquire() timeout parameter (2003-11-28) http://python.org/sf/850728
call com_set_lineno more often (2003-11-28) http://python.org/sf/850789
Modify Setup.py to Detect Tcl/Tk on BSD (2003-11-28) http://python.org/sf/850977
Argument passing from /usr/bin/idle2.3 to idle.py (2003-11-29) http://python.org/sf/851459

Closed Bugs
-----------
Dialogs too tight on OSX (2002-10-29) http://python.org/sf/630818
MacPython for Panther additions includes IDLE (2003-11-08) http://python.org/sf/838616
SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496
PackMan database for panther misses devtools dep (2003-11-14) http://python.org/sf/842116
PackageManager: deselect show hidden: indexerror (2003-11-18) http://python.org/sf/844676
error in python's grammar (2003-11-21) http://python.org/sf/846521
"and" operator tests the first argument twice (2003-11-21) http://python.org/sf/846564

Closed Patches
--------------

From neal at metaslash.com Sun Nov 30 11:02:31 2003 From: neal at metaslash.com (Neal Norwitz) Date: Sun Nov 30 11:02:37 2003 Subject: [Python-Dev] Use of Python Versions Message-ID: <20031130160230.GO13300@epoch.metaslash.com> I conducted an experiment to try to find out what versions of Python people use. In the last release of pychecker, I asked people to take a survey (http://metaslash.com/pyversion.html). While not scientific, it provides some info. There were 186 responses, with 2 apparent duplicates. Nobody used only one version of Python with that version being 2.1 or below. 110 people only use a single version of python with 10 using 2.2 only, 108 using 2.3 only, and 2 using 2.4 only. Here are the total number of responses by version:

1.5    5   (all 5 also use 2.3)
2.0    3
2.1   13
2.2   72
2.3  172
2.4   23

The raw responses are here: http://metaslash.com/pyver.txt Neal From guido at python.org Sun Nov 30 11:50:41 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 11:50:48 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 07:32:20 +0900."
<20031129223220.GA90372@i18n.org> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> <20031129223220.GA90372@i18n.org> Message-ID: <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net> I lost David Eppstein's post, but I finally know what I want to say in response. David objected to the behavior whereby groupby() subiterators become invalidated when the outer iterator is moved on to the next subiterator. But I don't think there's a good use case for what he wants to do instead: save enough state so that the subiterators can be used in arbitrary order. An application that saves the subiterators for later will end up saving a copy of everything, so it might as well be written so explicitly, e.g.:

store = {}
for key, group in groupby(keyfunc, iterable):
    store[key] = list(group)
# now access the groups in random order:
for key in store:
    print store[key]

I don't think the implementation should be complexified to allow leaving out the explicit list() call in the first loop. --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d at hishome.net Sun Nov 30 15:44:59 2003 From: oren-py-d at hishome.net (Oren Tirosh) Date: Sun Nov 30 15:45:02 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <000201c3b71b$9facae40$e841fea9@oemcomputer> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> Message-ID: <20031130204459.GA3275@hishome.net> On Sun, Nov 30, 2003 at 03:26:16AM -0500, Raymond Hettinger wrote: > [Anthony] > > > Then let's kill all use of backticks in the standard library. > There's > > > a lot of them. > > [Guido] > > As always, be careful with doing peephole changes to the standard > > library -- historically, we've seen a 1-5% error rate in these change > > sets that persists for months or years afterwards.
>
> FWIW, Walter and I did a bunch of these for Py2.3 and had excellent
> success because of a good process. Some ideas are:
>
> * start it now (don't wait until a beta release).
> * skip the packages like email which are maintained separately.
> * think out ways it could go wrong (operator precedence, double backticks, escaped backticks, backticks inside strings or comments, etc.).
> * do it manually (not brainlessly), then do it with automation to compare the results.

Here's an idea for verifying an automated translator: Instead of converting `expr` to repr(expr) convert it first to (`expr`) or even (`(expr)`) and make sure it still compiles into exactly the same bytecode. It should catch all the problems you mention except backticks in comments and strings. These need manual inspection. Oren From aleaxit at yahoo.com Sun Nov 30 15:57:49 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 30 15:57:58 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <20031130073109.GA1560@hishome.net> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> Message-ID: <200311302157.49205.aleaxit@yahoo.com> On Sunday 30 November 2003 08:31, Oren Tirosh wrote: > On Fri, Nov 28, 2003 at 06:24:30PM -0500, Raymond Hettinger wrote: > ...
>
> > students.sort(key=extract('grade'))  # key=lambda r: r.grade
> > students.sort(key=extract(2))        # key=lambda r: r[2]
>
> Why should the extract function interpret a string argument as getattr and an int argument as getitem?
> I find the explicit version more readable:
>
> students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
> students.sort(key=itemgetter(2))         # key=lambda r: r[2]
> students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']

I concur: "overloading" extract to mean (the equivalent of) either getattr or getitem depending on the argument type doesn't look good, besides making it unusable to extract some items from dicts. Since these functions or types are going to be in operator, I think we can afford to "spend" two names to distinguish functionality (even though attrgetter and itemgetter look nowhere as neat as extract -- I don't have better suggestions offhand). Alex From guido at python.org Sun Nov 30 16:54:23 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 16:54:29 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 21:57:49 +0100." <200311302157.49205.aleaxit@yahoo.com> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <200311302157.49205.aleaxit@yahoo.com> Message-ID: <200311302154.hAULsN229214@c-24-5-183-134.client.comcast.net> > I concur: "overloading" extract to mean (the equivalent of) either > getattr or getitem depending on the argument type doesn't look > good, besides making it unusable to extract some items from dicts. Agreed. I've seen too many of such "clever" overloading schemes in a past life. > Since these functions or types are going to be in operator, I think > we can afford to "spend" two names to distinguish functionality > (even though attrgetter and itemgetter look nowhere as neat as > extract -- I don't have better suggestions offhand). Right. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sun Nov 30 18:48:25 2003 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun Nov 30 18:46:38 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311302157.49205.aleaxit@yahoo.com> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> Message-ID: <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> At 09:57 PM 11/30/03 +0100, Alex Martelli wrote:

> > students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
> > students.sort(key=itemgetter(2))         # key=lambda r: r[2]
> > students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']
>
> I concur: "overloading" extract to mean (the equivalent of) either getattr or getitem depending on the argument type doesn't look good, besides making it unusable to extract some items from dicts.
>
> Since these functions or types are going to be in operator, I think we can afford to "spend" two names to distinguish functionality (even though attrgetter and itemgetter look nowhere as neat as extract -- I don't have better suggestions offhand).

How about:

extract(attr='grade')
extract(item=2)
extract(method='foo')   # returns the result of calling 'ob.foo()'

And following the pattern of Zope's old "query" package:

extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
extract(extract(item=10), method='spam')   # extracts ob[10].spam()

i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on. IIRC, the Zope query package used __getitem__ instead of __call__ on its instances as a speed hack, but I don't think we should follow that example. :) From guido at python.org Sun Nov 30 19:18:37 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 19:18:48 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 18:48:25 EST."
<5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> Message-ID: <200312010018.hB10IbS29532@c-24-5-183-134.client.comcast.net> > How about:
>
> extract(attr='grade')
> extract(item=2)
> extract(method='foo')   # returns the result of calling 'ob.foo()'
>
> And following the pattern of Zope's old "query" package:
>
> extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
> extract(extract(item=10), method='spam')   # extracts ob[10].spam()
>
> i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on.

I'm not sure what the advantage of this is. It seems more typing, more explanation, probably more code to implement (to check for contradicting keyword args). > IIRC, the Zope query package used __getitem__ instead of __call__ on its > instances as a speed hack, but I don't think we should follow that example. :) Right. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Sun Nov 30 20:25:46 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sun Nov 30 19:27:53 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031130204459.GA3275@hishome.net> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> <20031130204459.GA3275@hishome.net> Message-ID: <200311302025.46673.fincher.8@osu.edu> On Sunday 30 November 2003 03:44 pm, Oren Tirosh wrote: > Instead of converting `expr` to repr(expr) convert it first to (`expr`) > or even (`(expr)`) and make sure it still compiles into exactly the same > bytecode.
It should catch all the problems you mention except backticks > in comments and strings. These need manual inspection. I don't know if it should be *that* mechanical; there are a lot of places where I've seen " 'something %s' % repr(foo)" when I think it's much more clearly written as " 'something %r' % foo". I don't know which is the officially preferred style, but if it's the latter (and I hope it is ;)) then it may not be good to mechanically change backticks to a repr call. Jeremy From eppstein at ics.uci.edu Sun Nov 30 20:01:20 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sun Nov 30 20:01:18 2003 Subject: [Python-Dev] Re: "groupby" iterator References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> <20031129223220.GA90372@i18n.org> <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net> Message-ID: In article <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net>, Guido van Rossum wrote: > But I don't think there's a good use case > for what he wants to do instead: save enough state so that the > subiterators can be used in arbitrary order. An application > that saves the subiterators for later will end up saving a copy of > everything, so it might as well be written so explicitly I don't have a good explicit use case in mind, but my objective is to be able to use itertools-like functionals without having to pay much attention to which ones iterate through their arguments immediately and which ones defer the iteration until later. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From guido at python.org Sun Nov 30 20:08:14 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 20:08:30 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sun, 30 Nov 2003 20:25:46 EST."
<200311302025.46673.fincher.8@osu.edu> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> <20031130204459.GA3275@hishome.net> <200311302025.46673.fincher.8@osu.edu> Message-ID: <200312010108.hB118EL29591@c-24-5-183-134.client.comcast.net> > I don't know if it should be *that* mechanical; there are a lot of > places where I've seen " 'something %s' % repr(foo)" when I think > it's much more clearly written as " 'something %r' % foo". I don't > know which is the officially preferred style, but if it's the latter > (and I hope it is ;)) then it may not be good to mechanically change > backticks to a repr call. If you're going to do that, I would beware of one thing. If x is a tuple, "foo %r" % x will not do the right thing: it will expect x to be a 1-tuple and produce the repr of x[0]:

>>> a = (42,)
>>> print "foo %s" % repr(a)
foo (42,)
>>> print "foo %r" % a
foo 42
>>> a = (4, 2)
>>> print "foo %r" % a
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not all arguments converted during string formatting
>>>

This is only a problem when there's only one % format in the string; if there are two or more, the argument is already a tuple and the substitution of %s/repr(x) to %r/x works fine. This also suggests a solution: if there's only one argument, create an explicit tuple:

>>> print "foo %r" % (a,)
foo (4, 2)
>>>

--Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sun Nov 30 21:20:16 2003 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun Nov 30 21:18:24 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200312010018.hB10IbS29532@c-24-5-183-134.client.comcast.net> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031130210357.02f62d80@mail.telecommunity.com> At 04:18 PM 11/30/03 -0800, Guido van Rossum wrote:

> > How about:
> >
> > extract(attr='grade')
> > extract(item=2)
> > extract(method='foo')   # returns the result of calling 'ob.foo()'
> >
> > And following the pattern of Zope's old "query" package:
> >
> > extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
> > extract(extract(item=10), method='spam')   # extracts ob[10].spam()
> >
> > i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on.
>
> I'm not sure what the advantage of this is.

The chaining part, or the idea at all? For the idea in general, I was just proposing a more explicit form of the last API proposal. For the chaining part, well, my use case is the same as the old Zope query library: being able to compose operators to craft OO queries from a high level description. No reason that needs to go in the standard library, but as long as we were dreaming, I figured I might help implement it if it solved enough problems for me. :) (Without the chaining part, I don't really care if there's a standard library 'extract()' or not, since I'll still need to write a chaining one sooner or later.) > It seems more typing, >more explanation, probably more code to implement (to check for >contradicting keyword args). Yes.
Really the whole extract thing isn't that useful, except to get extra speed over using 'lambda x: x.foo' or whatever, which is what I'd probably use in any code that wasn't composing functions or compiling an OO query language. :) From python at rcn.com Sun Nov 30 23:35:56 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 23:37:27 2003 Subject: [Python-Dev] Re: "groupby" iterator In-Reply-To: Message-ID: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> [Guido van Rossum] > > But I don't think there's a good use case > > for what he wants to do instead: save enough state so that the > > subiterators can be used in arbitrary order. An application > > that saves the subiterators for later will end up saving a copy of > > everything, so it might as well be written so explicitly [David Eppstein] > I don't have a good explicit use case in mind, but my objective is to be > able to use itertools-like functionals without having to pay much > attention to which ones iterate through their arguments immediately and > which ones defer the iteration until later. Okay, I've decided on this one. Though David's idea is attractive in its generality, the use cases favor the previous implementation. IOW, there is a reasonable use case for skipping or partially consuming the subiterators (e.g. "sort s | uniq" and "sort s | uniq -d"). For the delinquent subiterators, the user can just convert them to a list if they are going to be needed later:

groups = []
for k, g in groupby(seq, keyfunc):
    groups.append(list(g))

With respect to the principle of least surprise, it is the lesser evil between having a delinquent subiterator turn up empty or having an itertool unexpectedly fall into a memory intensive mode. The first can be flagged so it won't pass silently. The second is more problematic because it is silent and because it is inconsistent with the memory friendly nature of itertools.
Another minor argument against David's version is that the pure python version (which will be included in the docs) is longer and harder to follow. Raymond Hettinger P.S. I'm leaning toward Alex's suggested argument order. Having a default identity function is too attractive to pass up. So the choice is between a style like map(None, s) or something closer to list.sorted(s, key=). Though the latter is not consistent with other itertools, it wins in the beauty department and its similarity with the key= is an accurate, helpful analogy. From python at rcn.com Sun Nov 30 23:42:55 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 23:43:39 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com> Message-ID: <003401c3b7c5$96b39e20$e841fea9@oemcomputer> > > Yes, backticks will be gone in 3.0. But I expect there's no hope of > > getting rid of them earlier -- they've been used too much. I suspect > Then let's kill all use of backticks in the standard library. There's > a lot of them. Advisory from a micro-performance hawk: Backticks are faster than repr()

>>> from timeit import Timer
>>> min(Timer('`x`', 'x=1').repeat(3))
1.4857213496706265
>>> min(Timer('repr(x)', 'x=1').repeat(3))
1.7748914665012876

Raymond Hettinger From eppstein at ics.uci.edu Sun Nov 30 23:42:18 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Dec 1 11:53:22 2003 Subject: [Python-Dev] Re: "groupby" iterator In-Reply-To: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> References: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> Message-ID: <30187757.1070224938@[192.168.1.100]> On 11/30/03 11:35 PM -0500 Raymond Hettinger wrote: > Okay, I've decided on this one. > > Though David's idea is attractive in its generality, the use cases favor > the previous implementation. IOW, there is a reasonable use case for > skipping or partially consuming the subiterators (e.g.
"sort s | uniq" > and "sort s | uniq -d"). For the delinquent subiterators, the user can > just convert them to a list if they are going to be needed later: My implementation will skip or partially consume the subiterators, with only a very temporary additional use of memory, if you don't keep a reference to them. But I can see your arguments about visible vs silent failure modes and code complexity. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science
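The materialize-early pattern that Guido and Raymond recommend in this thread can be sketched with the groupby() that eventually shipped in itertools (note the final signature takes the iterable first: groupby(iterable, key=None); the data and key function below are made up for illustration):

```python
from itertools import groupby

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]

# Convert each subiterator to a list right away; groupby shares its
# underlying iterator, so a group saved for later would turn up empty
# once the outer iterator moves on.
store = {}
for key, group in groupby(data, key=lambda pair: pair[0]):
    store.setdefault(key, []).append([v for _, v in group])

print(store)  # -> {'a': [[1, 2], [4]], 'b': [[3]]}
```

Because groupby only groups consecutive runs, the key 'a' produces two separate groups here; materializing each group as it is seen preserves both.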